26 changes: 14 additions & 12 deletions gopls/internal/fuzzy/matcher.go
@@ -17,6 +17,8 @@ const (
 	// MaxPatternSize is the maximum size of the pattern used to construct the fuzzy matcher. Longer
 	// inputs are truncated to this size.
 	MaxPatternSize = 63
+
+	shortPatternSize = 3
 )

 type scoreVal int
@@ -88,7 +90,7 @@ func NewMatcher(pattern string) *Matcher {
 		}
 	}

-	if len(pattern) > 3 {
+	if len(pattern) > shortPatternSize {
 		m.patternShort = m.patternLower[:3]
 	} else {
 		m.patternShort = m.patternLower
@@ -123,12 +125,12 @@ func (m *Matcher) ScoreChunks(chunks []string) float32 {
 		// Empty patterns perfectly match candidates.
 		return 1
 	}
-
-	if m.match(candidate, lower) {
-		sc := m.computeScore(candidate, lower)
+	ok, l := m.match(candidate, lower)
+	if ok {
+		sc := m.computeScore(candidate, lower, l)
 		if sc > minScore/2 && !m.poorMatch() {
 			m.lastCandidateMatched = true
-			if len(m.pattern) == len(candidate) {
+			if l == len(candidate) {
 				// Perfect match.
 				return 1
 			}
@@ -182,25 +184,26 @@ func (m *Matcher) MatchedRanges() []int {
 	return ret
 }

-func (m *Matcher) match(candidate []byte, candidateLower []byte) bool {
+func (m *Matcher) match(candidate []byte, candidateLower []byte) (bool, int) {
 	i, j := 0, 0
 	for ; i < len(candidateLower) && j < len(m.patternLower); i++ {
 		if candidateLower[i] == m.patternLower[j] {
 			j++
 		}
 	}
-	if j != len(m.patternLower) {
-		return false
+	// if the candidate is short the characters have to match completely.
+	if len(candidate) <= shortPatternSize && j != len(m.patternLower) {
+		return false, 0
 	}
Comment on lines +194 to 197
Contributor

Why do we limit sloppy matches to longer patterns? I feel like we should instead filter on a threshold of sloppy characters (i.e., allow 1 or 2 non-matching characters, maybe depending on pattern length).

I'm assuming this early return is here for performance (i.e., we want to skip the expensive scoring for candidates that clearly don't match). We don't want an O(n^2) check here, but maybe we can afford an O(3n) one where we backtrack a couple of times to allow up to 2 non-matching characters.
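
One way to read the suggestion above is a greedy pre-check that tolerates a small, fixed number of pattern characters with no counterpart in the candidate. The sketch below is only an illustration and not code from this PR: `matchWithSlack` and `maxMisses` are hypothetical names, and both inputs are assumed to be lowercased already. Each miss costs one extra scan of the unread part of the candidate, so for a fixed budget the work stays linear in the candidate length.

```go
package main

import (
	"bytes"
	"fmt"
)

// matchWithSlack is a hypothetical helper, not part of gopls. It walks the
// pattern once and looks for each byte in the unread part of the candidate.
// A byte that cannot be found counts as a miss and is skipped; the check
// fails as soon as the misses exceed maxMisses. It returns whether the check
// passed and how many pattern bytes were matched.
func matchWithSlack(pattern, candidate []byte, maxMisses int) (bool, int) {
	pos, matched, misses := 0, 0, 0
	for _, p := range pattern {
		next := bytes.IndexByte(candidate[pos:], p)
		if next < 0 {
			misses++
			if misses > maxMisses {
				return false, matched
			}
			continue
		}
		pos += next + 1
		matched++
	}
	return true, matched
}

func main() {
	// One pattern byte ('s') has no counterpart left in the candidate.
	ok, n := matchWithSlack([]byte("tetsincrementalnope"), []byte("testincrementalnope"), 2)
	fmt.Println(ok, n)

	// Most of the pattern is missing, so the budget is exhausted.
	ok, n = matchWithSlack([]byte("foobarbaz"), []byte("foo"), 2)
	fmt.Println(ok, n)
}
```

Because the scan is greedy, it can report more misses than an optimal alignment would; whether that is acceptable for a cheap pre-filter is a judgment call for the reviewers.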

Author

@SimoneDutto SimoneDutto Dec 26, 2025

Yeah, this early return is there because my change lets the score func run more frequently, and this is just a heuristic cutoff along the lines of "if the candidate to be matched is really short, it must have all chars matching".

However, this is entirely up for debate; I only wanted to float the idea of a shortcut for short candidates.

I've experimented with the idea of "at least 30% of matching characters" or something like that, but I would honestly prefer a simple, clear cutoff over hitting edge cases with percentages.
As I said, I'm entirely open to changing this approach once we solve the other comment's problem, which is more important!
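
To make the trade-off above concrete, here is a small sketch of the two cutoffs side by side. `fixedCutoff` restates the condition from the diff as a standalone predicate; `percentCutoff` is a hypothetical reading of the "at least 30%" idea, and both the function name and the choice to measure against the pattern length are assumptions, since the comment does not say what the percentage is taken of.

```go
package main

import "fmt"

// shortPatternSize mirrors the constant added in this PR.
const shortPatternSize = 3

// fixedCutoff restates the rule from the diff: a candidate of at most
// shortPatternSize bytes must match every pattern byte, while longer
// candidates always proceed to scoring.
func fixedCutoff(matched, patternLen, candidateLen int) bool {
	if candidateLen <= shortPatternSize && matched != patternLen {
		return false
	}
	return true
}

// percentCutoff is a hypothetical alternative: at least minPercent of the
// pattern bytes must have matched. Integer arithmetic avoids float rounding.
func percentCutoff(matched, patternLen, minPercent int) bool {
	return matched*100 >= minPercent*patternLen
}

func main() {
	// "foobarbaz" against "foo": 3 of 9 pattern bytes match a 3-byte candidate.
	fmt.Println(fixedCutoff(3, 9, 3))

	// One matched byte passes a 3-byte pattern but not a 4-byte one at 30%.
	fmt.Println(percentCutoff(1, 3, 30), percentCutoff(1, 4, 30))
}
```

The last line shows the kind of edge case mentioned above: with a percentage, a single matched byte is accepted or rejected depending on whether the pattern has three or four bytes.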


 	// The input passes the simple test against pattern, so it is time to classify its characters.
 	// Character roles are used below to find the last segment.
 	m.roles = RuneRoles(candidate, m.rolesBuf[:])

-	return true
+	return true, j
Contributor

I don't think this is quite right. j is checking exact consecutive matches from the start of the pattern. If the typo appears near the beginning of the pattern (e.g. "tetsincrementalnope") the scoring won't be right since we will only consider "te" (I think?).

What happens if we just let the entire pattern score instead of trimming to the prefix?

Author

> I don't think this is quite right. j is checking exact consecutive matches from the start of the pattern. If the typo appears near the beginning of the pattern (e.g. "tetsincrementalnope") the scoring won't be right since we will only consider "te" (I think?).

Thanks for the comment, it made me realize `match` is more nuanced than I had anticipated.
j expresses the number of pattern chars matched while going through the candidate a single time, not necessarily consecutively.
So for tetsincrementalnope, j is 3: t and e are consecutive matches, and the third pattern char, t, is found later in the candidate, so it counts as well.

However, this 3 seems important for the score func: reading the resulting score matrix (shown as a screenshot in the original comment), there is an evident, significant drop in score after that 3.

So I would say that the score relies on the number of matching characters we can find when reading through the pattern and the candidate a single time.
So making a spelling error with a letter that can be found later in the string is better than making one with a wrong char.
E.g. tetsincrementalnope (j = 3) is better than tewtincrementalNope (j = 2).
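
The counts above can be reproduced with a standalone copy of the loop from `match`. This is a minimal sketch: `greedyMatched` is a hypothetical name, and it lowercases with `strings.ToLower` rather than the matcher's own helper, which should be equivalent for the ASCII inputs used here.

```go
package main

import (
	"fmt"
	"strings"
)

// greedyMatched mirrors the single pass in match from the diff: it advances
// through the candidate once and consumes one pattern byte per hit, so the
// result is the number of leading pattern bytes found in order.
func greedyMatched(pattern, candidate string) int {
	p, c := strings.ToLower(pattern), strings.ToLower(candidate)
	j := 0
	for i := 0; i < len(c) && j < len(p); i++ {
		if c[i] == p[j] {
			j++
		}
	}
	return j
}

func main() {
	const candidate = "TestIncrementalNope"
	for _, p := range []string{"tetsincrementalnope", "tewtincrementalNope", "testssssss"} {
		fmt.Printf("%-20s j=%d\n", p, greedyMatched(p, candidate))
	}
}
```

For these three patterns the helper returns 3, 2 and 4 respectively: the pass stops counting at the first pattern byte that no longer occurs in the rest of the candidate.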

Looking at the score matrix, I don't think we can change this behavior without changing the score func as well.
I don't know if you have suggestions on how to proceed, because the change I've made is:

  • making spelling mistakes later in the string less punishing, because we are basically cutting the pattern to be matched at the first mistake (bar some exceptions discussed above)

But it is not:

  • fixing spelling mistakes in general, as we would if we used Levenshtein distance
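
For contrast, here is a minimal sketch of plain Levenshtein distance, which the matcher in this PR does not use. The `levenshtein` and `min3` helpers are hypothetical names, and the comparison is byte-based and assumes lowercased ASCII input.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the number of single-byte insertions, deletions and
// substitutions needed to turn a into b, using two rolling rows.
func levenshtein(a, b string) int {
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr := make([]int, len(b)+1)
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev = curr
	}
	return prev[len(b)]
}

func main() {
	const target = "testincrementalnope"
	fmt.Println(levenshtein("tetsincrementalnope", target)) // swapped "st"
	fmt.Println(levenshtein("tewtincrementalnope", target)) // 'w' typed for 's'
}
```

Under this metric the swapped pair costs two edits and the wrong character costs one, so the two example typos rank in the opposite order from the greedy count j.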

Author

It might be worth getting rid of this scoring mechanism, since it behaves oddly with spelling mistakes, but that would be a bigger job, and it would change some people's UX because the new scoring system would differ from the current one.

 }

-func (m *Matcher) computeScore(candidate []byte, candidateLower []byte) int {
+func (m *Matcher) computeScore(candidate []byte, candidateLower []byte, matchPatterLen int) int {
 	pattLen, candLen := len(m.pattern), len(candidate)

 	for j := 0; j <= len(m.pattern); j++ {
@@ -328,8 +331,7 @@ func (m *Matcher) computeScore(candidate []byte, candidateLower []byte) int {
 			}
 		}
 	}
-
-	result := m.scores[len(candidate)][len(m.pattern)][m.bestK(len(candidate), len(m.pattern))].val()
+	result := m.scores[len(candidate)][matchPatterLen][m.bestK(len(candidate), matchPatterLen)].val()

 	return result
 }
6 changes: 6 additions & 0 deletions gopls/internal/fuzzy/matcher_test.go
@@ -250,6 +250,12 @@ var scoreTestCases = []struct {
 	// We want the next two items to have roughly similar scores.
 	{p: "up", str: "unique_ptr", want: 0.75},
 	{p: "up", str: "upper_bound", want: 1},
+	// Pattern with some spelling errors.
+	{p: "tstincrementatlnope", str: "TestIncrementalNope", want: 0.61842},
+	{p: "tstincre", str: "TestIncrementalNope", want: 0.84375},
+	{p: "testssssss", str: "TestIncrementalNope", want: 0.4},
+	// Pattern longer than candidate.
+	{p: "foobarbaz", str: "foo", want: 0},
 }

 func TestScores(t *testing.T) {