-
Notifications
You must be signed in to change notification settings - Fork 2.4k
feat: gopls allow fuzzy matching with spelling errors #612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,6 +17,8 @@ const ( | |
| // MaxPatternSize is the maximum size of the pattern used to construct the fuzzy matcher. Longer | ||
| // inputs are truncated to this size. | ||
| MaxPatternSize = 63 | ||
|
|
||
| shortPatternSize = 3 | ||
| ) | ||
|
|
||
| type scoreVal int | ||
|
|
@@ -88,7 +90,7 @@ func NewMatcher(pattern string) *Matcher { | |
| } | ||
| } | ||
|
|
||
| if len(pattern) > 3 { | ||
| if len(pattern) > shortPatternSize { | ||
| m.patternShort = m.patternLower[:3] | ||
| } else { | ||
| m.patternShort = m.patternLower | ||
|
|
@@ -123,12 +125,12 @@ func (m *Matcher) ScoreChunks(chunks []string) float32 { | |
| // Empty patterns perfectly match candidates. | ||
| return 1 | ||
| } | ||
|
|
||
| if m.match(candidate, lower) { | ||
| sc := m.computeScore(candidate, lower) | ||
| ok, l := m.match(candidate, lower) | ||
| if ok { | ||
| sc := m.computeScore(candidate, lower, l) | ||
| if sc > minScore/2 && !m.poorMatch() { | ||
| m.lastCandidateMatched = true | ||
| if len(m.pattern) == len(candidate) { | ||
| if l == len(candidate) { | ||
| // Perfect match. | ||
| return 1 | ||
| } | ||
|
|
@@ -182,25 +184,26 @@ func (m *Matcher) MatchedRanges() []int { | |
| return ret | ||
| } | ||
|
|
||
| func (m *Matcher) match(candidate []byte, candidateLower []byte) bool { | ||
| func (m *Matcher) match(candidate []byte, candidateLower []byte) (bool, int) { | ||
| i, j := 0, 0 | ||
| for ; i < len(candidateLower) && j < len(m.patternLower); i++ { | ||
| if candidateLower[i] == m.patternLower[j] { | ||
| j++ | ||
| } | ||
| } | ||
| if j != len(m.patternLower) { | ||
| return false | ||
| // if the candidate is short the characters have to match completely. | ||
| if len(candidate) <= shortPatternSize && j != len(m.patternLower) { | ||
| return false, 0 | ||
| } | ||
|
|
||
| // The input passes the simple test against pattern, so it is time to classify its characters. | ||
| // Character roles are used below to find the last segment. | ||
| m.roles = RuneRoles(candidate, m.rolesBuf[:]) | ||
|
|
||
| return true | ||
| return true, j | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think this is quite right. What happens if we just let the entire pattern score instead of trimming to the prefix?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment, it made me realized However, this So i would say that scores relies on the number of matching characters we can found reading through the pattern and the candidate a single time. By looking at the
But it is not:
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It might be worth to get rid of this scoring mechanism since it's kind of weird with spelling mistakes, but it would be a bigger job, plus changing some people's UX because the new scoring system will be different from the current one. |
||
| } | ||
|
|
||
| func (m *Matcher) computeScore(candidate []byte, candidateLower []byte) int { | ||
| func (m *Matcher) computeScore(candidate []byte, candidateLower []byte, matchPatterLen int) int { | ||
| pattLen, candLen := len(m.pattern), len(candidate) | ||
|
|
||
| for j := 0; j <= len(m.pattern); j++ { | ||
|
|
@@ -328,8 +331,7 @@ func (m *Matcher) computeScore(candidate []byte, candidateLower []byte) int { | |
| } | ||
| } | ||
| } | ||
|
|
||
| result := m.scores[len(candidate)][len(m.pattern)][m.bestK(len(candidate), len(m.pattern))].val() | ||
| result := m.scores[len(candidate)][matchPatterLen][m.bestK(len(candidate), matchPatterLen)].val() | ||
|
|
||
| return result | ||
| } | ||
|
|
||

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we limit sloppy matches to longer patterns? I feel like we should instead filter based on a threshold of sloppy characters (i.e allow 1 or 2 non-matching characters, maybe depending on pattern length).
I'm assuming this early return here is for performance (i.e. want to skip the expensive scoring for candidates that clearly don't match). We don't want an
O(n^2)check here, but maybe we can handle aO(3n)where we backtrack a couple times to allow up to 2 non-matching characters.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this early return was becasue my change is allowing the
scorefunc to run more frequently, and this is just an holistic cutoff like "if there candidate to be matched is really short it must have all chars matching`.However this is entirely up to debate, it was just to throw the idea of having a shortcut for short candidates.
I've experimented with the idea of "at least 30% of matching characters" or something like that, but i would honestly just prefer to have a simple, clear cut-off and run with it than hitting edge cases with percentages.
As I said, i'm entirely open to change this approach once we solve the other comment's problem, which is more important!