Add Lean language indexing with optional LSP resolution #548
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3e54d9a6ed
    line: Math.max(0, (ref.line || 1) - 1),
    character: Math.max(0, ref.column || 0),
    },
Convert tree-sitter byte columns before LSP requests
For Lean files containing multibyte Unicode before a reference on the same line, this sends the tree-sitter `startPosition.column` byte offset directly as an LSP `character` position. Because the worker initializes with empty capabilities and does not negotiate byte offsets, the Lean server interprets `character` in the protocol's default position encoding (UTF-16 code units). Common Lean text such as `α`, `→`, or `ℕ` before the target therefore makes `textDocument/definition` query the wrong character, and semantic resolution silently falls back or resolves incorrectly. Convert the extracted byte column to the LSP encoding for the opened line before sending the request, and apply the inverse conversion to returned locations before comparing them to tree-sitter node ranges.
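A minimal sketch of the conversion this comment asks for, assuming (as the review states) that the extracted column is a UTF-8 byte offset within the line and that the server uses the LSP default of UTF-16 code units. The helper names `utf8StepAt`, `byteColumnToUtf16`, and `utf16ToByteColumn` are hypothetical and do not come from the PR.

```typescript
// Size of the code point starting at `index`, in UTF-8 bytes and UTF-16 units.
function utf8StepAt(text: string, index: number): { bytes: number; units: number } {
  const code = text.charCodeAt(index);
  if (code < 0x80) return { bytes: 1, units: 1 };
  if (code < 0x800) return { bytes: 2, units: 1 };
  const isHighSurrogate = code >= 0xd800 && code <= 0xdbff;
  const next = index + 1 < text.length ? text.charCodeAt(index + 1) : 0;
  // A valid surrogate pair is one code point outside the BMP: 4 bytes, 2 units.
  if (isHighSurrogate && next >= 0xdc00 && next <= 0xdfff) return { bytes: 4, units: 2 };
  return { bytes: 3, units: 1 };
}

// Convert a UTF-8 byte column within `lineText` to a UTF-16 code unit offset.
function byteColumnToUtf16(lineText: string, byteColumn: number): number {
  let bytes = 0;
  let i = 0;
  while (i < lineText.length && bytes < byteColumn) {
    const step = utf8StepAt(lineText, i);
    bytes += step.bytes;
    i += step.units;
  }
  return i;
}

// Inverse: convert a UTF-16 code unit offset back to a UTF-8 byte column.
function utf16ToByteColumn(lineText: string, character: number): number {
  let bytes = 0;
  let i = 0;
  while (i < lineText.length && i < character) {
    const step = utf8StepAt(lineText, i);
    bytes += step.bytes;
    i += step.units;
  }
  return bytes;
}
```

In the line `α → x`, the `x` sits at byte column 7 but at UTF-16 character 4, so sending 7 unconverted would point past the end of the line.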
    const nameNode = getChildByField(node, 'name');
    if (!nameNode) return;
Read Lean import modules from the module field
For any Lean file with `import Foo.Bar`, the vendored tree-sitter-lean grammar exposes the imported identifier under the `module` field, so looking up `name` returns null and this method exits before creating either the import node or the unresolved import reference. That prevents Lean import edges from ever being created and also leaves `importedModules` empty, so static candidate resolution loses the imported-module prefixes when LSP is disabled or unavailable.
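One way the lookup could be adjusted, sketched under the assumption that the import node exposes tree-sitter's `childForFieldName`. The `SyntaxNodeLike` interface and the `getImportModuleName` helper are hypothetical stand-ins for the PR's own node type and `getChildByField` helper.

```typescript
// Minimal structural type for the parts of a tree-sitter node used here.
interface SyntaxNodeLike {
  type: string;
  text: string;
  childForFieldName(field: string): SyntaxNodeLike | null;
}

// Prefer the `module` field the review says the vendored grammar uses,
// and fall back to `name` in case another grammar version exposes that instead.
function getImportModuleName(importNode: SyntaxNodeLike): string | null {
  const moduleNode =
    importNode.childForFieldName('module') ?? importNode.childForFieldName('name');
  return moduleNode ? moduleNode.text : null;
}
```

Keeping the `name` fallback is a design choice for tolerance across grammar versions, not something the review requires.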
    this.walkDescendants(declaration, (node) => {
      if (node.type !== 'app') return true;
      if (this.isInsideNodeType(node, 'attributes') || this.isInInheritancePosition(node)) return true;
      const fn = getChildByField(node, 'fn');
      if (!fn) return true;
Match the vendored Lean application nodes for calls
For ordinary Lean calls such as `Foo.bar 1`, this branch never emits a call reference because the vendored tree-sitter-lean grammar does not produce `app` nodes with an `fn` field for applications. As a result, static-only indexing misses Lean call edges entirely whenever the optional LSP pass is disabled, unavailable, or capped by `CODEGRAPH_LEAN_LSP_REF_LIMIT`. Update this traversal to use the actual application node and function field names from the vendored grammar.
    const DECLARATION_TYPES = new Set([
      'def',
      'theorem',
      'abbrev',
Include lemma declarations in Lean extraction
For any Lean declaration written with `lemma`, `findDeclarationPayload` only accepts node types from this set and therefore returns null, because the vendored grammar exposes `lemma` separately from `theorem`. Lemmas are one of the common declaration forms in Lean projects, so their symbol nodes, references, docstrings, and visibility are all dropped. Add `lemma` to the declaration and name-position sets and map it like `theorem`.
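A small sketch of the suggested change. The set below is deliberately partial, since the diff context only shows the first three entries, and `normalizeDeclarationType` is a hypothetical helper illustrating "map it like `theorem`", not a function from the PR.

```typescript
// Partial set: only the entries visible in the diff context, plus `lemma`.
const DECLARATION_TYPES = new Set<string>(['def', 'theorem', 'abbrev', 'lemma']);

// Returns null for non-declaration nodes; otherwise the type to index under.
// `lemma` is routed through the same path as `theorem`.
function normalizeDeclarationType(nodeType: string): string | null {
  if (!DECLARATION_TYPES.has(nodeType)) return null;
  return nodeType === 'lemma' ? 'theorem' : nodeType;
}
```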
    if (child.type === 'field') {
      this.createMember(parent, 'field', child);
Handle Lean struct_field nodes when extracting fields
For structures and classes that declare fields in a `where` block, the vendored Lean grammar emits those members as `struct_field` nodes, but this traversal only creates a field for nodes typed `field`. Consequently, fields such as those in `structure Point where x : Nat` are omitted from the graph even though the parent structure is indexed. Include `struct_field` here and pass it through `createMember`.
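The check could be widened roughly as follows. This is only a sketch: `FIELD_NODE_TYPES`, `ChildLike`, and `isFieldNode` are hypothetical names, and the PR's `createMember` call is left out because its signature is only partly visible here.

```typescript
// Minimal shape of a child node for this check.
interface ChildLike {
  type: string;
}

// Node types treated as structure fields: the original `field`
// plus the `struct_field` type the review says the grammar emits.
const FIELD_NODE_TYPES = new Set<string>(['field', 'struct_field']);

function isFieldNode(child: ChildLike): boolean {
  return FIELD_NODE_TYPES.has(child.type);
}

// Example: select the children that should become field members.
function selectFieldNodes(children: ChildLike[]): ChildLike[] {
  return children.filter(isFieldNode);
}
```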
Summary

Adds Lean (`.lean`) support to CodeGraph via a vendored `tree-sitter-lean.wasm` grammar and a dedicated Lean extractor.

Lean indexing works statically and offline by default. When `lake` or `lean` is available, CodeGraph can run a best-effort Lean/Lake LSP definition pass to improve unresolved reference edges without making the Lean toolchain required.

Changes

- `.lean` language detection and Tree-sitter parsing.
- `resolvedBy: "lean-lsp"`.
- `.lake/` to default ignored dependency/build/cache directories, with `.gitignore` negation still allowing explicit opt in.

Configuration

New optional environment variables:

- `CODEGRAPH_LEAN_SEMANTICS`
- `CODEGRAPH_LEAN_LSP_COMMAND`
- `CODEGRAPH_LEAN_LSP_TIMEOUT_MS`
- `CODEGRAPH_LEAN_LSP_REF_LIMIT`

Verification

- `node scripts/add-lang/check-grammar.mjs src/extraction/wasm/tree-sitter-lean.wasm <valid-lean-sample> 20`
- `npm test -- __tests__/lean-extraction.test.ts __tests__/resolution.test.ts`
- `npm run build`
- `npm test`
- `npm --prefix site run build`
- `git diff --check`
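The PR description does not say how the optional environment variables are parsed. As an illustration only, the two numeric-looking ones (`CODEGRAPH_LEAN_LSP_TIMEOUT_MS` and `CODEGRAPH_LEAN_LSP_REF_LIMIT`) could be read defensively as below. The helper `readPositiveIntEnv` and the fallback values in the usage lines are hypothetical, not CodeGraph's actual defaults.

```typescript
// Read a positive integer from an environment-like map, using `fallback`
// when the variable is unset, empty, non-numeric, or not a positive integer.
function readPositiveIntEnv(
  env: Record<string, string | undefined>,
  name: string,
  fallback: number,
): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === '') return fallback;
  const value = Number(raw);
  const isPositiveInt = isFinite(value) && Math.floor(value) === value && value > 0;
  return isPositiveInt ? value : fallback;
}

// Usage with an explicit map; in Node this would typically be `process.env`.
// The fallbacks 5000 and 200 are placeholders chosen for this example.
const exampleEnv = { CODEGRAPH_LEAN_LSP_TIMEOUT_MS: '1500' };
const timeoutMs = readPositiveIntEnv(exampleEnv, 'CODEGRAPH_LEAN_LSP_TIMEOUT_MS', 5000);
const refLimit = readPositiveIntEnv(exampleEnv, 'CODEGRAPH_LEAN_LSP_REF_LIMIT', 200);
```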