Skip to content

Crate-level API changes for tree-sitter-lint#63

Open
helixbass wants to merge 45 commits into
run-contextfrom
run-context-tree-sitter-lint
Open

Crate-level API changes for tree-sitter-lint#63
helixbass wants to merge 45 commits into
run-contextfrom
run-context-tree-sitter-lint

Conversation

@helixbass
Copy link
Copy Markdown
Owner

In this PR:

  • some tweaks to the crate-level API from run-context that accommodate the tree-sitter-lint use case

(since the base branch is not "ready to merge", transitively neither is this really, though I suppose it could get merged down into the base branch?)

To test:
Per the updated example, the crate-level API now gives you visibility of the query "pattern index" that matched in addition to the tree_sitter::Node that matched

Based on run-context

Comment thread src/lib.rs Outdated
)
}

pub struct CaptureInfo<'node> {
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming?

Comment thread src/searcher/mod.rs Outdated
let found_capture_index = found_capture_index as u32;
if found_capture_index != capture_index {
.filter_map(|(match_, index_into_query_match_captures)| {
let this_capture = &match_.captures[index_into_query_match_captures];
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes (up through let single_captured_node = ...) are "their own thing"

What I ran into was that my understanding of this (what was being called) found_capture_index usize being returned by the .captures() API was incorrect

Based on a little sanity-checking, it appears to be an index into the .captures on the other "match" (tree_sitter::QueryMatch) that gets returned

(not a "capture index" for the query as a whole) (maybe the usize -> u32 casting should've been a tip-off?)

But so this should get merged into master as its own little PR presumably (along with some test cases of queries with multiple "top-level patterns" (which I guess is the only time this issue was happening?))

Comment thread Cargo.toml
@@ -1,5 +1,5 @@
[package]
name = "tree-sitter-grep"
name = "tree_sitter_lint_tree-sitter-grep"
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per helixbass/tree-sitter-lint#4, updating this to allow publishing that crate while this hasn't "landed"

Comment thread src/args.rs Outdated
const ALL_NODES_QUERY: &str = "(_) @node";

#[derive(Parser)]
#[derive(Clone, Parser)]
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if the need to do args.clone() is a "smell" that the crate-level API entry points should be taking an &Args instead?

Comment thread src/lib.rs Outdated
})
}

pub fn run_for_slice_with_callback(
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entry point is needed for the use case where tree-sitter-lint itself has an in-memory "slice" that it wants to lint (specifically it's using that for its rule-testing helpers)

Comment thread src/lib.rs Outdated
pub fn run_for_slice_with_callback(
slice: &[u8],
args: Args,
mut callback: impl FnMut(CaptureInfo) + Sync,
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Vs the things that get passed to the callback in the run_with_callback() entry point) having the callback get passed the slice and a path didn't seem to make sense for this use case so it just gets passed the "capture info"

Comment thread src/lib.rs Outdated
args: Args,
mut callback: impl FnMut(CaptureInfo) + Sync,
) -> Result<RunStatus, Error> {
let language = args.language.ok_or(Error::LanguageMissingForSlice)?;
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically Args has different "validation" for this entry point (specifically we need to know which language you want this slice to be treated as)

I started thinking that maybe the slice itself could/should be a field on Args and then that would be something that Args could "always validate" (ie that if its eg .slice_to_search field is present then its .language field must also be present)?

But that might be weird because then you'd have to sort of assert that "for this entry point we expect args.slice_to_search to be set" and vice-versa?

Comment thread src/lib.rs Outdated
mut callback: impl FnMut(CaptureInfo) + Sync,
) -> Result<RunStatus, Error> {
let language = args.language.ok_or(Error::LanguageMissingForSlice)?;
let query_text = args.get_loaded_query_text()?;
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know if there are great ways to DRY this up more wrt other entry points, didn't worry about that too much at the moment

Comment thread src/lib.rs

let non_fatal_errors = non_fatal_errors.lock().unwrap().clone();
if non_fatal_errors.is_empty() {
cached_queries.error_if_no_successful_query_parsing()?;
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not make any sense/be redundant since above we already checked that "the query for the definitely-specified language is parseable"

Comment thread src/searcher/mod.rs Outdated
Ok(())
}

pub fn search_slice_callback_no_path(
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "organization" of these searcher methods is probably weird but not like I 100% understand them in the first place

Here there was already a .search_slice_callback() method (used internally by the .search_path_callback() "entry point") but it expected a Path to be passed and per above comment its callback argument took the additional file contents/path arguments

So just made a separate .search_slice_callback_no_path() "entry point"

Comment thread src/use_searcher.rs

thread_local! {
static SEARCHER: OnceCell<(Rc<RefCell<Searcher>>, *const Args)> = Default::default();
static SEARCHER_PER_ARGS_INSTANCE: RefCell<HashMap<*const Args, Rc<RefCell<Searcher>>>> = Default::default();
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd forgotten that we have now invalidated this invariant (but glad I was asserting it!) (that there's only ever once instance of Args used during a program execution)

So just made this a per-Args-instance map instead (yes in theory then you should probably be more "legitimately cache-like" and eg evict once you get past a certain size but not worrying about that for now)

This same change should be made to the printer which is using the same pattern

Comment thread src/lib.rs Outdated
})
}

pub fn run_with_per_file_callback(
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was running into wanting callbacks in tree-sitter-lint that are happening in the context of a single file (eg rule-listener callbacks) to be able to have mutable access to things (it seemed silly to wrap all of the "per-file-run rule state" in eg Mutex's when we know that once we're in the context of a single file, things are happening "serially")

So by adding an extra layer of "callback indirection" we can accomplish that and have the "per-match" callback be an FnMut

Comment thread src/lib.rs Outdated

pub fn run_with_per_file_callback(
args: Args,
per_file_callback: impl Fn(&DirEntry, Box<dyn FnMut(Box<dyn FnMut(CaptureInfo, &[u8], &Path) + '_>) + '_>)
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if there's a way to avoid Box'ing these, not sure otherwise how you'd specify the types of these closures?

Comment thread src/lib.rs Outdated
}

pub fn run_for_slice_with_callback<'a>(
slice: impl Into<RopeOrSlice<'a>>,
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think the idea here will be that the slice argument can be generic as something like impl TextProvider<'a> + Parseable (so not need to have any "specific" awareness of ropey at the tree_sitter_grep level)

But that is blocked by running into lifetime issues at the point of calling into tree_sitter (per tree-sitter/tree-sitter#2432)

(although as discussed maybe we could reasonably temporarily depend on a forked version of tree_sitter if there is in fact a way to tell all of the individual language-grammar crates "hey use this version of that dependency instead"?)

So for now made it take the "concrete" type of RopeOrSlice

(the point of this is to try and be more efficient where in tree-sitter-lint-lsp and maybe will also add to the tree-sitter-lint "fixing loop" we're storing file text in a ropey::Rope and so it's presumably expensive to convert that to "one big" &[u8] vs letting it be parsed/queried "in chunks")

Comment thread src/lib.rs Outdated

pub fn run_for_slice_with_callback<'a>(
slice: impl Into<RopeOrSlice<'a>>,
tree: Option<&Tree>,
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than eg a separate "entry point" for when you have the pre-parsed tree_sitter::Tree, just added it as an optional param to this existing one

Comment thread src/searcher/mod.rs Outdated
let query = &query_context.query;
let capture_index = query_context.capture_index;
let filter = &query_context.filter;
let tree: Cow<'_, Tree> = tree.map_or_else(
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still feels a bit "ceremonial" but I think this is actually a sensible "idea" for how to unify the types of "we got passed an Option<&T> (where we don't want to .clone() the inner &T) and we either unwrap that or have the ability to create a new (owned) T"

(I know that in the past I have done clunkier things in such situations)

Comment thread src/searcher/mod.rs Outdated
|| {
Cow::Owned(
slice
.parse(&mut get_parser(query_context.language), None)
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the Parseable trait that I added for abstracting away the ropey::Rope-specific-ness (as far as "do we parse this as one big string vs in chunks")

Comment thread src/treesitter.rs
parser.parse_with(
&mut |byte_offset, _| {
let (chunk, chunk_start_byte_index, _, _) = self.chunk_at_byte(byte_offset);
&chunk[byte_offset - chunk_start_byte_index..]
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea if this is actually right or not

Comment thread src/treesitter.rs
}

impl<'a> TextProvider<'a> for RopeOrSlice<'a> {
type I = RopeOrSliceTextProviderIterator<'a>;
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of...unidiomatic that they named their associated type I?

Comment thread src/treesitter.rs
let rope_slice = rope.byte_slice(node.byte_range());
RopeOrSliceTextProviderIterator::Rope(RopeOrSliceRopeTextProviderIterator::new(
rope_slice,
|rope_slice| rope_slice.chunks(),
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so here I convinced myself that I needed a self-referential data structure

Why? Because we have to create the rope_slice RopeSlice "locally" (because we have to derive it "dynamically" from the rope) and then really all we need to store in our Iterator type is its Chunks iterator, but there's no eg .into_chunks() (that would take self ownership of that locally-created value), just .chunks() which takes &self by reference (and its (ie the Chunks's) lifetime depends on that &self lifetime (because it's an iterator over references to pieces of that &self)), but that &self lifetime would go out of scope at the end of the enclosing text() method call because it's a local variable to that method call

So therefore the only way I could picture achieving it was to say "ok rope_slice you can tag along for the Iterator too even though we don't really "need" you, we just need to "keep you alive long enough""

Aka a self-referential struct because the Chunks's lifetime refers to that RopeSlice which at that point is another field in the same struct

So I tried out I think self_cell but it seemed a little clunkier so then tried ouroboros and that seemed smoother

Here this is how they have you initialize the dependent field

Comment thread src/treesitter.rs
fn text(&mut self, node: Node) -> Self::I {
match self {
Self::Slice(slice) => {
RopeOrSliceTextProviderIterator::Slice(iter::once(&slice[node.byte_range()]))
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually wasn't clever enough to think to express this as an iter::once() myself, I initially wrote it with a manual "have we returned the one slice yet" state field but then I saw this is how they had it written somewhere (I guess in the baked-in impl TextProvider for &[u8]?)

Comment thread src/treesitter.rs
pub struct RopeOrSliceRopeTextProviderIterator<'a> {
rope_slice: RopeSlice<'a>,

#[borrows(rope_slice)]
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and this is how you declare the dependency between fields on the #[self_referencing] struct

Comment thread src/treesitter.rs
type Item = &'a str;

fn next(&mut self) -> Option<Self::Item> {
self.with_chunks_iterator_mut(|chunks_iterator| chunks_iterator.next())
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and this is how you get a mutable reference to one of the (non-"dependee", I believe) fields on it

@helixbass helixbass force-pushed the run-context-tree-sitter-lint branch from eb4abe0 to 3a039da Compare July 27, 2023 18:02
Comment thread Cargo.toml Outdated
tree-sitter-typescript = "0.20.2"

[patch.crates-io]
tree-sitter = { git = "https://github.com/helixbass/tree-sitter", rev = "57e98fb0" }
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read about how to do this "force dependencies to use this version"

The thing that I guess I wasn't anticipating is that I also had to do this in any "outer" crates (tree-sitter-lint and its project-local crates) even though they don't have their own explicit dependency on tree-sitter (they use tree-sitter-grep's re-export)?

Comment thread rustfmt.toml
imports_granularity = "Crate"
wrap_comments = true
edition = "2021"
# wrap_comments = true
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing of rustfmt often deciding to "wrap" code that I had commented out made me decide to turn this off

Comment thread src/searcher/glue.rs
let query_context = self.core.query_context();
let query = query_context.query.clone();
let filter = query_context.filter.clone();
let mut matches = get_captures(
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did in fact DRY up the boilerplate of "running a tree-sitter query" and exposed those as get_captures()/get_captures_for_enclosing_node()

Comment thread src/searcher/glue.rs
fn sink<'tree>(
&mut self,
matches: &mut impl Iterator<Item = Node<'tree>>,
matches: &mut impl StreamingIterator<Item = CaptureInfo<'tree>>,
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That DRY'ed up get_captures() returns a StreamingIterator which has nothing to do with async "streams", it's a crate for doing a "lending" iterator (ie where the items returned are dependent on the iterator's lifetime)

Comment thread src/searcher/mod.rs
pub fn search_slice_callback_no_path<'a, 'text, 'tree>(
&mut self,
query_context: QueryContext,
// slice: impl TextProvider<'a> + Parseable + 'a,
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After having gotten get_captures() working with a RopeOrSlice argument I took a quick stab at trying to have it take the more generic impl TextProvider + Parseable but ran into some (more) lifetime stuff so left it more "concrete" for now at least

Comment thread src/treesitter.rs
}

#[self_referencing]
pub struct Captures<'a, 'text: 'a, 'tree: 'a> {
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so for the get_captures() iterator type again we're in self-referencing struct world

Why? Because the items returned by a tree_sitter::QueryCaptures are lifetime-dependent on things like the tree, text and query that passed to QueryCursor::captures() (and I think that implies that?)

And the query cursor itself wants to be created inside the shared abstraction but the captures iterator is also dependent on its lifetime I believe

So basically "toss all the stuff that the iterator depends on as fields and then use ouroboros to make the one iterator field self-referentially dependent on those"

But then it didn't want to implement Iterator, I think basically because items returned by the iterator would be lifetime-dependent on the struct (aka Iterator) itself and Iterator does not support that (your returned items cannot depend on the lifetime of the Iterator itself)

So then looked like streaming_iterator was a reasonable option for trying to implement a "lending" iterator on this struct

I was able to get that to basically work but see comments below re a little unsafe

Comment thread src/treesitter.rs Outdated
capture_index: u32,
#[borrows(text, mut query_cursor, query, tree)]
#[covariant]
captures_iterator: QueryCaptures<'this, 'this, 'this, RopeOrSlice<'this>>,
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially tried to "go ahead" and call the .filter_map() (that in the previous versions of this pattern was always getting called to narrow the returned captures) on the QueryCursor::captures() iterator (converted to a StreamingIterator) but ran into "lifetime stuff" including some stuff around covariant/invariant "subtyping"

But so keeping this field "simpler" and then basically "manually" implementing StreamingIterator (including "inlining" the .filter_map() logic) made it more straightforward

Comment thread src/treesitter.rs
impl<'a, 'text, 'tree> StreamingIterator for Captures<'a, 'text, 'tree> {
type Item = CaptureInfo<'tree>;

fn advance(&mut self) {
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StreamingIterator "decouples" "loading" the next item from "returning" it

Comment thread src/treesitter.rs

fn get<'this>(&'this self) -> Option<&'this Self::Item> {
let next_capture = self.borrow_next_capture();
// SAFETY: I think this is ok as long as CaptureInfo isn't
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok this is where I resorted to unsafe

I think I sort of understand why it was flagging this as an error without it

Basically I've declared the associated Item type here to be CaptureInfo<'tree> but it thinks that what I'd be trying to return is a CaptureInfo<'this> (ie its lifetime is dependent on the struct itself aka self-referential), not a CaptureInfo<'tree> (which would be allowed to outlive the struct)

And I think it might be right about that but not totally clear on that

But there's no way for me to specify 'this in the declaration of the Item associated type afaik

But it is by definition making the & in the return type here a "self-referential lifetime" so what I convinced myself of is that the only way you could "break out of that" and treat the CaptureInfo contents as though they're 'tree when in fact they're 'this is if you could copy/clone it, and CaptureInfo doesn't implement Copy/Clone

(I'm not even sure that would be a way to achieve anything bad - CaptureInfo just has a usize and a Node<'tree> (which is "actually" 'tree, I don't see how it couldn't be) on it so I don't think there are like any fields that eg when copied could incorrectly outlive the &'this reference - you can safely copy the Node<'tree> field)

Comment thread src/treesitter.rs
}

#[self_referencing]
pub struct CapturesForEnclosingNode<'a, 'text: 'a, 'tree: 'a> {
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing, just for the variation (as used by tree-sitter-lint) where you want to provide an explicit "enclosing node" for the query vs it defaulting to querying against the entire file contents associated with the Tree

Comment thread src/args.rs
const ALL_NODES_QUERY: &str = "(_) @node";

#[derive(Parser)]
#[derive(Builder, Clone, Default, Parser)]
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposed an ArgsBuilder for tree-sitter-lint to use (vs building Args clunkily from passing "command-line args")

Presumably there should be some additional validation eg that some of the invariants that clap checks when parsing command-line args are not violated when using the builder, but I haven't done that yet

Comment thread src/args.rs Outdated
query_text: Option<String>,

#[clap(skip)]
query_per_language: Option<HashMap<SupportedLanguage, Arc<Query>>>,
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is "tackling a couple birds with one stone" (?), first this is allowing passing in pre-parsed queries, second this is supporting "query-per-language" (vs "just one query")

Query-per-language allows tree-sitter-lint to let a single "run" of tree-sitter-grep "drive" everything, which seems to make sense (so that we're only doing one project-file-walking)

Comment thread src/args.rs Outdated
pub(crate) fn get_project_file_walker_types(&self) -> Types {
get_project_file_walker_types(self.language)
get_project_file_walker_types(self.language.map(|language| vec![language]).or_else(|| {
self.query_per_language
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically asserting that if you pass query_per_language those are "all" of the languages that you're interested in

Some of this is probably a little bit conceptually unresolved but that seems more or less like the right idea

Comment thread src/args.rs
pub(crate) fn get_loaded_query_text(&self) -> Result<String, Error> {
pub(crate) fn get_loaded_query_text_per_language(
&self,
) -> Result<QueryOrQueryTextPerLanguage, Error> {
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may seem like a weird "half-step" (toward Args just going ahead and actually parsing the query text), but I think it makes sense

The reason is that we do want to go ahead and fail fast if eg we got passed a path to a query file that we can't read

But eg for the case of auto-language (and not having passed query_per_language) we want to let CachedQueries lazily resolve the query text to a parsed query

Comment thread src/lib.rs
}
}

#[derive(Default)]
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got to track down the fact that sometimes I wasn't seeing a newly added lint-rule violation get printed out

The reason ended up being that I forgot that in adding query_per_language (to Args) then the capture index could no longer be a "singleton" (because now each language's query could have a different "target" capture index)

So got rid of this data structure for it entirely and moved the per-language-query capture index storage/calculation into CachedQueries

Comment thread src/lib.rs
"Should've tried to parse in at least one language or else should've already failed on no candidate files"
);
return Err(Error::NoSuccessfulQueryParsing(attempted_parsings));
if let Some((_, capture_index_error)) =
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the storage of "either a query-parsing error or a (subsequent) capture-index-resolution error" in CachedQueries makes sense, but here this gets a little clunky probably as far as deciding how to "resolve"/"surface" those

(eg should it be "fatal" if for one of the per-language queries there was a capture index error (eg if they passed a capture-name via Args and it wasn't valid for all of them? Speaking of which, probably query_per_language should be modeled as taking a capture name per language too, it just happens that for the tree-sitter-lint use case there's one capture name for all of the different language queries so it works to pass the single capture-name via Args)?)

fn main() {
let args = Args::parse_from(["tree_sitter_grep", "-q", "(function_item) @f"]);
run_with_callback(args, |node, file_contents, path| {
run_with_callback(args, |query_match, file_contents, path| {
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now I updated all of the crate-level "callback"-oriented entry points to use the match-oriented (vs single-capture-oriented) querying/callback style

Not sure where it should actually "land", for the moment mostly just driving based on the tree-sitter-lint use case

&file_contents[query_match
.nodes_for_capture_index(0)
.next()
.unwrap()
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is a little boilerplate-y I'm guessing the argument for in fact only exposing match-oriented crate-level callback API's is that I believe you can then basically "wrap" those (with a little match -> individual query captures boilerplate) to achieve the same idea/behavior as a single-capture-oriented style

Comment thread src/searcher/mod.rs
self.check_config()?;

log::trace!("slice reader: searching via multiline strategy");
get_matches(query_context.language, slice, &query_context.query, tree).for_each(
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a get_matches() corresponding to the get_captures() "iterate over query captures" abstraction

Comment thread src/treesitter.rs
let next_match = self.borrow_next_match();
// SAFETY: Not as sure on this one?
let next_match: &'this Option<QueryMatch<'a, 'tree>> =
unsafe { mem::transmute(next_match) };
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same "technique" here (as with Captures above) for avoiding lifetime errors

No idea but I'd be surprised if this isn't "sound"

Comment thread src/language.rs
static SUPPORTED_LANGUAGE_COMMENT_KINDS: Lazy<BySupportedLanguage<HashSet<&'static str>>> =
Lazy::new(|| {
by_supported_language!(
Rust => ["line_comment", "block_comment"].into(),
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the ESLint helper methods are "comment-aware" and so "baking in" some comment-awareness at the tree-sitter-lint level seems to make sense (and it seemed most appropriate to push that down to the tree-sitter-grep level where the SupportedLanguage definitions live)

I have no idea if these are right but can go through and add the correct "kinds" here for the different languages at some point

Comment thread src/lib.rs
})
}

pub fn run_with_single_per_file_callback(
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the version that isn't doing any actual querying on the tree-sitter-grep side (this was related to getting there to not be lifetime issues with storing eg tree_sitter::Node's in a tree-sitter-lint rule's "local state")

It just basically drives running a single callback for each found project file that exposes things like the detected language, file contents, tree-sitter tree and parsed query

Comment thread src/searcher/mod.rs
self.search_file_maybe_path(query_context, Some(path), &file, write_to)
}

pub fn load_file_contents<P, TError: SinkError>(
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sort of "extracting" the existing logic related to loading file contents from disk (but not doing any querying)

@jayvdb jayvdb mentioned this pull request May 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant