Conversation
| regex = "1.9.1" | ||
| tree-sitter = "0.20.10" | ||
| tree-sitter-grep = { git = "https://github.com/helixbass/tree-sitter-grep", rev = "3d4682c" } | ||
| tree-sitter-rust = "0.20.3" |
There was a problem hiding this comment.
It seems like it might be a little silly to have this crate depend on all of these also (along with tree-sitter-grep depending on them)?
Not sure what that would entail to eg "delegate" that dependency, maybe some exposed API from tree-sitter-grep to get the tree_sitter::Language for a given SupportedLanguage or something like that?
There was a problem hiding this comment.
Yeah, or just a straight re-export of tree-sitter-rust from somewhere in tree-sitter-grep. (Maybe feature flagged.)
There was a problem hiding this comment.
Ya actually I ended up doing some crate re-exporting in a recent PR (for rule-author "consumers" I guess)
Here what seems to have been the pattern is that I exposed SupportedLanguage on tree_sitter_grep and tree-sitter-lint is just re-using that as its own "language" type, and a SupportedLanguage can give you its corresponding tree_sitter::Language
|
|
||
| use crate::{rule::ResolvedRule, violation::Violation}; | ||
|
|
||
| pub struct Context { |
There was a problem hiding this comment.
I was somewhat mimicking ESLint's rule-authoring API
But the concept of "context" being used here is a little fuzzy at the moment
In ESLint's API, they are calling the rule's create() callback once for each file that the rule is being run against, so it supplies a context that is specific to that file
(which I guess would enable eg tailoring that rule/its internal state to the specific file that it's being run against?)
The big things that seems questionable about mimicking that here is that we need to parse tree-sitter queries for the rule "listeners", and so doing that parsing n times (where n = the # of files we're checking) seems like not the way to do things
It's possible that that query-parsing could be eg memoized (for the common case that for each file you're registering the same exact set of tree-sitter queries for the listeners) if we did in fact want to have something like the ESLint API's "scope" (where you can basically configure the rule once per file)?
But no idea what actually makes sense
So for the moment there's this "global to the linter run"-level Context, and then a QueryMatchContext that gets supplied to your listener callback
There was a problem hiding this comment.
For linting more-so than tree-sitter-grep, it seems very reasonable for the rules to have to say which language(s?) they operate on, so the queries could be parsed up front once, maybe?
There was a problem hiding this comment.
Yup I haven't really "wired up" any "rule-driven language" stuff yet but that definitely seems right to me
| } | ||
|
|
||
| fn print_violation(violation: &Violation, query_match_context: &QueryMatchContext) { | ||
| println!( |
There was a problem hiding this comment.
This is presumably not as good as using something ripgrep-Printer-ish (eg might suffer from interleaving)?
| pub fn run(_args: Args) { | ||
| let language = tree_sitter_rust::language(); | ||
| let context = Context::new(language); | ||
| let resolved_rules = get_rules() |
There was a problem hiding this comment.
Rules/listeners get "resolved" by eg calling the create() callback and then actually parsing the tree-sitter queries etc
| derive_builder = "0.12.0" | ||
| regex = "1.9.1" | ||
| tree-sitter = "0.20.10" | ||
| tree-sitter-grep = { git = "https://github.com/helixbass/tree-sitter-grep", rev = "3d4682c" } |
There was a problem hiding this comment.
The dependency here is on the proof-of-concept run-context branch plus a couple of tweaks
| CAPTURE_NAME_FOR_TREE_SITTER_GREP_WITH_LEADING_AT, | ||
| ); | ||
| assert!( | ||
| matches!(query_text_with_unified_capture_name, Cow::Owned(_),), |
There was a problem hiding this comment.
This is relying on the presumed invariant that regex.replace_all() will return an "owned" (ie String) Cow if and only if it actually did some replacing
I suppose could also just string-compare that the replaced version is different than the original?
There was a problem hiding this comment.
I suppose could also just string-compare that the replaced version is different than the original?
That would be much more clear to read, I think.
There was a problem hiding this comment.
But more expensive? How expensive is string-equality comparison?
(As always might be being "naive about optimization guy" but) if this version is "reliable" and more performant then I'd be inclined to solve the readability issue by just eg hiding it behind a well-named helper (eg were_any_captures_replaced())
There was a problem hiding this comment.
Sure, yeah, that seems fine, and maybe worth a comment about why the Cow match works.
There was a problem hiding this comment.
Ok added were_any_captures_replaced() + comment on the current branch
| .on_query_match(|node, query_match_context| { | ||
| query_match_context.report( | ||
| ViolationBuilder::default() | ||
| .message(r#"Use '_d()' instead of 'Default::default()'"#) |
There was a problem hiding this comment.
These are actual rules I thought of for the Typescript compiler in Rust code-base
Specifically here I've been playing with using a "shorthand" _d() helper vs the more verbose Default::default()
Neither of them are "flexing" the tree-sitter side of it very hard but whatever
| } | ||
|
|
||
| fn no_default_default_rule() -> Rule { | ||
| RuleBuilder::default() |
There was a problem hiding this comment.
The builder pattern seemed appropriate for these but I could also see adding an (optional) layer on top of that for people who wanted a slightly "slimmer" syntax eg:
rule! {
name => no_default_default,
create => |context| {
vec![
rule_listener! {
query => r#"..."#,
capture_name => "c",
on_query_match => |node, query_match_context| {
...
}
},
...
]
}
}
There was a problem hiding this comment.
My main issue with macros like that is that you have to (I think in this case) manage indentation/whitespace yourself.
There was a problem hiding this comment.
This theme keeps coming up of tooling "pulling up short" of really trying to parse the contents of macro invocations and consumers suffering for it (the thing I had to do for tree-sitter-grep of forking the Rust tree-sitter grammar to try and parse "typical" macro-invocation arguments; a similar thing came up here with doing syn-based traversal of parsed AST's where it "wasn't working as expected" without some extra "convincing" to traverse into macro contents; and like you're saying rustfmt not formatting macro invocation contents that it kind of seems like it should be able to format)
Ya I'm sort of into the more "magical" macro syntax for this use case of trying to make rule authorship feel "smooth" (and ESLint-like) but I'm assuming it won't be everyone's style
In which case (in terms of how the rule-authoring API's are currently organized) (a) you can always write the more boilerplate-y but "just Rust" version and/or (b) there's really no reason there couldn't be alternate macro syntaxes (whether "official" or not) (since it's just a way of generating "publicly-writable Rust code") that are maybe a bit less "magical"?
| on_query_match, | ||
| } = self; | ||
| let query = Query::new(context.language, &query_text).unwrap(); | ||
| let capture_index = match capture_name { |
There was a problem hiding this comment.
This sort of mimics the capture-index-resolution logic from tree-sitter-grep
| #[builder(setter(into))] | ||
| pub struct Violation<'a> { | ||
| pub message: String, | ||
| pub node: &'a Node<'a>, |
There was a problem hiding this comment.
Right per other comment if tree_sitter::Node is Copy then it seems un-idiomatic to store a reference to it
There was a problem hiding this comment.
Yeah, and looks like it'd save you a little lifetime pain to not do it? (How is Node Copy, though? I'd have thought it held references to other things, which would require cloning.)
There was a problem hiding this comment.
I believe it's either a type alias of or a newtype wrapper around a raw pointer to the underlying tree-sitter C API Node type (and pointers are Copy)
There was a problem hiding this comment.
Ah right, and that works because the Node has an associated lifetime.
|
|
||
| use crate::{rule::ResolvedRule, violation::Violation}; | ||
|
|
||
| pub struct Context { |
There was a problem hiding this comment.
For linting more-so than tree-sitter-grep, it seems very reasonable for the rules to have to say which language(s?) they operate on, so the queries could be parsed up front once, maybe?
| "{:?}:{}:{} {} {}", | ||
| query_match_context.path, | ||
| violation.node.range().start_point.row + 1, | ||
| violation.node.range().start_point.column + 1, |
There was a problem hiding this comment.
Maybe worth eventually making some PositionForPrinting struct that can take the 0-indexed row and column, and implements Display in some reasonable way? (Maybe not if it would need to support multiple output styles, like tree-sitter-grep does.)
There was a problem hiding this comment.
Ah you're saying leverage Display for "how to formatted-output a single violation"?
Right that makes sense that "if there's one way to do it" that would be a nice thing to reach for
| .into_iter() | ||
| .map(|rule| rule.resolve(&context)) | ||
| .collect::<Vec<_>>(); | ||
| let aggregated_queries = AggregatedQueries::new(&resolved_rules, &context); |
| "-l", | ||
| "rust", | ||
| "--capture", | ||
| CAPTURE_NAME_FOR_TREE_SITTER_GREP, |
There was a problem hiding this comment.
Ha, fun that this silly command-line-style API is getting some use already.
| CAPTURE_NAME_FOR_TREE_SITTER_GREP_WITH_LEADING_AT, | ||
| ); | ||
| assert!( | ||
| matches!(query_text_with_unified_capture_name, Cow::Owned(_),), |
There was a problem hiding this comment.
I suppose could also just string-compare that the replaced version is different than the original?
That would be much more clear to read, I think.
| } | ||
|
|
||
| fn no_default_default_rule() -> Rule { | ||
| RuleBuilder::default() |
There was a problem hiding this comment.
My main issue with macros like that is that you have to (I think in this case) manage indentation/whitespace yourself.
| .on_query_match(|node, query_match_context| { | ||
| query_match_context.report( | ||
| ViolationBuilder::default() | ||
| .message(r#"Use '_d()' instead of 'Default::default()'"#) |
| pub struct Rule { | ||
| pub name: String, | ||
| #[builder(setter(custom))] | ||
| pub create: Arc<dyn Fn(&Context) -> Vec<RuleListener>>, |
There was a problem hiding this comment.
Ah, not sure I realized you could directly contain a dyn trait in an Arc, but that makes sense, since it's essentially a fancy Box?
There was a problem hiding this comment.
Yup and Box<dyn Trait>/Rc<dyn Trait>/Arc<dyn Trait> are all "special" as far as being able to be the "self receiver type", I take advantage of that in a subsequent PR
| #[builder(setter(into))] | ||
| pub struct Violation<'a> { | ||
| pub message: String, | ||
| pub node: &'a Node<'a>, |
There was a problem hiding this comment.
Yeah, and looks like it'd save you a little lifetime pain to not do it? (How is Node Copy, though? I'd have thought it held references to other things, which would require cloning.)
In this PR:
To test:
If you run this against a Rust codebase that includes
Default::default()and/orlazy_static!()usages it should yell at you for them