Rules proof-of-concept by helixbass · Pull Request #1 · helixbass/tree-sitter-lint

helixbass · 2023-07-20T20:07:39Z

In this PR:

set up basic types for rules/individual "listeners" within a rule etc
hard-code everything including the language and a couple rules

To test:
If you run this against a Rust codebase that includes Default::default() and/or lazy_static!() usages it should yell at you for them

helixbass · 2023-07-20T20:10:02Z

+regex = "1.9.1"
+tree-sitter = "0.20.10"
+tree-sitter-grep = { git = "https://github.com/helixbass/tree-sitter-grep", rev = "3d4682c" }
+tree-sitter-rust = "0.20.3"


It seems like it might be a little silly to have this crate depend on all of these also (along with tree-sitter-grep depending on them)?

Not sure what that would entail to eg "delegate" that dependency, maybe some exposed API from tree-sitter-grep to get the tree_sitter::Language for a given SupportedLanguage or something like that?

Yeah, or just a straight re-export of tree-sitter-rust from somewhere in tree-sitter-grep. (Maybe feature flagged.)

Ya actually I ended up doing some crate re-exporting in a recent PR (for rule-author "consumers" I guess)

Here what seems to have been the pattern is that I exposed SupportedLanguage on tree_sitter_grep and tree-sitter-lint is just re-using that as its own "language" type, and a SupportedLanguage can give you its corresponding tree_sitter::Language

helixbass · 2023-07-20T20:16:46Z

+
+use crate::{rule::ResolvedRule, violation::Violation};
+
+pub struct Context {


I was somewhat mimicking ESLint's rule-authoring API

But the concept of "context" being used here is a little fuzzy at the moment

In ESLint's API, they are calling the rule's create() callback once for each file that the rule is being run against, so it supplies a context that is specific to that file

(which I guess would enable eg tailoring that rule/its internal state to the specific file that it's being run against?)

The big things that seems questionable about mimicking that here is that we need to parse tree-sitter queries for the rule "listeners", and so doing that parsing n times (where n = the # of files we're checking) seems like not the way to do things

It's possible that that query-parsing could be eg memoized (for the common case that for each file you're registering the same exact set of tree-sitter queries for the listeners) if we did in fact want to have something like the ESLint API's "scope" (where you can basically configure the rule once per file)?

But no idea what actually makes sense

So for the moment there's this "global to the linter run"-level Context, and then a QueryMatchContext that gets supplied to your listener callback

For linting more-so than tree-sitter-grep, it seems very reasonable for the rules to have to say which language(s?) they operate on, so the queries could be parsed up front once, maybe?

Yup I haven't really "wired up" any "rule-driven language" stuff yet but that definitely seems right to me

helixbass · 2023-07-20T20:17:48Z

+}
+
+fn print_violation(violation: &Violation, query_match_context: &QueryMatchContext) {
+    println!(


This is presumably not as good as using something ripgrep-Printer-ish (eg might suffer from interleaving)?

helixbass · 2023-07-20T20:19:16Z

+pub fn run(_args: Args) {
+    let language = tree_sitter_rust::language();
+    let context = Context::new(language);
+    let resolved_rules = get_rules()


Rules/listeners get "resolved" by eg calling the create() callback and then actually parsing the tree-sitter queries etc

helixbass · 2023-07-20T20:21:30Z

+derive_builder = "0.12.0"
+regex = "1.9.1"
+tree-sitter = "0.20.10"
+tree-sitter-grep = { git = "https://github.com/helixbass/tree-sitter-grep", rev = "3d4682c" }


The dependency here is on the proof-of-concept run-context branch plus a couple of tweaks

helixbass · 2023-07-20T20:33:33Z

+                        CAPTURE_NAME_FOR_TREE_SITTER_GREP_WITH_LEADING_AT,
+                    );
+                assert!(
+                    matches!(query_text_with_unified_capture_name, Cow::Owned(_),),


This is relying on the presumed invariant that regex.replace_all() will return an "owned" (ie String) Cow if and only if it actually did some replacing

I suppose could also just string-compare that the replaced version is different than the original?

I suppose could also just string-compare that the replaced version is different than the original?

That would be much more clear to read, I think.

But more expensive? How expensive is string-equality comparison?

(As always might be being "naive about optimization guy" but) if this version is "reliable" and more performant then I'd be inclined to solve the readability issue by just eg hiding it behind a well-named helper (eg were_any_captures_replaced())

Sure, yeah, that seems fine, and maybe worth a comment about why the Cow match works.

Ok added were_any_captures_replaced() + comment on the current branch

helixbass · 2023-07-20T20:35:25Z

+                .on_query_match(|node, query_match_context| {
+                    query_match_context.report(
+                        ViolationBuilder::default()
+                            .message(r#"Use '_d()' instead of 'Default::default()'"#)


These are actual rules I thought of for the Typescript compiler in Rust code-base

Specifically here I've been playing with using a "shorthand" _d() helper vs the more verbose Default::default()

Neither of them are "flexing" the tree-sitter side of it very hard but whatever

These read pretty nicely

helixbass · 2023-07-20T20:38:35Z

+}
+
+fn no_default_default_rule() -> Rule {
+    RuleBuilder::default()


The builder pattern seemed appropriate for these but I could also see adding an (optional) layer on top of that for people who wanted a slightly "slimmer" syntax eg:

rule! { name => no_default_default, create => |context| { vec![ rule_listener! { query => r#"..."#, capture_name => "c", on_query_match => |node, query_match_context| { ... } }, ... ] } }

My main issue with macros like that is that you have to (I think in this case) manage indentation/whitespace yourself.

This theme keeps coming up of tooling "pulling up short" of really trying to parse the contents of macro invocations and consumers suffering for it (the thing I had to do for tree-sitter-grep of forking the Rust tree-sitter grammar to try and parse "typical" macro-invocation arguments; a similar thing came up here with doing syn-based traversal of parsed AST's where it "wasn't working as expected" without some extra "convincing" to traverse into macro contents; and like you're saying rustfmt not formatting macro invocation contents that it kind of seems like it should be able to format)

Ya I'm sort of into the more "magical" macro syntax for this use case of trying to make rule authorship feel "smooth" (and ESLint-like) but I'm assuming it won't be everyone's style

In which case (in terms of how the rule-authoring API's are currently organized) (a) you can always write the more boilerplate-y but "just Rust" version and/or (b) there's really no reason there couldn't be alternate macro syntaxes (whether "official" or not) (since it's just a way of generating "publicly-writable Rust code") that are maybe a bit less "magical"?

helixbass · 2023-07-20T20:40:16Z

+            on_query_match,
+        } = self;
+        let query = Query::new(context.language, &query_text).unwrap();
+        let capture_index = match capture_name {


This sort of mimics the capture-index-resolution logic from tree-sitter-grep

helixbass · 2023-07-20T20:41:13Z

+#[builder(setter(into))]
+pub struct Violation<'a> {
+    pub message: String,
+    pub node: &'a Node<'a>,


Right per other comment if tree_sitter::Node is Copy then it seems un-idiomatic to store a reference to it

Yeah, and looks like it'd save you a little lifetime pain to not do it? (How is Node Copy, though? I'd have thought it held references to other things, which would require cloning.)

I believe it's either a type alias of or a newtype wrapper around a raw pointer to the underlying tree-sitter C API Node type (and pointers are Copy)

Ah right, and that works because the Node has an associated lifetime.

peterstuart · 2023-07-23T21:39:27Z

+
+use crate::{rule::ResolvedRule, violation::Violation};
+
+pub struct Context {


For linting more-so than tree-sitter-grep, it seems very reasonable for the rules to have to say which language(s?) they operate on, so the queries could be parsed up front once, maybe?

peterstuart · 2023-07-23T21:40:36Z

+        "{:?}:{}:{} {} {}",
+        query_match_context.path,
+        violation.node.range().start_point.row + 1,
+        violation.node.range().start_point.column + 1,


Maybe worth eventually making some PositionForPrinting struct that can take the 0-indexed row and column, and implements Display in some reasonable way? (Maybe not if it would need to support multiple output styles, like tree-sitter-grep does.)

Ah you're saying leverage Display for "how to formatted-output a single violation"?

Right that makes sense that "if there's one way to do it" that would be a nice thing to reach for

peterstuart · 2023-07-23T21:41:55Z

+        .into_iter()
+        .map(|rule| rule.resolve(&context))
+        .collect::<Vec<_>>();
+    let aggregated_queries = AggregatedQueries::new(&resolved_rules, &context);


Very cool that this is possible.

peterstuart · 2023-07-23T21:42:37Z

+        "-l",
+        "rust",
+        "--capture",
+        CAPTURE_NAME_FOR_TREE_SITTER_GREP,


Ha, fun that this silly command-line-style API is getting some use already.

peterstuart · 2023-07-23T21:44:22Z

+                        CAPTURE_NAME_FOR_TREE_SITTER_GREP_WITH_LEADING_AT,
+                    );
+                assert!(
+                    matches!(query_text_with_unified_capture_name, Cow::Owned(_),),


I suppose could also just string-compare that the replaced version is different than the original?

That would be much more clear to read, I think.

peterstuart · 2023-07-23T21:45:15Z

+}
+
+fn no_default_default_rule() -> Rule {
+    RuleBuilder::default()


My main issue with macros like that is that you have to (I think in this case) manage indentation/whitespace yourself.

peterstuart · 2023-07-23T23:22:01Z

+                .on_query_match(|node, query_match_context| {
+                    query_match_context.report(
+                        ViolationBuilder::default()
+                            .message(r#"Use '_d()' instead of 'Default::default()'"#)


These read pretty nicely

peterstuart · 2023-07-23T23:23:01Z

+pub struct Rule {
+    pub name: String,
+    #[builder(setter(custom))]
+    pub create: Arc<dyn Fn(&Context) -> Vec<RuleListener>>,


Ah, not sure I realized you could directly contain a dyn trait in an Arc, but that makes sense, since it's essentially a fancy Box?

Yup and Box<dyn Trait>/Rc<dyn Trait>/Arc<dyn Trait> are all "special" as far as being able to be the "self receiver type", I take advantage of that in a subsequent PR

peterstuart · 2023-07-23T23:24:09Z

+#[builder(setter(into))]
+pub struct Violation<'a> {
+    pub message: String,
+    pub node: &'a Node<'a>,


Yeah, and looks like it'd save you a little lifetime pain to not do it? (How is Node Copy, though? I'd have thought it held references to other things, which would require cloning.)

helixbass added 8 commits July 20, 2023 12:57

working on rules

cdbfef6

resolve queries

3cb0b0f

aggregate queries

e4fc8a3

print violations

c2dfc80

rule name

913e646

print to stdout

c36fcfc

exit code

5ac4546

second rule

1b3c8da

helixbass commented Jul 20, 2023

View reviewed changes

peterstuart approved these changes Jul 23, 2023

View reviewed changes

helixbass merged commit 0028ebd into master Jul 24, 2023

helixbass deleted the rule branch July 24, 2023 01:02


		use crate::{rule::ResolvedRule, violation::Violation};

		pub struct Context {

Conversation

helixbass commented Jul 20, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone