Skip to content

Rules proof-of-concept#1

Merged
helixbass merged 8 commits into
masterfrom
rule
Jul 24, 2023
Merged

Rules proof-of-concept#1
helixbass merged 8 commits into
masterfrom
rule

Conversation

@helixbass
Copy link
Copy Markdown
Owner

In this PR:

  • set up basic types for rules/individual "listeners" within a rule etc
  • hard-code everything including the language and a couple rules

To test:
If you run this against a Rust codebase that includes Default::default() and/or lazy_static!() usages it should yell at you for them

Comment thread Cargo.toml
regex = "1.9.1"
tree-sitter = "0.20.10"
tree-sitter-grep = { git = "https://github.com/helixbass/tree-sitter-grep", rev = "3d4682c" }
tree-sitter-rust = "0.20.3"
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it might be a little silly to have this crate depend on all of these also (along with tree-sitter-grep depending on them)?

Not sure what that would entail to eg "delegate" that dependency, maybe some exposed API from tree-sitter-grep to get the tree_sitter::Language for a given SupportedLanguage or something like that?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, or just a straight re-export of tree-sitter-rust from somewhere in tree-sitter-grep. (Maybe feature flagged.)

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya actually I ended up doing some crate re-exporting in a recent PR (for rule-author "consumers" I guess)

Here what seems to have been the pattern is that I exposed SupportedLanguage on tree_sitter_grep and tree-sitter-lint is just re-using that as its own "language" type, and a SupportedLanguage can give you its corresponding tree_sitter::Language

Comment thread src/context.rs

use crate::{rule::ResolvedRule, violation::Violation};

pub struct Context {
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was somewhat mimicking ESLint's rule-authoring API

But the concept of "context" being used here is a little fuzzy at the moment

In ESLint's API, they are calling the rule's create() callback once for each file that the rule is being run against, so it supplies a context that is specific to that file

(which I guess would enable eg tailoring that rule/its internal state to the specific file that it's being run against?)

The big things that seems questionable about mimicking that here is that we need to parse tree-sitter queries for the rule "listeners", and so doing that parsing n times (where n = the # of files we're checking) seems like not the way to do things

It's possible that that query-parsing could be eg memoized (for the common case that for each file you're registering the same exact set of tree-sitter queries for the listeners) if we did in fact want to have something like the ESLint API's "scope" (where you can basically configure the rule once per file)?

But no idea what actually makes sense

So for the moment there's this "global to the linter run"-level Context, and then a QueryMatchContext that gets supplied to your listener callback

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For linting more-so than tree-sitter-grep, it seems very reasonable for the rules to have to say which language(s?) they operate on, so the queries could be parsed up front once, maybe?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup I haven't really "wired up" any "rule-driven language" stuff yet but that definitely seems right to me

Comment thread src/context.rs
}

fn print_violation(violation: &Violation, query_match_context: &QueryMatchContext) {
println!(
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is presumably not as good as using something ripgrep-Printer-ish (eg might suffer from interleaving)?

Comment thread src/lib.rs
pub fn run(_args: Args) {
let language = tree_sitter_rust::language();
let context = Context::new(language);
let resolved_rules = get_rules()
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rules/listeners get "resolved" by eg calling the create() callback and then actually parsing the tree-sitter queries etc

Comment thread Cargo.toml
derive_builder = "0.12.0"
regex = "1.9.1"
tree-sitter = "0.20.10"
tree-sitter-grep = { git = "https://github.com/helixbass/tree-sitter-grep", rev = "3d4682c" }
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dependency here is on the proof-of-concept run-context branch plus a couple of tweaks

Comment thread src/lib.rs
CAPTURE_NAME_FOR_TREE_SITTER_GREP_WITH_LEADING_AT,
);
assert!(
matches!(query_text_with_unified_capture_name, Cow::Owned(_),),
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is relying on the presumed invariant that regex.replace_all() will return an "owned" (ie String) Cow if and only if it actually did some replacing

I suppose could also just string-compare that the replaced version is different than the original?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose could also just string-compare that the replaced version is different than the original?

That would be much more clear to read, I think.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But more expensive? How expensive is string-equality comparison?

(As always might be being "naive about optimization guy" but) if this version is "reliable" and more performant then I'd be inclined to solve the readability issue by just eg hiding it behind a well-named helper (eg were_any_captures_replaced())

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, yeah, that seems fine, and maybe worth a comment about why the Cow match works.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok added were_any_captures_replaced() + comment on the current branch

Comment thread src/lib.rs
.on_query_match(|node, query_match_context| {
query_match_context.report(
ViolationBuilder::default()
.message(r#"Use '_d()' instead of 'Default::default()'"#)
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are actual rules I thought of for the Typescript compiler in Rust code-base

Specifically here I've been playing with using a "shorthand" _d() helper vs the more verbose Default::default()

Neither of them are "flexing" the tree-sitter side of it very hard but whatever

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These read pretty nicely

Comment thread src/lib.rs
}

fn no_default_default_rule() -> Rule {
RuleBuilder::default()
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The builder pattern seemed appropriate for these but I could also see adding an (optional) layer on top of that for people who wanted a slightly "slimmer" syntax eg:

rule! {
    name => no_default_default,
    create => |context| {
        vec![
            rule_listener! {
                query => r#"..."#,
                capture_name => "c",
                on_query_match => |node, query_match_context| {
                    ...
                }
            },
            ...
        ]
    }
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main issue with macros like that is that you have to (I think in this case) manage indentation/whitespace yourself.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This theme keeps coming up of tooling "pulling up short" of really trying to parse the contents of macro invocations and consumers suffering for it (the thing I had to do for tree-sitter-grep of forking the Rust tree-sitter grammar to try and parse "typical" macro-invocation arguments; a similar thing came up here with doing syn-based traversal of parsed AST's where it "wasn't working as expected" without some extra "convincing" to traverse into macro contents; and like you're saying rustfmt not formatting macro invocation contents that it kind of seems like it should be able to format)

Ya I'm sort of into the more "magical" macro syntax for this use case of trying to make rule authorship feel "smooth" (and ESLint-like) but I'm assuming it won't be everyone's style

In which case (in terms of how the rule-authoring API's are currently organized) (a) you can always write the more boilerplate-y but "just Rust" version and/or (b) there's really no reason there couldn't be alternate macro syntaxes (whether "official" or not) (since it's just a way of generating "publicly-writable Rust code") that are maybe a bit less "magical"?

Comment thread src/rule.rs
on_query_match,
} = self;
let query = Query::new(context.language, &query_text).unwrap();
let capture_index = match capture_name {
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sort of mimics the capture-index-resolution logic from tree-sitter-grep

Comment thread src/violation.rs
#[builder(setter(into))]
pub struct Violation<'a> {
pub message: String,
pub node: &'a Node<'a>,
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right per other comment if tree_sitter::Node is Copy then it seems un-idiomatic to store a reference to it

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, and looks like it'd save you a little lifetime pain to not do it? (How is Node Copy, though? I'd have thought it held references to other things, which would require cloning.)

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's either a type alias of or a newtype wrapper around a raw pointer to the underlying tree-sitter C API Node type (and pointers are Copy)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, and that works because the Node has an associated lifetime.

Comment thread src/context.rs

use crate::{rule::ResolvedRule, violation::Violation};

pub struct Context {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For linting more-so than tree-sitter-grep, it seems very reasonable for the rules to have to say which language(s?) they operate on, so the queries could be parsed up front once, maybe?

Comment thread src/context.rs
"{:?}:{}:{} {} {}",
query_match_context.path,
violation.node.range().start_point.row + 1,
violation.node.range().start_point.column + 1,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth eventually making some PositionForPrinting struct that can take the 0-indexed row and column, and implements Display in some reasonable way? (Maybe not if it would need to support multiple output styles, like tree-sitter-grep does.)

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah you're saying leverage Display for "how to formatted-output a single violation"?

Right that makes sense that "if there's one way to do it" that would be a nice thing to reach for

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

Comment thread src/lib.rs
.into_iter()
.map(|rule| rule.resolve(&context))
.collect::<Vec<_>>();
let aggregated_queries = AggregatedQueries::new(&resolved_rules, &context);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool that this is possible.

Comment thread src/lib.rs
"-l",
"rust",
"--capture",
CAPTURE_NAME_FOR_TREE_SITTER_GREP,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, fun that this silly command-line-style API is getting some use already.

Comment thread src/lib.rs
CAPTURE_NAME_FOR_TREE_SITTER_GREP_WITH_LEADING_AT,
);
assert!(
matches!(query_text_with_unified_capture_name, Cow::Owned(_),),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose could also just string-compare that the replaced version is different than the original?

That would be much more clear to read, I think.

Comment thread src/lib.rs
}

fn no_default_default_rule() -> Rule {
RuleBuilder::default()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main issue with macros like that is that you have to (I think in this case) manage indentation/whitespace yourself.

Comment thread src/lib.rs
.on_query_match(|node, query_match_context| {
query_match_context.report(
ViolationBuilder::default()
.message(r#"Use '_d()' instead of 'Default::default()'"#)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These read pretty nicely

Comment thread src/rule.rs
pub struct Rule {
pub name: String,
#[builder(setter(custom))]
pub create: Arc<dyn Fn(&Context) -> Vec<RuleListener>>,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, not sure I realized you could directly contain a dyn trait in an Arc, but that makes sense, since it's essentially a fancy Box?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup and Box<dyn Trait>/Rc<dyn Trait>/Arc<dyn Trait> are all "special" as far as being able to be the "self receiver type", I take advantage of that in a subsequent PR

Comment thread src/violation.rs
#[builder(setter(into))]
pub struct Violation<'a> {
pub message: String,
pub node: &'a Node<'a>,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, and looks like it'd save you a little lifetime pain to not do it? (How is Node Copy, though? I'd have thought it held references to other things, which would require cloning.)

@helixbass helixbass merged commit 0028ebd into master Jul 24, 2023
@helixbass helixbass deleted the rule branch July 24, 2023 01:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants