-
Notifications
You must be signed in to change notification settings - Fork 0
Rules proof-of-concept #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
cdbfef6
3cb0b0f
e4fc8a3
c2dfc80
913e646
c36fcfc
5ac4546
1b3c8da
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -6,3 +6,12 @@ edition = "2021" | |
| # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html | ||
|
|
||
| [dependencies] | ||
| clap = "4.3.17" | ||
| derive_builder = "0.12.0" | ||
| regex = "1.9.1" | ||
| tree-sitter = "0.20.10" | ||
| tree-sitter-grep = { git = "https://github.com/helixbass/tree-sitter-grep", rev = "3d4682c" } | ||
| tree-sitter-rust = "0.20.3" | ||
|
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems like it might be a little silly to have this crate depend on all of these also (along with Not sure what that would entail to eg "delegate" that dependency, maybe some exposed API from There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, or just a straight re-export of
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ya actually I ended up doing some crate re-exporting in a recent PR (for rule-author "consumers" I guess) Here what seems to have been the pattern is that I exposed |
||
|
|
||
| [[bin]] | ||
| name = "tree-sitter-lint" | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| use clap::Parser; | ||
|
|
||
| #[derive(Parser)] | ||
| pub struct Args {} |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| use clap::Parser; | ||
| use tree_sitter_lint::{run, Args}; | ||
|
|
||
| fn main() { | ||
| let args = Args::parse(); | ||
| run(args); | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,57 @@ | ||
| use std::{ | ||
| path::Path, | ||
| sync::atomic::{AtomicBool, Ordering}, | ||
| }; | ||
|
|
||
| use tree_sitter::Language; | ||
|
|
||
| use crate::{rule::ResolvedRule, violation::Violation}; | ||
|
|
||
| pub struct Context { | ||
|
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was somewhat mimicking ESLint's rule-authoring API But the concept of "context" being used here is a little fuzzy at the moment In ESLint's API, they are calling the rule's (which I guess would enable eg tailoring that rule/its internal state to the specific file that it's being run against?) The big things that seems questionable about mimicking that here is that we need to parse tree-sitter queries for the rule "listeners", and so doing that parsing n times (where n = the # of files we're checking) seems like not the way to do things It's possible that that query-parsing could be eg memoized (for the common case that for each file you're registering the same exact set of tree-sitter queries for the listeners) if we did in fact want to have something like the ESLint API's "scope" (where you can basically configure the rule once per file)? But no idea what actually makes sense So for the moment there's this "global to the linter run"-level There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For linting more-so than tree-sitter-grep, it seems very reasonable for the rules to have to say which language(s?) they operate on, so the queries could be parsed up front once, maybe?
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yup I haven't really "wired up" any "rule-driven language" stuff yet but that definitely seems right to me |
||
| pub language: Language, | ||
| } | ||
|
|
||
| impl Context { | ||
| pub fn new(language: Language) -> Self { | ||
| Self { language } | ||
| } | ||
| } | ||
|
|
||
| pub struct QueryMatchContext<'a> { | ||
| pub path: &'a Path, | ||
| pub file_contents: &'a [u8], | ||
| pub rule: &'a ResolvedRule<'a>, | ||
| reported_any_violations: &'a AtomicBool, | ||
| } | ||
|
|
||
| impl<'a> QueryMatchContext<'a> { | ||
| pub fn new( | ||
| path: &'a Path, | ||
| file_contents: &'a [u8], | ||
| rule: &'a ResolvedRule, | ||
| reported_any_violations: &'a AtomicBool, | ||
| ) -> Self { | ||
| Self { | ||
| path, | ||
| file_contents, | ||
| rule, | ||
| reported_any_violations, | ||
| } | ||
| } | ||
|
|
||
| pub fn report(&self, violation: Violation) { | ||
| self.reported_any_violations.store(true, Ordering::Relaxed); | ||
| print_violation(&violation, self); | ||
| } | ||
| } | ||
|
|
||
| fn print_violation(violation: &Violation, query_match_context: &QueryMatchContext) { | ||
| println!( | ||
|
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is presumably not as good as using something |
||
| "{:?}:{}:{} {} {}", | ||
| query_match_context.path, | ||
| violation.node.range().start_point.row + 1, | ||
| violation.node.range().start_point.column + 1, | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe worth eventually making some
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah you're saying leverage Right that makes sense that "if there's one way to do it" that would be a nice thing to reach for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍🏻 |
||
| violation.message, | ||
| query_match_context.rule.name, | ||
| ); | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,14 +1,180 @@ | ||
| pub fn add(left: usize, right: usize) -> usize { | ||
| left + right | ||
| mod args; | ||
| mod context; | ||
| mod rule; | ||
| mod violation; | ||
|
|
||
| use std::{ | ||
| borrow::Cow, | ||
| process, | ||
| sync::atomic::{AtomicBool, Ordering}, | ||
| }; | ||
|
|
||
| pub use args::Args; | ||
| use clap::Parser; | ||
| use context::QueryMatchContext; | ||
| use rule::{ResolvedRule, Rule, RuleBuilder, RuleListenerBuilder}; | ||
| use tree_sitter::Query; | ||
| use violation::ViolationBuilder; | ||
|
|
||
| use crate::context::Context; | ||
|
|
||
| #[macro_export] | ||
| macro_rules! regex { | ||
| ($re:expr $(,)?) => {{ | ||
| static RE: std::sync::OnceLock<regex::Regex> = std::sync::OnceLock::new(); | ||
| RE.get_or_init(|| regex::Regex::new($re).unwrap()) | ||
| }}; | ||
| } | ||
|
|
||
| const CAPTURE_NAME_FOR_TREE_SITTER_GREP: &str = "_tree_sitter_lint_capture"; | ||
| const CAPTURE_NAME_FOR_TREE_SITTER_GREP_WITH_LEADING_AT: &str = "@_tree_sitter_lint_capture"; | ||
|
|
||
| pub fn run(_args: Args) { | ||
| let language = tree_sitter_rust::language(); | ||
| let context = Context::new(language); | ||
| let resolved_rules = get_rules() | ||
|
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Rules/listeners get "resolved" by eg calling the |
||
| .into_iter() | ||
| .map(|rule| rule.resolve(&context)) | ||
| .collect::<Vec<_>>(); | ||
| let aggregated_queries = AggregatedQueries::new(&resolved_rules, &context); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Very cool that this is possible. |
||
| let tree_sitter_grep_args = tree_sitter_grep::Args::parse_from([ | ||
| "tree_sitter_grep", | ||
| "-q", | ||
| &aggregated_queries.query_text, | ||
| "-l", | ||
| "rust", | ||
| "--capture", | ||
| CAPTURE_NAME_FOR_TREE_SITTER_GREP, | ||
|
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd thought we would need to make So just had to make the one change to the crate-level API to additionally expose the "pattern index" of the matching capture (which is what lets us do the "de-multiplexing") There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ha, fun that this silly command-line-style API is getting some use already.
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ya this is definitely a good "driver" for kicking the tires on the |
||
| ]); | ||
| let reported_any_violations = AtomicBool::new(false); | ||
| tree_sitter_grep::run_with_callback( | ||
| tree_sitter_grep_args, | ||
| |capture_info, file_contents, path| { | ||
| let (rule_index, rule_listener_index) = | ||
| aggregated_queries.pattern_index_lookup[capture_info.pattern_index]; | ||
|
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here we de-multiplex to resolve the This general approach (of just running a single aggregated tree-sitter query (of all of the individual listener queries) once and letting And it does appear to behave that way (I did a little separate sanity-check of that) At that point this seems pretty efficient, we just crunch through the tree-sitter AST once per file |
||
| let rule = &resolved_rules[rule_index]; | ||
| let listener = &rule.listeners[rule_listener_index]; | ||
| (listener.on_query_match)( | ||
| &capture_info.node, | ||
|
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This looks silly now that I saw that |
||
| &QueryMatchContext::new(path, file_contents, rule, &reported_any_violations), | ||
| ); | ||
| }, | ||
| ) | ||
| .unwrap(); | ||
| if reported_any_violations.load(Ordering::Relaxed) { | ||
|
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In theory this should be reporting success or failure exit status code depending on whether it found any violations (which is what I assume you want for eg CI?) but I didn't actually check that it was working |
||
| process::exit(1); | ||
| } else { | ||
| process::exit(0); | ||
| } | ||
| } | ||
|
|
||
| #[cfg(test)] | ||
| mod tests { | ||
| use super::*; | ||
| type RuleIndex = usize; | ||
| type RuleListenerIndex = usize; | ||
|
|
||
| #[test] | ||
| fn it_works() { | ||
| let result = add(2, 2); | ||
| assert_eq!(result, 4); | ||
| struct AggregatedQueries { | ||
| pattern_index_lookup: Vec<(RuleIndex, RuleListenerIndex)>, | ||
| #[allow(dead_code)] | ||
| query: Query, | ||
|
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we at least want to parse this query "on our end" in order to eg be able to do the sanity-check assertion below But then it seems like |
||
| query_text: String, | ||
| } | ||
|
|
||
| impl AggregatedQueries { | ||
| pub fn new(resolved_rules: &[ResolvedRule], context: &Context) -> Self { | ||
| let mut pattern_index_lookup: Vec<(RuleIndex, RuleListenerIndex)> = Default::default(); | ||
| let mut aggregated_query_text = String::new(); | ||
| for (rule_index, resolved_rule) in resolved_rules.into_iter().enumerate() { | ||
| for (rule_listener_index, rule_listener) in resolved_rule.listeners.iter().enumerate() { | ||
| for _ in 0..rule_listener.query.pattern_count() { | ||
| pattern_index_lookup.push((rule_index, rule_listener_index)); | ||
| } | ||
| let use_capture_name = | ||
| &rule_listener.query.capture_names()[rule_listener.capture_index as usize]; | ||
| let query_text_with_unified_capture_name = | ||
| regex!(&format!(r#"@{use_capture_name}\b"#)).replace_all( | ||
| &rule_listener.query_text, | ||
| CAPTURE_NAME_FOR_TREE_SITTER_GREP_WITH_LEADING_AT, | ||
|
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Doing regex-based replacing of the "target capture name" so that the "one big query" has a single capture name that we're interested in |
||
| ); | ||
| assert!( | ||
| matches!(query_text_with_unified_capture_name, Cow::Owned(_),), | ||
|
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is relying on the presumed invariant that I suppose could also just string-compare that the replaced version is different than the original? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That would be much more clear to read, I think.
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But more expensive? How expensive is string-equality comparison? (As always might be being "naive about optimization guy" but) if this version is "reliable" and more performant then I'd be inclined to solve the readability issue by just eg hiding it behind a well-named helper (eg There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, yeah, that seems fine, and maybe worth a comment about why the
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok added |
||
| "Didn't find any instances of the capture name to replace" | ||
| ); | ||
| aggregated_query_text.push_str(&query_text_with_unified_capture_name); | ||
| aggregated_query_text.push_str("\n\n"); | ||
| } | ||
| } | ||
| let query = Query::new(context.language, &aggregated_query_text).unwrap(); | ||
| assert!(query.pattern_count() == pattern_index_lookup.len()); | ||
| Self { | ||
| pattern_index_lookup, | ||
| query, | ||
| query_text: aggregated_query_text, | ||
| } | ||
| } | ||
| } | ||
|
|
||
| fn get_rules() -> Vec<Rule> { | ||
| vec![no_default_default_rule(), no_lazy_static_rule()] | ||
| } | ||
|
|
||
| fn no_default_default_rule() -> Rule { | ||
| RuleBuilder::default() | ||
|
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The builder pattern seemed appropriate for these but I could also see adding an (optional) layer on top of that for people who wanted a slightly "slimmer" syntax eg: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My main issue with macros like that is that you have to (I think in this case) manage indentation/whitespace yourself.
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This theme keeps coming up of tooling "pulling up short" of really trying to parse the contents of macro invocations and consumers suffering for it (the thing I had to do for Ya I'm sort of into the more "magical" macro syntax for this use case of trying to make rule authorship feel "smooth" (and ESLint-like) but I'm assuming it won't be everyone's style In which case (in terms of how the rule-authoring API's are currently organized) (a) you can always write the more boilerplate-y but "just Rust" version and/or (b) there's really no reason there couldn't be alternate macro syntaxes (whether "official" or not) (since it's just a way of generating "publicly-writable Rust code") that are maybe a bit less "magical"? |
||
| .name("no_default_default") | ||
| .create(|_context| { | ||
| vec![RuleListenerBuilder::default() | ||
| .query( | ||
| r#"( | ||
| (call_expression | ||
| function: | ||
| (scoped_identifier | ||
| path: | ||
| (identifier) @first (#eq? @first "Default") | ||
| name: | ||
| (identifier) @second (#eq? @second "default") | ||
| ) | ||
| ) @c | ||
| )"#, | ||
| ) | ||
| .capture_name("c") | ||
| .on_query_match(|node, query_match_context| { | ||
| query_match_context.report( | ||
| ViolationBuilder::default() | ||
| .message(r#"Use '_d()' instead of 'Default::default()'"#) | ||
|
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These are actual rules I thought of for the Typescript compiler in Rust code-base Specifically here I've been playing with using a "shorthand" Neither of them are "flexing" the tree-sitter side of it very hard but whatever There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These read pretty nicely |
||
| .node(node) | ||
| .build() | ||
| .unwrap(), | ||
| ); | ||
| }) | ||
| .build() | ||
| .unwrap()] | ||
| }) | ||
| .build() | ||
| .unwrap() | ||
| } | ||
|
|
||
| fn no_lazy_static_rule() -> Rule { | ||
| RuleBuilder::default() | ||
| .name("no_lazy_static") | ||
| .create(|_context| { | ||
| vec![RuleListenerBuilder::default() | ||
| .query( | ||
| r#"( | ||
| (macro_invocation | ||
| macro: (identifier) @c (#eq? @c "lazy_static") | ||
| ) | ||
| )"#, | ||
| ) | ||
| .on_query_match(|node, query_match_context| { | ||
| query_match_context.report( | ||
| ViolationBuilder::default() | ||
| .message(r#"Prefer 'OnceCell::*::Lazy' to 'lazy_static!()'"#) | ||
| .node(node) | ||
| .build() | ||
| .unwrap(), | ||
| ); | ||
| }) | ||
| .build() | ||
| .unwrap()] | ||
| }) | ||
| .build() | ||
| .unwrap() | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,100 @@ | ||
| use std::sync::Arc; | ||
|
|
||
| use derive_builder::Builder; | ||
| use tree_sitter::{Node, Query}; | ||
|
|
||
| use crate::context::{Context, QueryMatchContext}; | ||
|
|
||
| #[derive(Builder)] | ||
| #[builder(setter(into))] | ||
| pub struct Rule { | ||
| pub name: String, | ||
| #[builder(setter(custom))] | ||
| pub create: Arc<dyn Fn(&Context) -> Vec<RuleListener>>, | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, not sure I realized you could directly contain a
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yup and |
||
| } | ||
|
|
||
| impl Rule { | ||
| pub fn resolve(self, context: &Context) -> ResolvedRule<'_> { | ||
| let Rule { name, create } = self; | ||
|
|
||
| ResolvedRule::new( | ||
| name, | ||
| create(context) | ||
| .into_iter() | ||
| .map(|rule_listener| rule_listener.resolve(context)) | ||
| .collect(), | ||
| ) | ||
| } | ||
| } | ||
|
|
||
| impl RuleBuilder { | ||
| pub fn create( | ||
| &mut self, | ||
| callback: impl Fn(&Context) -> Vec<RuleListener> + 'static, | ||
| ) -> &mut Self { | ||
| self.create = Some(Arc::new(callback)); | ||
| self | ||
| } | ||
| } | ||
|
|
||
| pub struct ResolvedRule<'context> { | ||
| pub name: String, | ||
| pub listeners: Vec<ResolvedRuleListener<'context>>, | ||
| } | ||
|
|
||
| impl<'context> ResolvedRule<'context> { | ||
| pub fn new(name: String, listeners: Vec<ResolvedRuleListener<'context>>) -> Self { | ||
| Self { name, listeners } | ||
| } | ||
| } | ||
|
|
||
| #[derive(Builder)] | ||
| #[builder(setter(into, strip_option))] | ||
| pub struct RuleListener<'on_query_match> { | ||
| pub query: String, | ||
| #[builder(default)] | ||
| pub capture_name: Option<String>, | ||
| #[builder(setter(custom))] | ||
| pub on_query_match: Arc<dyn Fn(&Node, &QueryMatchContext) + 'on_query_match + Send + Sync>, | ||
| } | ||
|
|
||
| impl<'on_query_match> RuleListener<'on_query_match> { | ||
| pub fn resolve(self, context: &Context) -> ResolvedRuleListener<'on_query_match> { | ||
| let RuleListener { | ||
| query: query_text, | ||
| capture_name, | ||
| on_query_match, | ||
| } = self; | ||
| let query = Query::new(context.language, &query_text).unwrap(); | ||
| let capture_index = match capture_name { | ||
|
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This sort of mimics the capture-index-resolution logic from |
||
| None => match query.capture_names().len() { | ||
| 0 => panic!("Expected capture"), | ||
| _ => 0, | ||
| }, | ||
| Some(capture_name) => query.capture_index_for_name(&capture_name).unwrap(), | ||
| }; | ||
| ResolvedRuleListener { | ||
| query, | ||
| query_text, | ||
| capture_index, | ||
| on_query_match, | ||
| } | ||
| } | ||
| } | ||
|
|
||
| impl<'on_query_match> RuleListenerBuilder<'on_query_match> { | ||
| pub fn on_query_match( | ||
| &mut self, | ||
| callback: impl Fn(&Node, &QueryMatchContext) + 'on_query_match + Send + Sync, | ||
| ) -> &mut Self { | ||
| self.on_query_match = Some(Arc::new(callback)); | ||
| self | ||
| } | ||
| } | ||
|
|
||
| pub struct ResolvedRuleListener<'on_query_match> { | ||
| pub query: Query, | ||
| pub query_text: String, | ||
| pub capture_index: u32, | ||
| pub on_query_match: Arc<dyn Fn(&Node, &QueryMatchContext) + 'on_query_match + Send + Sync>, | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| use derive_builder::Builder; | ||
| use tree_sitter::Node; | ||
|
|
||
| #[derive(Builder)] | ||
| #[builder(setter(into))] | ||
| pub struct Violation<'a> { | ||
| pub message: String, | ||
| pub node: &'a Node<'a>, | ||
|
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right per other comment if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, and looks like it'd save you a little lifetime pain to not do it? (How is
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe it's either a type alias of or a newtype wrapper around a raw pointer to the underlying There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah right, and that works because the |
||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dependency here is on the proof-of-concept
run-contextbranch plus a couple of tweaks