[airflow] Add mixed-content support to AIR201#24673
[airflow] Add mixed-content support to AIR201#24673Dev-iL wants to merge 1 commit intoastral-sh:mainfrom
airflow] Add mixed-content support to AIR201#24673Conversation
|
Thanks for working on this. How common are the |
|
| code | total | + violation | - violation | + fix | - fix |
|---|---|---|---|---|---|
| AIR201 | 82 | 82 | 0 | 0 | 0 |
|
|
A string like this is quite common in an airflow use case. It might appear elsewhere, but we decide it's not worth catching them. That would cause more trouble than the benefit we can get. By limiting it only to the string passed to Airflow, I think we should be safe |
944bfe8 to
fd7e6fe
Compare
|
I'm not sure the extension is in the spirit of the rule. Specifically
I understand that this is only achieved by removing the Jinja string in its entirety, which doesn't seem to be the case for most fixes. To me, this change extends the rule with examples on how to simplify the |
Do you propose to move these to a separate rule? |
|
I think that's something we have to figure out. We either need to change the scope of the existing rule and update its motivation or have a separate rule. To help with this, it's important to know what you want to catch with this rule long-term. Are there cases where users might only want to lint for one and not the other? Are the motivations different? I think that's something that the airflow community must answer first. |
I can't think of a case like that.
Motivations - no; practical benefits - yes: when replacing a string by a string, although we get code that's more concise, we get no benefits to refactoring and IDE support. I think the addition doesn't have a sufficient raison d'être as an individual rule.
Agreed. Looking back, the proposal approved by the community explicitly excluded mixed-content strings. |
|
I think this will need input from @Lee-W |
|
IMO, the main benefit is changing |
|
That would mean we should rephrase the rules motivation and remove the editor part entirely. |
Agree we should rephrase the motivational rules. The editor's section can still be included but should be emphasized less? |
Possibly. I'm still wondering if two separate rules would be preferred. |
I kinda feel AIR201 should work with both the Jinja thing and mixed-content. And the pure Jinja replacement can be another rule (since we're only handling taskflow now) @Dev-iL WDYT? |
What makes you say that? Traditional operators are affected by the existing AIR201 too. Do you mean in the sense that the replacement creates relations between tasks making (some) |
|
we're not handling all The first is the jinja removal thing I'm talking about.
Wrong wording 🫠 What I meant is the What I'm thinking for AIR201 is the
hmm... ok then i'm wrong about this one... then, i guess we'll need to just keep 201 as it was without this change? |
This suggestion made me think: the current AIR201 depends on AIR001, otherwise a replacement might break a Dag. Replacing other templated variables (i.e. non-output) seems even more brittle since those variables are less likely to be in scope, and we have no control over what the user decides to name them. So for a non-
Agreed.
Yeah, without this change. However, I think we should include further syntax modernization going live in Airflow 3.3: |
yep, that jinja thing was never something in my mind till this discussion happened
Sounds like a good idea to me. I was suprised it did not work that way |
Summary
AIR201 previously only flagged template strings whose entire value is a jinja template
{{ ti.xcom_pull(...) }}, leaving strings like"echo {{ ti.xcom_pull('task_1') }}"ignored. This PR:xcom_pullcalls embedded within larger strings ("mixed-content"), while keeping the pure-template path intact.kebab-task,group_1.task_1) are skipped.The need for these additions became apparent in the context of apache/airflow#65197.
Future work includes handling grouped task (
group_id.task_id) replacement (after apache/airflow#64430 is merged).Important
Disclosure: content below this point was AI-generated, with minor manual tweaks
Examples
Pure-template (unchanged behaviour)
Mixed-content (new)
Multiple occurrences in a single string each get their own independent diagnostic and fix:
Trailing subscript access within the Jinja block
Test Plan
Added 19 new test cases:
task_10,task_mc1–task_mc6task_ids=,task_id=(singular), list-wrapped, tuple-wrapped,key='return_value', andtask_instancereceivertask_mc7task_mc11task_mc8['item']and{{ }}task_mc9task_mc10task_nt1task_nt2task_nt3group_1.task_1) — no violationtask_nt4as_string_literal_expr()(AST type isExprFString, notExprStringLiteral); runtime value would be{{ ti.xcom_pull('task_1') }}but task ID is only known at runtime