
[airflow] Add mixed-content support to AIR201 #24673

Draft
Dev-iL wants to merge 1 commit into astral-sh:main from Dev-iL:airflow/AIR201_mixed_context

Conversation

@Dev-iL
Contributor

@Dev-iL Dev-iL commented Apr 16, 2026

Summary

AIR201 previously flagged only template strings whose entire value is a single Jinja expression, {{ ti.xcom_pull(...) }}, and ignored strings such as "echo {{ ti.xcom_pull('task_1') }}". This PR:

  1. Extends the rule to detect xcom_pull calls embedded within larger strings ("mixed-content"), while keeping the pure-template path intact.
  2. Adds identifier validation so patterns whose task ID is not a valid Python identifier (e.g. kebab-task, group_1.task_1) are skipped.
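
The two points above can be sketched in Python. This is only an illustration of the idea, not the Rust implementation in this PR: the regex, the `find_xcom_pulls` helper, and the simplification of scanning the whole string rather than only Jinja blocks are all assumptions made for the example.

```python
import keyword
import re

# Hypothetical pattern: a ti/task_instance xcom_pull call with one quoted
# task ID, given positionally or as task_ids= / task_id=.
XCOM_PULL = re.compile(
    r"\b(?:ti|task_instance)\.xcom_pull\(\s*"
    r"(?:task_ids?\s*=\s*)?"
    r"(?P<quote>['\"])(?P<task_id>[^'\"]+)(?P=quote)\s*\)"
)


def find_xcom_pulls(template: str) -> list[tuple[str, int, int]]:
    """Return (task_id, start, end) for each call that could be rewritten."""
    hits = []
    for match in XCOM_PULL.finditer(template):
        task_id = match.group("task_id")
        # IDs such as "kebab-task" or "group_1.task_1" cannot be Python
        # variable names, so there is no `<name>.output` to suggest.
        if task_id.isidentifier() and not keyword.iskeyword(task_id):
            hits.append((task_id, match.start(), match.end()))
    return hits


if __name__ == "__main__":
    print(find_xcom_pulls("echo {{ ti.xcom_pull('task_1') }}"))
    print(find_xcom_pulls("echo {{ ti.xcom_pull('kebab-task') }}"))
```

Returning one span per occurrence mirrors the PR's description of reporting each embedded call separately.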

The need for these additions became apparent in the context of apache/airflow#65197.

Future work includes handling grouped task (group_id.task_id) replacement (after apache/airflow#64430 is merged).

Important

Disclosure: content below this point was AI-generated, with minor manual tweaks

Examples

  • Pure-template (unchanged behaviour)

    # Before — flagged and fixed as before
    bash_command="{{ ti.xcom_pull(task_ids='task_1') }}"
    # Fix replaces the whole argument:
    bash_command=task_1.output
  • Mixed-content (new)

    # Before — not flagged; now detected
    bash_command="echo {{ ti.xcom_pull(task_ids='task_1') }}"
    # Fix replaces only the xcom_pull call in-place, preserving surrounding content:
    bash_command="echo {{ task_1.output }}"
  • Multiple occurrences in a single string each get their own independent diagnostic and fix:

    bash_command="{{ ti.xcom_pull('task_1') }} and {{ ti.xcom_pull('task_2') }}"
    # Two AIR201 violations; applied independently:
    bash_command="{{ task_1.output }} and {{ task_2.output }}"
  • Trailing subscript access within the Jinja block

    bash_command="{{ ti.xcom_pull('task_1')['item'] }}"
    # Fix:
    bash_command="{{ task_1.output['item'] }}"
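
The in-place rewrite shown in the mixed-content examples can be approximated with a regex substitution. The sketch below is a hypothetical Python illustration of the textual transformation only (the pattern and the `rewrite_mixed_content` name are assumptions, and the pure-template path that drops the quotes is not covered). Whether the rewritten template resolves at render time is a separate question raised later in this thread.

```python
import re

# Hypothetical pattern: only task IDs that are plain identifiers match,
# so kebab-case or dotted IDs are left untouched.
XCOM_PULL = re.compile(
    r"\b(?:ti|task_instance)\.xcom_pull\(\s*"
    r"(?:task_ids?\s*=\s*)?"
    r"(?P<quote>['\"])(?P<task_id>[A-Za-z_][A-Za-z0-9_]*)(?P=quote)\s*\)"
)


def rewrite_mixed_content(template: str) -> str:
    """Replace each matched call with `<task_id>.output`, keeping the rest."""
    return XCOM_PULL.sub(lambda m: f"{m.group('task_id')}.output", template)


if __name__ == "__main__":
    print(rewrite_mixed_content("echo {{ ti.xcom_pull(task_ids='task_1') }}"))
    # echo {{ task_1.output }}
    print(rewrite_mixed_content("{{ ti.xcom_pull('task_1')['item'] }}"))
    # {{ task_1.output['item'] }}
```

Because only the call itself is substituted, the surrounding braces, shell text, and any trailing subscript are preserved without special handling.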

Test Plan

Added 19 new test cases:

| Case | Description |
| --- | --- |
| task_10, task_mc1 to task_mc6 | Mixed-content, one occurrence; covers positional arg, task_ids=, task_id= (singular), list-wrapped, tuple-wrapped, key='return_value', and task_instance receiver |
| task_mc7 | Two occurrences of the same task ID in one string: two independent fixes |
| task_mc11 | Two occurrences of different task IDs in one string: two independent fixes with distinct replacements |
| task_mc8 | Subscript access within the Jinja block: fix preserves ['item'] and {{ }} |
| task_mc9 | Triple-quoted string: scanner is quote-agnostic |
| task_mc10 | Mixed-content with unknown task ID: violation reported, no fix available |
| task_nt1 | Kebab-case task ID, pure-template form: no violation |
| task_nt2 | Kebab-case task ID, mixed-content form: no violation |
| task_nt3 | Dotted group task ID (group_1.task_1): no violation |
| task_nt4 | F-string argument: excluded by as_string_literal_expr() (AST type is ExprFString, not ExprStringLiteral); runtime value would be {{ ti.xcom_pull('task_1') }} but task ID is only known at runtime |
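
The task_nt4 exclusion relies on f-strings and plain string literals being different AST node types. Python's own `ast` module draws the same distinction (`ast.JoinedStr` versus `ast.Constant`), so it can serve as an analogy. This is not Ruff's API, and the `classify_string_argument` helper is made up for the illustration.

```python
import ast


def classify_string_argument(source: str) -> str:
    """Classify a single expression by its AST node type."""
    node = ast.parse(source, mode="eval").body
    if isinstance(node, ast.JoinedStr):
        # f-string: the final text is only known at runtime.
        return "f-string"
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return "string-literal"
    return "other"


if __name__ == "__main__":
    print(classify_string_argument('''"{{ ti.xcom_pull('task_1') }}"'''))
    # string-literal
    print(classify_string_argument('''f"{{{{ ti.xcom_pull('{task_id}') }}}}"'''))
    # f-string
```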

@astral-sh-bot astral-sh-bot Bot requested a review from ntBre April 16, 2026 11:10
@MichaReiser
Member

Thanks for working on this. How common are the echo cases? It adds a fair amount of complexity which I'm not sure is warranted if this is only used rarely

@astral-sh-bot

astral-sh-bot Bot commented Apr 16, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+82 -0 violations, +0 -0 fixes in 1 projects; 55 projects unchanged)

apache/airflow (+82 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --no-fix --output-format concise --preview --select ALL

+ providers/amazon/tests/system/amazon/aws/example_http_to_s3.py:113:22: AIR201 Use the `.output` attribute on the task object for "start_server" instead of `xcom_pull` in a template string
+ providers/amazon/tests/system/amazon/aws/example_mwaa.py:143:29: AIR201 Use the `.output` attribute on the task object for "trigger_dag_run" instead of `xcom_pull` in a template string
+ providers/amazon/tests/system/amazon/aws/example_mwaa_airflow2.py:138:29: AIR201 Use the `.output` attribute on the task object for "trigger_dag_run" instead of `xcom_pull` in a template string
+ providers/amazon/tests/system/amazon/aws/example_sagemaker_condition.py:115:20: AIR201 Use the `.output` attribute on the task object for "produce_bad_metrics" instead of `xcom_pull` in a template string
+ providers/apache/beam/tests/system/apache/beam/example_go_dataflow.py:69:16: AIR201 Use the `.output` attribute on the task object for "start_go_job_dataflow_runner_async" instead of `xcom_pull` in a template string
+ providers/apache/beam/tests/system/apache/beam/example_python_dataflow.py:72:16: AIR201 Use the `.output` attribute on the task object for "start_python_job_dataflow_runner_async" instead of `xcom_pull` in a template string
+ providers/apache/kylin/tests/system/apache/kylin/example_kylin_dag.py:103:20: AIR201 Use the `.output` attribute on the task object for "gen_build_time" instead of `xcom_pull` in a template string
+ providers/apache/kylin/tests/system/apache/kylin/example_kylin_dag.py:61:20: AIR201 Use the `.output` attribute on the task object for "gen_build_time" instead of `xcom_pull` in a template string
+ providers/apache/kylin/tests/system/apache/kylin/example_kylin_dag.py:62:18: AIR201 Use the `.output` attribute on the task object for "gen_build_time" instead of `xcom_pull` in a template string
+ providers/apache/kylin/tests/system/apache/kylin/example_kylin_dag.py:69:20: AIR201 Use the `.output` attribute on the task object for "gen_build_time" instead of `xcom_pull` in a template string
+ providers/apache/kylin/tests/system/apache/kylin/example_kylin_dag.py:77:20: AIR201 Use the `.output` attribute on the task object for "gen_build_time" instead of `xcom_pull` in a template string
+ providers/apache/kylin/tests/system/apache/kylin/example_kylin_dag.py:78:18: AIR201 Use the `.output` attribute on the task object for "gen_build_time" instead of `xcom_pull` in a template string
+ providers/apache/kylin/tests/system/apache/kylin/example_kylin_dag.py:85:20: AIR201 Use the `.output` attribute on the task object for "gen_build_time" instead of `xcom_pull` in a template string
+ providers/cncf/kubernetes/tests/system/cncf/kubernetes/example_spark_kubernetes.py:73:26: AIR201 Use the `.output` attribute on the task object for "spark_pi_submit" instead of `xcom_pull` in a template string
+ providers/common/ai/src/airflow/providers/common/ai/example_dags/example_llm_survey_agentic.py:248:16: AIR201 Use the `.output` attribute on the task object for "collect_results" instead of `xcom_pull` in a template string
+ providers/common/ai/src/airflow/providers/common/ai/example_dags/example_llm_survey_analysis.py:187:16: AIR201 Use the `.output` attribute on the task object for "prompt_confirmation" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/data_loss_prevention/example_dlp_deidentify_content.py:142:34: AIR201 Use the `.output` attribute on the task object for "create_template" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/data_loss_prevention/example_dlp_inspect_template.py:101:31: AIR201 Use the `.output` attribute on the task object for "create_template" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/data_loss_prevention/example_dlp_job.py:80:20: AIR201 Use the `.output` attribute on the task object for "create_job" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/data_loss_prevention/example_dlp_job.py:86:20: AIR201 Use the `.output` attribute on the task object for "create_job" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/data_loss_prevention/example_dlp_job.py:92:20: AIR201 Use the `.output` attribute on the task object for "create_job" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataflow/example_dataflow_go.py:102:16: AIR201 Use the `.output` attribute on the task object for "start_go_pipeline_dataflow_runner" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataflow/example_dataflow_go.py:116:16: AIR201 Use the `.output` attribute on the task object for "start_go_pipeline_dataflow_runner" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataflow/example_dataflow_go.py:131:16: AIR201 Use the `.output` attribute on the task object for "start_go_pipeline_dataflow_runner" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataflow/example_dataflow_java_streaming.py:142:16: AIR201 Use the `.output` attribute on the task object for "start_java_streaming_dataflow_job" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataflow/example_dataflow_native_python_async.py:103:16: AIR201 Use the `.output` attribute on the task object for "start_python_job_async" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataflow/example_dataflow_native_python_async.py:127:16: AIR201 Use the `.output` attribute on the task object for "start_python_job_async" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataflow/example_dataflow_native_python_async.py:144:16: AIR201 Use the `.output` attribute on the task object for "start_python_job_async" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataflow/example_dataflow_native_python_async.py:161:16: AIR201 Use the `.output` attribute on the task object for "start_python_job_async" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataflow/example_dataflow_sensors_deferrable.py:123:16: AIR201 Use the `.output` attribute on the task object for "start_beam_python_pipeline" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataflow/example_dataflow_sensors_deferrable.py:141:16: AIR201 Use the `.output` attribute on the task object for "start_beam_python_pipeline" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataflow/example_dataflow_sensors_deferrable.py:159:16: AIR201 Use the `.output` attribute on the task object for "start_beam_python_pipeline" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataflow/example_dataflow_sensors_deferrable.py:98:16: AIR201 Use the `.output` attribute on the task object for "start_beam_python_pipeline" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataflow/example_dataflow_streaming_python.py:128:16: AIR201 Use the `.output` attribute on the task object for "start_streaming_python_job" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataprep/example_dataprep.py:230:22: AIR201 Use the `.output` attribute on the task object for "run_flow" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataprep/example_dataprep.py:240:22: AIR201 Use the `.output` attribute on the task object for "run_flow" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataprep/example_dataprep.py:248:22: AIR201 Use the `.output` attribute on the task object for "run_flow" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataprep/example_dataprep.py:256:22: AIR201 Use the `.output` attribute on the task object for "run_job_group" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataprep/example_dataprep.py:264:17: AIR201 Use the `.output` attribute on the task object for "copy_flow" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataprep/example_dataprep.py:272:17: AIR201 Use the `.output` attribute on the task object for "create_flow" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/dataproc/example_dataproc_batch.py:143:24: AIR201 Use the `.output` attribute on the task object for "create_batch_4" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/datastore/example_datastore_commit.py:124:16: AIR201 Use the `.output` attribute on the task object for "export_task" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/datastore/example_datastore_commit.py:125:14: AIR201 Use the `.output` attribute on the task object for "export_task" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/datastore/example_datastore_commit.py:132:39: AIR201 Use the `.output` attribute on the task object for "export_task" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/datastore/example_datastore_commit.py:139:14: AIR201 Use the `.output` attribute on the task object for "export_task" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/datastore/example_datastore_commit.py:146:14: AIR201 Use the `.output` attribute on the task object for "import_task" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/kubernetes_engine/example_kubernetes_engine.py:105:22: AIR201 Use the `.output` attribute on the task object for "pod_task_xcom" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/kubernetes_engine/example_kubernetes_engine.py:105:22: AIR201 Use the `.output` attribute on the task object for "pod_task_xcom" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/kubernetes_engine/example_kubernetes_engine_async.py:108:22: AIR201 Use the `.output` attribute on the task object for "pod_task_xcom_async" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/kubernetes_engine/example_kubernetes_engine_async.py:108:22: AIR201 Use the `.output` attribute on the task object for "pod_task_xcom_async" instead of `xcom_pull` in a template string
... 32 additional changes omitted for project

Changes by rule (1 rules affected)

| code | total | + violation | - violation | + fix | - fix |
| --- | --- | --- | --- | --- | --- |
| AIR201 | 82 | 82 | 0 | 0 | 0 |

@Dev-iL
Contributor Author

Dev-iL commented Apr 16, 2026

Thanks for working on this. How common are the echo cases? It adds a fair amount of complexity which I'm not sure is warranted if this is only used rarely

echo is just an example for a string where the jinja pattern is a substring. If you compare the ecosystem scan results from this PR to the original AIR201 PR, you'll notice that the majority of uses in the airflow codebase follow the "echo" pattern (82 vs 3).

The complexity is warranted. However, even now it doesn't catch all cases found using the \{\{(\s?)(?:task_instance|ti)\.xcom_pull\('([^']+)'\) regex in apache/airflow#65197. Edit: disregard the previous sentence. @Lee-W and I have decided that the rule should match only strings passed to Airflow objects. The ~14 missing matches are string constants intended for lazy template evaluation.
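
For reference, the regex quoted in this comment can be tried directly in Python. The sample strings and the `task_ids_found` helper are made up for the illustration. It shows that the pattern is a rough search aid: it matches the positional form but not the keyword form, so its hit count is not expected to line up exactly with what the rule reports.

```python
import re

# The regex as quoted in the comment above, compiled unchanged.
PATTERN = re.compile(r"\{\{(\s?)(?:task_instance|ti)\.xcom_pull\('([^']+)'\)")


def task_ids_found(text: str) -> list[str]:
    """Return the task IDs captured by the second group of each match."""
    return [m.group(2) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    print(task_ids_found("echo {{ ti.xcom_pull('task_1') }}"))
    # ['task_1']
    print(task_ids_found("{{ ti.xcom_pull(task_ids='task_1') }}"))
    # []
```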

@ntBre added the rule (Implementing or modifying a lint rule) and preview (Related to preview mode features) labels on Apr 16, 2026
@Lee-W
Contributor

Lee-W commented Apr 17, 2026

A string like this is quite common in Airflow usage. It might appear elsewhere too, but we decided those cases are not worth catching: doing so would cause more trouble than benefit. By limiting the rule to strings passed to Airflow objects, I think we should be safe.

@MichaReiser MichaReiser assigned MichaReiser and unassigned ntBre Apr 17, 2026
Contributor

@Lee-W Lee-W left a comment


Mostly good. Thanks!

Comment thread crates/ruff_linter/resources/test/fixtures/airflow/AIR201.py
@Dev-iL force-pushed the airflow/AIR201_mixed_context branch from 944bfe8 to fd7e6fe on April 17, 2026, 16:01
@MichaReiser
Member

I'm not sure the extension is in the spirit of the rule. Specifically

The .output attribute provides better IDE support and makes task dependencies more explicit.

I understand that this is only achieved by removing the Jinja string in its entirety, which doesn't seem to be the case for most fixes. To me, this change extends the rule with examples on how to simplify the pull_task expression

@Dev-iL
Contributor Author

Dev-iL commented Apr 21, 2026

I'm not sure the extension is in the spirit of the rule. Specifically

The .output attribute provides better IDE support and makes task dependencies more explicit.

I understand that this is only achieved by removing the Jinja string in its entirety, which doesn't seem to be the case for most fixes. To me, this change extends the rule with examples on how to simplify the pull_task expression

Do you propose to move these to a separate rule?

@MichaReiser
Member

MichaReiser commented Apr 21, 2026

I think that's something we have to figure out. We either need to change the scope of the existing rule and update its motivation or have a separate rule. To help with this, it's important to know what you want to catch with this rule long-term. Are there cases where users might only want to lint for one and not the other? Are the motivations different? I think that's something that the airflow community must answer first.

@MichaReiser MichaReiser marked this pull request as draft April 21, 2026 13:30
@Dev-iL
Copy link
Copy Markdown
Contributor Author

Dev-iL commented Apr 23, 2026

Are there cases where users might only want to lint for one and not the other?

I can't think of a case like that.

Are the motivations different?

Motivations - no; practical benefits - yes: when replacing a string by a string, although we get code that's more concise, we get no benefits to refactoring and IDE support. I think the addition doesn't have a sufficient raison d'être as an individual rule.

I think that's something that the airflow community must answer first.

Agreed. Looking back, the proposal approved by the community explicitly excluded mixed-content strings.

@MichaReiser
Member

I think this will need input from @Lee-W

@Lee-W
Contributor

Lee-W commented Apr 24, 2026

IMO, the main benefit is changing ti.xcom_pull('task_1') to task_1.output, whether or not it's part of Jinja. This ensures new and modern syntax is used.

@MichaReiser
Member

That would mean we should rephrase the rule's motivation and remove the editor part entirely.

@Lee-W
Contributor

Lee-W commented Apr 25, 2026

That would mean we should rephrase the rule's motivation and remove the editor part entirely.

Agreed, we should rephrase the rule's motivation. The editor part can still be included, but perhaps with less emphasis?

@MichaReiser
Member

MichaReiser commented Apr 27, 2026

The editor's section can still be included but should be emphasized less?

Possibly. I'm still wondering if two separate rules would be preferred.

@Lee-W
Contributor

Lee-W commented May 3, 2026

The editor's section can still be included but should be emphasized less?

Possibly. I'm still wondering if two separate rules would be preferred.

I kinda feel AIR201 should work with both the Jinja thing and mixed-content. And the pure Jinja replacement can be another rule (since we're only handling taskflow now)

@Dev-iL WDYT?

@Dev-iL
Contributor Author

Dev-iL commented May 3, 2026

I kinda feel AIR201 should work with both the Jinja thing and mixed-content. And the pure Jinja replacement can be another rule (since we're only handling taskflow now)

  1. I don't understand what separation you are proposing. Could you please clarify?
  2. According to TP's comment, it seems that most replacements performed by this PR will result in broken templates, since task names are supposedly not in the Jinja context (unlike ti or task_instance) and will therefore fail to resolve. Assuming this is true, what will the Jinja-to-Jinja rule do?

(since we're only handling taskflow now)

What makes you say that? Traditional operators are affected by the existing AIR201 too. Do you mean in the sense that the replacement creates relations between tasks making (some) >> redundant?

@Lee-W
Contributor

Lee-W commented May 3, 2026

we're not handling all {{ object }} to object cases now. (e.g., {{ inlet_events }} might also work) We're only trying to do {{ ti.xcom_pull(task_ids='task_1') }} to task_1.output.

The first is the jinja removal thing I'm talking about.
The second is changing to .output logic.

(since we're only handling taskflow now)

Wrong wording 🫠 What I meant is the .output syntax. This "feels" like taskflow to me, but yep, it's not; traditional operators work fine.


What I have in mind for AIR201 is the "changing to .output logic" case. If we want to do something like {{ inlet_events }} to inlet_events, that's another matter.


According to TP's comment, it seems that most replacements performed by this PR will result in broken templates, since task names are supposedly not in the Jinja context (unlike ti or task_instance) and will therefore fail to resolve. Assuming this is true, what will the Jinja-to-Jinja rule do?

hmm... ok then i'm wrong about this one... then, i guess we'll need to just keep 201 as it was without this change?

@Dev-iL
Contributor Author

Dev-iL commented May 3, 2026

we're not handling all {{ object }} to object cases now. (e.g., {{ inlet_events }} might also work) We're only trying to do {{ ti.xcom_pull(task_ids='task_1') }} to task_1.output.

The first is the jinja removal thing I'm talking about. The second is changing to .output logic.

This suggestion made me think: the current AIR201 depends on AIR001, otherwise a replacement might break a Dag. Replacing other templated variables (i.e. non-output) seems even more brittle since those variables are less likely to be in scope, and we have no control over what the user decides to name them. So for a non-output replacement to work we'd have to analyze all variables in scope and suggest a suitable one. This seems impractical to me.
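
To make the AIR001 dependency concrete: the replacement `task_1.output` is only valid if a variable named `task_1` holds the task whose task_id is "task_1". The sketch below assumes that AIR001 is the check that a task variable's name matches its task_id, and uses Python's `ast` module with a hypothetical `mismatched_task_variables` helper. It is not how Ruff implements either rule.

```python
import ast


def mismatched_task_variables(source: str) -> list[tuple[str, str]]:
    """Return (variable_name, task_id) pairs where the two differ."""
    mismatches = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Assign) and isinstance(node.value, ast.Call)):
            continue
        if len(node.targets) != 1 or not isinstance(node.targets[0], ast.Name):
            continue
        name = node.targets[0].id
        for kw in node.value.keywords:
            if (
                kw.arg == "task_id"
                and isinstance(kw.value, ast.Constant)
                and isinstance(kw.value.value, str)
                and kw.value.value != name
            ):
                mismatches.append((name, kw.value.value))
    return mismatches


# Parsed only, never executed, so BashOperator does not need to be importable.
DAG_SOURCE = '''
task_1 = BashOperator(task_id="task_1", bash_command="echo 1")
t2 = BashOperator(task_id="task_2", bash_command="echo 2")
'''

if __name__ == "__main__":
    print(mismatched_task_variables(DAG_SOURCE))
    # [('t2', 'task_2')]
```

In this sample, rewriting `ti.xcom_pull('task_2')` to `task_2.output` would reference a name that does not exist, since the task is bound to `t2`.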

What I'm thinking for AIR201 is the "changing to .output logic" case. If we want to do something like {{ inlet_events }} to inlet_events, that's another matter.

Agreed.

then, i guess we'll need to just keep 201 as it was without this change?

Yeah, without this change. However, I think we should include further syntax modernization going live in Airflow 3.3:

@Lee-W
Contributor

Lee-W commented May 4, 2026

This seems impractical to me.

yep, that jinja thing was never something in my mind till this discussion happened

Yeah, without this change. However, I think we should include further syntax modernization going live in Airflow 3.3:

Sounds like a good idea to me. I was surprised it did not work that way.


Labels

preview (Related to preview mode features), rule (Implementing or modifying a lint rule)


4 participants