AIP-76: Hold Dag run until all upstream partitions arrive #64571

Open
Lee-W wants to merge 35 commits into apache:main from astronomer:asset-partition-window

@Lee-W Lee-W commented Apr 1, 2026

Closes: #59294

Why

Asset-partitioned Dags that aggregate many upstream slices into one downstream period (e.g., 60 minute-level events rolling up into one hourly Dag run) had no way to express that requirement: the scheduler would fire the downstream run as soon as any single upstream partition arrived.

This PR implements the rollup building block from AIP-76: a Window type that enumerates the full set of upstream partitions required for a downstream period, a RollupMapper that wires a source mapper to a window, and the scheduler logic to gate Dag runs until every required upstream key is present.

What

Partition mappers/windows

  • Add Window ABC and six concrete implementations (HourWindow, DayWindow, WeekWindow, MonthWindow, QuarterWindow, YearWindow) to both airflow-core and the Task SDK
  • Add RollupMapper that composes a source_mapper with a Window and exposes to_upstream(downstream_key) → frozenset[str]
  • Add decode_downstream / encode_upstream hooks to PartitionMapper and implement them in _BaseTemporalMapper; StartOfWeekMapper gets a regex-based override because %V is ambiguous with strptime.
  • Add week_start parameter to StartOfWeekMapper for non-Monday week starts
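
To make the window idea above concrete, here is a minimal standalone sketch, not the PR's implementation. The helper name `day_window_upstream_keys` is hypothetical; the key formats are the ones quoted in the PR's example Dag docstring (`%Y-%m-%dT%H:%M:%S` upstream, `%Y-%m-%d` downstream). It shows roughly what a `to_upstream(downstream_key)` call could return for a daily rollup over hourly partitions.

```python
from datetime import datetime, timedelta

UPSTREAM_FORMAT = "%Y-%m-%dT%H:%M:%S"  # hourly key format quoted in the example Dag
DOWNSTREAM_FORMAT = "%Y-%m-%d"  # daily key format quoted in the example Dag


def day_window_upstream_keys(downstream_key: str) -> frozenset[str]:
    """Hypothetical stand-in for RollupMapper.to_upstream paired with a DayWindow."""
    day_start = datetime.strptime(downstream_key, DOWNSTREAM_FORMAT)
    # Enumerate the 24 hourly steps that make up the downstream day.
    return frozenset(
        (day_start + timedelta(hours=hour)).strftime(UPSTREAM_FORMAT)
        for hour in range(24)
    )


keys = day_window_upstream_keys("2024-01-15")
print(len(keys), min(keys), max(keys))
```

Returning a `frozenset` keeps the result hashable and makes the later "is everything here yet" check a plain subset comparison.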

Scheduler

  • Rewrite _create_dagruns_for_partitioned_asset_dags to bulk-fetch serialized Dags and partition-key logs, removing N+1 queries, and cap per-tick work at MAX_PARTITION_DAG_RUNS_PER_TICK
  • Add _resolve_asset_partition_status / _check_rollup_asset_status to evaluate rollup satisfaction; non-rollup assets continue to satisfy immediately
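
The gating rule in the scheduler bullets can be sketched as a set comparison. This is an illustration under assumptions, not the body of `_check_rollup_asset_status`: the function name `rollup_is_satisfied` and the sample keys are made up for the example.

```python
def rollup_is_satisfied(required: frozenset[str], received: list[str]) -> bool:
    """Return True only when every required upstream key has been received."""
    # Converting to a set means duplicate arrivals never count twice.
    return required <= set(received)


required = frozenset({"2024-01-15T00:00:00", "2024-01-15T01:00:00"})
partial = ["2024-01-15T00:00:00", "2024-01-15T00:00:00"]  # same key logged twice
complete = partial + ["2024-01-15T01:00:00"]

print(rollup_is_satisfied(required, partial))
print(rollup_is_satisfied(required, complete))
```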

Serialization

  • Add encode_window / decode_window and extend mapper encoder/decoder to round-trip RollupMapper and all Window subclasses

UI / API

  • Enrich next_run_assets endpoint with per-asset received_count, required_count, received_keys, required_keys, and is_rollup for partitioned Dags
  • Update AssetNode and AssetSchedule components to surface rollup progress (e.g. "12 / 24 received")
  • Add AssetProgressCell for inline progress in the Dags list

Was generative AI tooling used to co-author this PR?
  • Yes — Claude Sonnet 4.6

Generated-by: Claude Sonnet 4.6 following the guidelines

with DAG(
    dag_id="daily_team_a_rollup",
    schedule=PartitionedAssetTimetable(
        assets=team_a_player_stats,
        default_partition_mapper=RollupMapper(
            source_mapper=StartOfDayMapper(),
            window=DayWindow(),
        ),
    ),
    catchup=False,
    tags=["player-stats", "rollup"],
):
    """
    First rollup level: 24 hourly partitions of ``team_a_player_stats`` → one daily summary.

    ``StartOfDayMapper`` normalizes each upstream hourly timestamp (``%Y-%m-%dT%H:%M:%S``)
    to its day-start (``%Y-%m-%d``); ``DayWindow`` declares the downstream run needs
    all 24 hourly partitions before firing. Publishes ``daily_team_a`` so the
    monthly rollup below can consume it.
    """

    @task(outlets=[daily_team_a])
    def summarise_team_a_day(dag_run=None):
        """Produce the full-day rollup once every hour has arrived."""
        if TYPE_CHECKING:
            assert dag_run
        print(f"All 24 hourly partitions received. Day: {dag_run.partition_key}")

    summarise_team_a_day()
  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@boring-cyborg boring-cyborg Bot added area:Scheduler including HA (high availability) scheduler area:task-sdk labels Apr 1, 2026
@Lee-W Lee-W changed the title from "feat(AIP-76): window" to "feat(AIP-76): implement to_upstream" Apr 1, 2026
@kaxil kaxil requested a review from Copilot April 2, 2026 00:41

Copilot AI left a comment

Pull request overview

Implements “rollup” support for partition mappers (AIP-76) by introducing a RollupMapper interface with to_upstream() and using it in the scheduler to wait for a complete set of upstream partition keys before creating partitioned asset-triggered DAG runs.

Changes:

  • Add RollupMapper base class (core + task SDK) with an abstract to_upstream() contract.
  • Implement to_upstream() for weekly and monthly temporal mappers (core + task SDK).
  • Update the scheduler’s partitioned-asset DAG-run creation logic to enforce rollup completeness when a mapper supports it.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| task-sdk/src/airflow/sdk/definitions/partition_mappers/base.py | Introduces SDK-side RollupMapper abstraction. |
| task-sdk/src/airflow/sdk/definitions/partition_mappers/temporal.py | Adds SDK to_upstream() for week/month temporal rollups. |
| airflow-core/src/airflow/timetables/base.py | Adds get_partition_mapper() hook to the Timetable protocol. |
| airflow-core/src/airflow/partition_mappers/base.py | Introduces core-side RollupMapper abstraction. |
| airflow-core/src/airflow/partition_mappers/temporal.py | Adds core to_upstream() for week/month temporal rollups. |
| airflow-core/src/airflow/jobs/scheduler_job_runner.py | Uses rollup mapper behavior to decide when partitioned asset-triggered DAG runs are ready. |

@Lee-W Lee-W force-pushed the asset-partition-window branch 2 times, most recently from e72dfa6 to e6d53f2 Compare April 7, 2026 09:57

Lee-W commented Apr 7, 2026

from __future__ import annotations

from airflow.sdk import (
    DAG,
    Asset,
    CronPartitionTimetable,
    PartitionedAssetTimetable,
    WeeklyRollupMapper,
    task,
)

daily_sales = Asset(uri="file://incoming/sales/daily.csv", name="daily_sales")

# Upstream Dag: produces one partition per day (key format: "2024-01-15T00:00:00")
with DAG(
    dag_id="ingest_daily_sales",
    schedule=CronPartitionTimetable("0 0 * * *", timezone="UTC"),
):

    @task(outlets=[daily_sales])
    def ingest():
        pass

    ingest()


# Downstream Dag: runs once all 7 daily partitions for a week have arrived
with DAG(
    dag_id="weekly_sales_report",
    schedule=PartitionedAssetTimetable(
        assets=daily_sales,
        default_partition_mapper=WeeklyRollupMapper(),
    ),
    catchup=False,
):

    @task
    def generate_report(dag_run=None):
        # dag_run.partition_key will be the week key, e.g. "2024-01-15 (W03)"
        print(dag_run.partition_key)

    generate_report()

@Lee-W Lee-W force-pushed the asset-partition-window branch from 91c4ac3 to 93e82cb Compare April 7, 2026 11:48

Lee-W commented Apr 7, 2026

The backend part is basically wrapped up, but the frontend and API side still need some work. The UI is quite weird for these cases right now.

@Lee-W Lee-W force-pushed the asset-partition-window branch from 9934a73 to f52823c Compare April 10, 2026 09:09
@kaxil kaxil requested a review from Copilot April 10, 2026 19:55

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 13 comments.

Comments suppressed due to low confidence (1)

airflow-core/src/airflow/api_fastapi/core_api/routes/ui/partitioned_dag_runs.py:1

  • Counting PartitionedAssetKeyLog.id can over-count when duplicate log rows exist for the same upstream partition key (e.g. retries/dup inserts), inflating total_received and potentially showing the run as satisfiable earlier than it should be. Consider counting distinct PartitionedAssetKeyLog.source_partition_key (and/or a distinct composite of (asset_id, source_partition_key)) to match the scheduler’s set-based satisfaction semantics.
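
The over-counting concern can be reproduced with an in-memory SQLite table. This is a sketch under assumptions: the table name `key_log` and its three columns are simplified stand-ins, not the real `PartitionedAssetKeyLog` schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the partition key log table.
conn.execute(
    "CREATE TABLE key_log ("
    "id INTEGER PRIMARY KEY, asset_id INTEGER, source_partition_key TEXT)"
)
conn.executemany(
    "INSERT INTO key_log (asset_id, source_partition_key) VALUES (?, ?)",
    [(1, "2024-01-15"), (1, "2024-01-15"), (1, "2024-01-16")],  # one duplicate row
)

# Counting rows treats the duplicate as a second arrival.
row_count = conn.execute("SELECT COUNT(id) FROM key_log").fetchone()[0]
# Counting distinct (asset_id, key) pairs matches set-based semantics.
distinct_count = conn.execute(
    "SELECT COUNT(*) FROM "
    "(SELECT DISTINCT asset_id, source_partition_key FROM key_log)"
).fetchone()[0]

print(row_count, distinct_count)
```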

@Lee-W Lee-W force-pushed the asset-partition-window branch 9 times, most recently from a64d06a to 9064515 Compare April 17, 2026 12:12
@Lee-W Lee-W marked this pull request as ready for review April 20, 2026 06:32
Lee-W added 30 commits May 23, 2026 13:08
…t ordering

StartOfWeekMapper and StartOfQuarterMapper now derive their decode_downstream
regex from output_format itself, so users can re-order strftime directives
and {name} placeholders (e.g. "Q{quarter}/%Y") without having to override
decode_downstream. Malformed output_format — empty {}, non-identifier
placeholder names, duplicate %X directives, duplicate {name} placeholders —
raises ValueError at mapper construction instead of an opaque re.error from
deep inside a scheduler tick or UI route.
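
One way to derive a decoding regex from `output_format` and fail fast on malformed input is sketched below. It is an assumption-laden illustration, not the PR's code: `compile_output_format`, the three supported directives, and the choice to match every `{name}` placeholder with `\d+` are all hypothetical.

```python
from __future__ import annotations

import re

# Assumed subset of strftime directives and the digits each one matches.
DIRECTIVE_PATTERNS = {"%Y": r"\d{4}", "%m": r"\d{2}", "%d": r"\d{2}"}
TOKEN = re.compile(r"%[A-Za-z]|\{[^{}]*\}")


def compile_output_format(output_format: str) -> re.Pattern[str]:
    """Build a decoding regex from output_format, raising ValueError if malformed."""
    parts: list[str] = []
    seen: set[str] = set()
    pos = 0
    for match in TOKEN.finditer(output_format):
        # Literal text between tokens must match exactly.
        parts.append(re.escape(output_format[pos:match.start()]))
        token = match.group()
        if token.startswith("%"):
            if token not in DIRECTIVE_PATTERNS:
                raise ValueError(f"unsupported directive {token!r}")
            name, fragment = token[1], DIRECTIVE_PATTERNS[token]
        else:
            name, fragment = token[1:-1], r"\d+"
            if not name.isidentifier():  # also rejects the empty placeholder {}
                raise ValueError(f"invalid placeholder {token!r}")
        if name in seen:
            raise ValueError(f"duplicate field {name!r} in {output_format!r}")
        seen.add(name)
        parts.append(f"(?P<{name}>{fragment})")
        pos = match.end()
    parts.append(re.escape(output_format[pos:]))
    return re.compile("".join(parts))


pattern = compile_output_format("Q{quarter}/%Y")
print(pattern.fullmatch("Q3/2024").groupdict())
```

Compiling at construction time means a bad format surfaces where the mapper is defined rather than inside a scheduler tick.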
…ag_runs list

Drop the SQL "count distinct assets with any log" subquery and always
compute total_received via the Python rollup-aware helper. The list
endpoint previously returned different numbers for the same APDR
depending on whether the caller filtered by dag_id (rollup-aware,
counts upstream window keys) or queried globally (SQL approximation,
counts assets with any log) — same field, different semantics, very
confusing for any UI consumer.

The N+1 cost of per-Dag timetable loads was already paid in the
global branch for total_required, so adding a single batched log
fetch keeps the existing query budget while making the contract
identical across both views. _compute_received_count now skips
asset_ids that are no longer required (active=False) so the relaxed
log query doesn't over-count.

StartOfWeekMapper now always uses ISO weeks (Monday) and
StartOfMonthMapper always emits the 1st of the month. Custom
fiscal boundaries can still be expressed by pairing a user-defined
source mapper with the existing windows.

The next_run_assets and partitioned_dag_runs endpoints used to load
and deserialize the full timetable on every request just to read
mapper attributes (is_rollup) and required-key counts. Cache mapper
metadata per asset on DagModel during Dag sync via a new
``partition_mapper_info`` JSON column, so the UI resolves mapper
attributes from the cache and only loads the timetable when
``to_upstream`` evaluation for rollup mappers is actually needed.

``partition_mapper_info`` now iterates every asset in ``asset_condition``
and uses ``get_partition_mapper``, so a Dag configured with
``default_partition_mapper=RollupMapper(...)`` (the primary documented
pattern) is correctly reported as rollup. Previously the list was built
from ``partition_mapper_config`` only, leaving ``has_rollup_mappers``
False and silently disabling rollup UI behaviour.

Also: extract the shared ``load_partitioned_timetable`` helper and log
on deserialization failure; coerce NULL ``source_partition_key`` to
``""`` in the scheduler to match the UI normalisation.

Old serialized rows or hand-crafted partial dicts caused a KeyError on
DagModel.is_rollup_asset and has_rollup_mappers. Switch to .get() with
a False default so the read side is resilient to schema evolution.

Add docstrings explaining accepted strftime directives, round-trip
requirements, and why regex compilation happens eagerly at construction
time rather than lazily inside the scheduler loop.

Covers the previously untested MonthWindow case in
test_window_serialize_round_trip. Uses input_format="%Y-%m-%d" instead
of "%Y-%m" to prevent 29 day keys from collapsing to the same value and
masking decode failures.

DayWindow always generates 24 naive hourly steps. When paired with a
local-timezone source mapper, spring-forward gaps make one expected
upstream key unattainable so the rollup can never complete; fall-back
causes the extra hour to be excluded from the expected set.

Add a warning block to DayWindow's docstring, two tests (one pinning
the naive-24 invariant, one xfail documenting the spring-forward
under-yield), and a Known Limitations section to the AIP-76 newsfragment.
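
The spring-forward gap can be seen with the standard library alone. The sketch below assumes the IANA time zone database is available to `zoneinfo` and picks `America/New_York` on 2024-03-10 purely as an example; none of these names come from the PR.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

FMT = "%Y-%m-%dT%H:%M:%S"
tz = ZoneInfo("America/New_York")  # example zone; DST starts 2024-03-10 locally
day = datetime(2024, 3, 10)

# What a naive 24-step window expects for that day.
expected = {(day + timedelta(hours=h)).strftime(FMT) for h in range(24)}

# Wall-clock hours that actually occur locally, found by stepping through UTC.
actual = set()
cursor = day.replace(tzinfo=tz).astimezone(timezone.utc)
while cursor.astimezone(tz).date() == day.date():
    actual.add(cursor.astimezone(tz).strftime(FMT))
    cursor += timedelta(hours=1)

missing = sorted(expected - actual)
print(len(actual), missing)
```

The one key in `missing` is the hour a local-timezone producer can never emit, which is why such a rollup would wait indefinitely.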
Clarify that inactive assets are filtered from the UI progress query
but their PartitionedAssetKeyLog rows are preserved, so re-activating
an asset automatically resumes rollup accumulation without data loss.
… time

Add PartitionMapper.__init_subclass__ that raises TypeError when a
subclass overrides exactly one side of the decode/encode pair. An
unpaired override silently breaks RollupMapper.to_upstream by producing
non-str members, causing the scheduler's upstream-window check to never
satisfy and leaving the Dag run held forever with no diagnostic.

MRO-based comparison (cls.method is not PartitionMapper.method) is used
rather than __dict__ lookup so intermediate base classes such as
_BaseTemporalMapper are handled correctly.
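
A minimal sketch of the paired-override check follows. The class `PartitionMapperSketch` and its subclasses are hypothetical stand-ins that model only the `__init_subclass__` comparison described above, not the real `PartitionMapper`.

```python
class PartitionMapperSketch:
    """Hypothetical base class; only the pairing check is modelled."""

    def decode_downstream(self, key: str) -> str:
        return key

    def encode_upstream(self, value: str) -> str:
        return value

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Attribute lookup follows the MRO, so overrides inherited from an
        # intermediate base class are seen as overrides here too.
        decode = cls.decode_downstream is not PartitionMapperSketch.decode_downstream
        encode = cls.encode_upstream is not PartitionMapperSketch.encode_upstream
        if decode != encode:
            raise TypeError(
                f"{cls.__name__} must override both decode_downstream and "
                "encode_upstream, or neither"
            )


class TemporalSketch(PartitionMapperSketch):  # overrides both: accepted
    def decode_downstream(self, key: str) -> str:
        return key.lower()

    def encode_upstream(self, value: str) -> str:
        return value.upper()


class WeekSketch(TemporalSketch):  # inherits the pair: accepted
    pass


try:
    class UnpairedSketch(PartitionMapperSketch):  # overrides one side only
        def decode_downstream(self, key: str) -> str:
            return key
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```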
A bad partition mapper previously wrote a new Log row on every scheduler
tick (once per second), flooding the audit log. Add a process-level
_partition_audit_seen set to SchedulerJobRunner that deduplicates by
(dag_id, asset_name, asset_uri): after the first entry the scheduler
still logs the exception at ERROR level each tick (useful for ops) but
stops inserting into the Log table. The set resets on restart, so one
fresh entry is written after a config fix and re-deploy.

Also add three scheduler-side evidence tests:
- audit log deduplication across two consecutive ticks
- rollup survives a simulated scheduler restart with partial key arrival
- duplicate PAKL rows do not prevent rollup completion (set semantics)

Replace the hardcoded MAX_PARTITION_DAG_RUNS_PER_TICK=500 with a new
[scheduler] max_partition_dag_runs_to_create_per_loop config option
(default 500). The value is read once in SchedulerJobRunner.__init__
alongside the other self._* conf reads, per the invariant that all
conf access stays out of the scheduler loop.

MonthWindow / QuarterWindow / YearWindow expand by month-step arithmetic
(replace(month=...), shift_months), which is only safe when the period
starts on day 1 of the month. Built-in temporal upstream mappers
normalise to day 1, but a custom PartitionMapper.decode_downstream
returning e.g. Jan 31 would crash the scheduler tick with a confusing
'day is out of range for month' ValueError.

Raise an explicit ValueError up-front so the upstream-mapper contract
violation is visible.
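
The day-1 precondition can be illustrated with plain `datetime` arithmetic. `months_in_year` is a hypothetical helper standing in for a year-sized window; it is a sketch of the guard, not the PR's code.

```python
from datetime import datetime


def months_in_year(period_start: datetime) -> list[datetime]:
    """Expand a year into its 12 month starts, guarding the day-1 assumption."""
    if period_start.day != 1:
        # Without this guard, replace(month=2) on Jan 31 fails with a
        # far less helpful "day is out of range for month" error.
        raise ValueError(
            f"period must start on day 1 of the month, got {period_start:%Y-%m-%d}"
        )
    return [period_start.replace(month=month) for month in range(1, 13)]


month_starts = months_in_year(datetime(2024, 1, 1))
print(len(month_starts), month_starts[-1].date())

try:
    months_in_year(datetime(2024, 1, 31))
except ValueError as exc:
    guard_message = str(exc)
    print(guard_message)
```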
decode_window currently calls import_string on the serialized Encoding.TYPE
directly with no gate. A tampered serialized Dag could name any importable
Python class and have the scheduler import it during deserialization. This
mirrors the existing partition_mapper / timetable serialization patterns,
which both gate on a core-import-path check before falling back to plugin
registries.

Add is_core_window_import_path + WindowNotSupported, and require
encode_window / decode_window to only accept the built-in Window subclasses
shipped under airflow.partition_mappers.window. Custom Window subclasses
are not currently supported; if a real use case appears later, mirror the
partition_mapper plugin registry rather than relaxing this gate.
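
The gate-before-import pattern might look roughly like this. Everything here is illustrative: `ALLOWED_IMPORT_PATHS` uses a standard-library class as a stand-in for the built-in Window classes, and `WindowNotSupportedSketch` and `load_allowed_class` are hypothetical names.

```python
from importlib import import_module

# Stand-in allow-list; the real gate would list the built-in Window classes.
ALLOWED_IMPORT_PATHS = frozenset({"collections.OrderedDict"})


class WindowNotSupportedSketch(ValueError):
    """Raised when a serialized type is not on the allow-list."""


def load_allowed_class(import_path: str) -> type:
    """Import a class only after its dotted path passes the allow-list check."""
    if import_path not in ALLOWED_IMPORT_PATHS:
        # Reject before any import happens, so tampered input never executes.
        raise WindowNotSupportedSketch(import_path)
    module_name, _, class_name = import_path.rpartition(".")
    return getattr(import_module(module_name), class_name)


loaded = load_allowed_class("collections.OrderedDict")
print(loaded.__name__)

try:
    load_allowed_class("os.system")
except WindowNotSupportedSketch as exc:
    rejected = str(exc)
    print("rejected:", rejected)
```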
The scheduler picks the oldest pending AssetPartitionDagRun first
(_create_dagruns_for_partitioned_asset_dags is strict FIFO on
created_at). The next_run_assets UI route was sorting desc and
returning the newest pending APDR, so when more than one partition
was backlogged the UI showed 'X/Y received' for partition_key=newest
while the scheduler was about to fire partition_key=oldest.

Flip the ordering so the UI surfaces the same APDR the scheduler will
fire next.

source_partition_key, target_dag_id, target_partition_key on
PartitionedAssetKeyLog and target_dag_id, partition_key on
AssetPartitionDagRun were typed Mapped[str | None] while the column
itself is nullable=False. Worse, scattered `or ""` defensives at every
read site coerced any NULL slip-in (via raw SQL or future migration)
to the empty string — silently collapsing all keys, falsely satisfying
rollups, and making an empty-string partition_key indistinguishable
from a buggy NULL.

Match the type hint to the schema (Mapped[str]) and drop the `or ""`
coalesce so a NULL becomes a real TypeError instead of a stealthy bug.
TYPE_CHECKING is_not_none asserts also go away.

partition_mapper_info entries declare `is_rollup: bool` as required in
the PartitionMapperInfo TypedDict, so .get("is_rollup", False) just
masks the contract. Switch to entry["is_rollup"] and let the KeyError
surface if a future migration adds a new shape without backfilling.
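
The hazard of coalescing NULL keys with `or ""` can be shown in a few lines. The sample data and the `strict_keys` helper are assumptions made for the illustration, not code from the PR.

```python
raw_keys = [None, None, "2024-01-15"]  # two NULLs that slipped into the column

# Coalescing collapses both NULLs into one empty-string "key".
coalesced = {key or "" for key in raw_keys}
print(coalesced == {"", "2024-01-15"})


def strict_keys(keys: list) -> set[str]:
    """Hypothetical strict reader: a non-string key is an error, not an empty key."""
    for key in keys:
        if not isinstance(key, str):
            raise TypeError(f"partition key must be str, got {type(key).__name__}")
    return set(keys)


try:
    strict_keys(raw_keys)
except TypeError as exc:
    strict_error = str(exc)
    print(strict_error)
```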
The misconfigured-rollup audit Log row was previously added to the outer
session inside _resolve_asset_partition_status. Its caller
_create_dagruns_for_partitioned_asset_dags runs inside
_create_dagruns_for_dags, which is wrapped in @retry_db_transaction —
so a downstream OperationalError, transaction retry, or scheduler crash
mid-tick would roll the Log row back while the in-memory
_partition_audit_seen set still said 'already logged'. The operator
never saw a single audit entry for the misconfig.

Write the Log row on create_session(scoped=False) so it commits on its
own connection, independent of the outer transaction. Update the dedup
set only after the independent commit succeeds; a failure to write the
audit row is logged and swallowed (the warning at the top of the
except: branch already captures the same information for operators
reading scheduler logs).
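
The ordering "commit first, then mark as seen" can be sketched without a database. In this illustration `record_audit_once` and the `write_row` callable are hypothetical; `write_row` stands in for committing the Log row on its own session.

```python
from typing import Callable

AuditKey = tuple[str, str, str]  # (dag_id, asset_name, asset_uri)
audit_seen: set[AuditKey] = set()


def record_audit_once(key: AuditKey, write_row: Callable[[AuditKey], None]) -> bool:
    """Write one audit row per key; mark the key seen only after the write succeeds."""
    if key in audit_seen:
        return False
    try:
        write_row(key)  # stand-in for an independent commit
    except Exception:
        return False  # key stays unseen, so a later tick retries the write
    audit_seen.add(key)
    return True


def failing_write(_key: AuditKey) -> None:
    raise RuntimeError("database unavailable")


rows: list[AuditKey] = []
key = ("weekly_sales_report", "daily_sales", "file://incoming/sales/daily.csv")

first = record_audit_once(key, failing_write)  # write fails, nothing recorded
second = record_audit_once(key, rows.append)  # retry succeeds
third = record_audit_once(key, rows.append)  # deduplicated
print(first, second, third, len(rows))
```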
Three related UI-side problems:

1. with suppress(Exception) around the rollup mapper call in
   the next_run_assets and partitioned_dag_runs routes silently
   downgraded to non-rollup counts with no log line — users saw a
   stuck 0/1 received forever with no signal that the mapper was
   broken. Replace with try/except + log.warning(..., exc_info=True)
   so operators get a per-request log entry.

2. load_partitioned_timetable was called inside a dict
   comprehension over unique_dag_ids, hitting SerializedDagModel.get
   once per Dag — an N+1 in the list route. Add a batch
   load_partitioned_timetables that uses
   SerializedDagModel.get_latest_serialized_dags(dag_ids=...) and
   call it for the rollup-only subset.

3. except Exception around timetable deserialization swallowed any
   error, including unrelated bugs. Narrow to (KeyError, ValueError,
   ImportError, AttributeError) — the classes serdag.dag.timetable
   can actually raise — so config errors and programming bugs surface
   instead of being absorbed into a silent fallback.

The partition-dagrun creator was reading every PartitionedAssetKeyLog row
regardless of whether the source asset was still active. An asset that
became orphaned (no declaring Dag) would still let pending APDRs fire on
its stale history, even though the UI's progress view already excluded it.

The PAKL query now filters ``AssetModel.active.has()``, matching the UI's
``_fetch_active_assets_per_dag`` contract. Reactivating an asset resumes
APDR evaluation automatically.

Three UI routes (`next_run_assets`, partitioned-Dag-run list, detail) now report 0 received with no claimed keys when the rollup mapper raises, matching the scheduler's not-yet-satisfied verdict in `_resolve_asset_partition_status`. Previously they fell back to 1/1 ready while the scheduler held the run.

NextRunAssetEventResponse and PartitionedDagRunAssetResponse now carry mapper_error: bool (default False), set True when the rollup mapper raised. UIs can present a distinct misconfigured/blocked state instead of leaving the asset as a silent not-yet-satisfied wait.
