Skip to content

feat(rust): Vortex Phase 2 — AExpr-direct filter pushdown (stacked on #2)#3

Closed
lwwmanning wants to merge 64 commits into
vortex-integration-phase-1from
vortex-integration
Closed

feat(rust): Vortex Phase 2 — AExpr-direct filter pushdown (stacked on #2)#3
lwwmanning wants to merge 64 commits into
vortex-integration-phase-1from
vortex-integration

Conversation

@lwwmanning
Copy link
Copy Markdown
Member

Replaces the SpecializedColumnPredicate-based filter pushdown shipped in #2 with an AExpr-direct convertor that walks Polars' Arena<AExpr> at IR-build time in lower_ir.rs::FileScanIR::Vortex, where the arena is still in scope. The convertor handles a strictly broader shape set: scalar comparisons, boolean combinators, null checks, struct field access, numeric col + 1, same-kind CAST under Strict options, is_between / is_in / str.starts_with / str.ends_with / str.contains(literal=True) / Ternary, and arbitrary multi-column predicates composed from the above. Schema gates on every direct-builder arm protect the always-SAFE-fallback contract — Vortex's checked_add, cast, comparison, and struct-field builders each error at scan-time on certain inputs (cross-kind CAST, cross-PType arithmetic, missing struct fields), and the gates refuse pushdown so the residual filter handles those cases. When pushdown is refused for any shape, the multi-scan layer reapplies the original predicate post-decode via PARTIAL_FILTER — correctness is invariant, the visible effect is wall-clock.

Stacked on #2; review that PR first. The vortex_scan/filter_arithmetic Criterion bench in #2 is the load-bearing measurement: col + 1 == N falls back to "decode everything then filter" on #2 alone and pushes down to Vortex's zone pruner here. Capture a phase-1 baseline on #2's tip with cargo bench -p polars --features vortex,cloud,parquet,dtype-full,strings --bench io_vortex -- --save-baseline phase-1, then re-run with --baseline phase-1 on this branch to see the delta.

🤖 Generated with Claude Code

lwwmanning and others added 30 commits May 19, 2026 11:40
…dation (PR-13.1)

Adds `polars_plan::plans::aexpr::predicates::vortex_convertor` — the kernel
of PR-13's aggressive AExpr pushdown. Covers the 14 foundational shapes:
- Column reference + scalar literal leaves
- 6 comparison ops (Eq/NotEq/Lt/LtEq/Gt/GtEq) on BinaryExpr
- 2 boolean combinators (And/Or, plus their LogicalAnd/LogicalOr aliases)
- 3 unary boolean functions (IsNull/IsNotNull/Not via IRBooleanFunction)
Returns `Some(Expression)` on full pushdown; `None` if ANY sub-shape is
unsupported (safe fallback — multi-scan layer re-applies the full
predicate post-decode, so dropping coverage is SOUND, just suboptimal).

21 unit tests verify each shape converts + that unsupported shapes
(arithmetic, Xor, Cast, Dyn literals) return None + integration tests for
nested predicates.

## Why polars-plan, not polars-vortex (correcting the plan PR-2.1 row)

The plan PR-2.1 row originally listed
`crates/polars-vortex/src/read/aexpr_predicate.rs` as the target file. But
`polars-vortex` cannot depend on `polars-plan`: polars-plan's Cargo.toml
declares an optional `polars-vortex = { workspace = true, optional = true }`
dep gated on the `vortex` feature (lines 28 + 80), so the dep arrow points
polars-plan → polars-vortex. The convertor consumes `AExpr`, `LiteralValue`,
`Operator`, and `IRBooleanFunction` — all polars-plan types — so it has
to live in polars-plan. Discovered during PR-2.1 implementation when the
initial polars-vortex draft failed `cargo check` with E0433 on
`use polars_plan::...`.

The convertor's `convert_literal` reuses
`polars_vortex::read::predicate::polars_scalar_to_vortex` (now `pub`,
was `pub(super)`) so the `AnyValue` → `VortexScalar` mapping has one
canonical source of truth across both the SpecializedColumnPredicate
path (this module's predecessor, to be deleted in PR-2.6) and the new
AExpr-direct path.

Plus: `polars-vortex/src/lib.rs` narrowed re-export widened from
`{array, error, file, io, layout}` to `{array, error, expr, file, io,
layout}` so polars-plan can reach `vortex::expr::{eq, lt, ...}` through
the BAN-checked re-export.

## Files

- `crates/polars-plan/src/plans/aexpr/predicates/vortex_convertor.rs` (new,
  ~400 LoC including the test suite)
- `crates/polars-plan/src/plans/aexpr/predicates/mod.rs` (registered the
  new module behind `#[cfg(feature = "vortex")]`)
- `crates/polars-vortex/src/lib.rs` (added `expr` to narrowed re-export)
- `crates/polars-vortex/src/read/predicate.rs` (made
  `polars_scalar_to_vortex` `pub`)
- `crates/polars-vortex/src/read/aexpr_predicate.rs` (stub left in place
  to document the architectural correction — file is not registered in
  `read/mod.rs` so it's not compiled; if reviewers flag as clutter it can
  be removed in a follow-up)

## Wiring

No call site yet (PR-2.1 acceptance: "Module compiles; unit tests for each
shape pass with arena-constructed inputs; no wire-up yet"). PR-2.2 will
wire the convertor at `polars-stream/src/physical_plan/to_graph.rs:843`
inside the `FileScanIR::Vortex` branch of `lower_node`.

## Verified

- `cargo check -p polars-plan --features vortex` clean
- `cargo test -p polars-plan --features vortex,is_between,is_in vortex_convertor`
  → 21 passed, 0 failed
…O + EqValidity None tests

Cycle-1 gauntlet (preset=pr-2, fresh+correctness) accepted PR-2.1 with
no must-fix but flagged 5 should-fix items. This commit addresses 2 of
them; the others (3 deferred) are tracked in the Deferred work section
in the follow-up plan commit.

1. **Bitwise-vs-logical operator caveat** (cycle-1 correctness should-fix
   #1+#2; cycle-1 fresh should-fix F-003): Polars `Operator::And`/`Or`
   and `IRBooleanFunction::Not` are bitwise-OR-logical (work on integer
   columns too — see boolean.rs:49 `// Also bitwise negate`). Vortex's
   `and`/`or`/`not` are boolean-only. For embedded integer-bitwise
   sub-trees like `(col_int & 1) > 0`, the convertor would emit
   semantically wrong Vortex expressions. Polars's own column_expr.rs:
   245-247 guards with `dtype.is_bool()`.

   Added a load-bearing TODO comment at the doc-block of
   `aexpr_to_vortex_expression` explaining the constraint + naming
   PR-2.2's wire-up site as the schema-gated mitigation point. Option (a)
   at the call site (skip pushdown if any And/Or/Not operand is non-bool)
   is cheaper than threading `&Schema` through the convertor.

2. **Missing None-returning tests for `EqValidity`/`NotEqValidity`**
   (cycle-1 fresh should-fix F-002): the explicit `return None` arms at
   the match for these operators had no test coverage. Added two ~10-LoC
   tests parallel to `unsupported_xor_returns_none`. Test count 21 → 23.

Verified: `cargo test -p polars-plan --features vortex,is_between,is_in
vortex_convertor` → 23 passed, 0 failed.
…deferred

PR-2.1 cycle-1 gauntlet preset=pr-2 (fresh+correctness) accepted. Both
reviewers high confidence; 0 must-fix, 5 should-fix, 1 nit. 2 should-fix
addressed inline in commit 06f4f85 (bitwise-vs-logical TODO + EqValidity
tests). 3 should-fix deferred via a bundled Deferred-work entry.

<details><summary>gauntlet Synthesizer Output (schema_version: 1) — PR-2.1 cycle 1</summary>

```json
{
  "schema_version": 1,
  "preset": "pr-2",
  "lenses_used": ["fresh", "correctness"],
  "review_count": 2,
  "unified_findings": [
    {
      "severity": "should-fix",
      "kind": "bug",
      "file_line": "crates/polars-plan/src/plans/aexpr/predicates/vortex_convertor.rs:104,110-111,142",
      "found_by": ["correctness", "fresh"],
      "description": "Operator::And/Or + IRBooleanFunction::Not are bitwise-or-logical in Polars (boolean.rs:49 'Also bitwise negate'); Vortex and/or/not are boolean-only. For embedded integer-bitwise subtrees the convertor emits semantically-wrong Vortex expressions. Polars's column_expr.rs:245-247 guards with dtype.is_bool().",
      "recommended_fix": "Added load-bearing TODO comment naming PR-2.2 wire-up site as schema-gated mitigation point. Option (a) at call site (skip pushdown when any operand non-bool) is cheaper than threading &Schema."
    },
    {
      "severity": "should-fix",
      "kind": "coverage",
      "file_line": "crates/polars-plan/src/plans/aexpr/predicates/vortex_convertor.rs:tests",
      "found_by": ["fresh"],
      "description": "Missing None-returning tests for Operator::EqValidity/NotEqValidity arms. Added 2 tests parallel to unsupported_xor_returns_none."
    },
    {
      "severity": "should-fix",
      "kind": "test-quality",
      "file_line": "crates/polars-plan/src/plans/aexpr/predicates/vortex_convertor.rs:per-shape tests",
      "found_by": ["fresh", "correctness"],
      "description": "21+2 tests assert .is_some() only; don't inspect Expression structure. Paste-swap bug between Eq/NotEq or Lt/Gt or And/Or or IsNull/IsNotNull would not be caught. Same pattern flagged in PR-2.0 cycle-2 on C-003 tests. Deferred to Phase 2.x housekeeping or PR-2.6 cleanup."
    },
    {
      "severity": "should-fix",
      "kind": "perf",
      "file_line": "crates/polars-plan/src/plans/aexpr/predicates/vortex_convertor.rs:104-128",
      "found_by": ["fresh"],
      "description": "AExpr::Function { Boolean(_) } arm recursively converts arg_node before inner match, wasting work for unsupported variants (16/19 IRBooleanFunction variants). Negligible perf cost; structural nit. Deferred to PR-2.2-.5 alongside shape additions."
    },
    {
      "severity": "should-fix",
      "kind": "leak",
      "file_line": "crates/polars-plan/src/plans/aexpr/predicates/vortex_convertor.rs:aexpr_to_vortex_expression",
      "found_by": ["correctness"],
      "description": "Recursive walk; deeply nested predicates consume one stack frame per AExpr node. Vortex's and_collect/or_collect use balanced trees. Deferred to Phase 4 polish unless benchmarks show issues."
    },
    {
      "severity": "nit",
      "kind": "doc-quality",
      "file_line": "crates/polars-vortex/src/read/aexpr_predicate.rs",
      "found_by": ["correctness"],
      "description": "Orphan stub file at polars-vortex documenting the architectural relocation. Not registered in mod.rs; harmless but cluttery. Defer to housekeeping."
    }
  ],
  "disagreements": [],
  "dropped_re_flags": [],
  "executive_summary": "PR-2.1 cycle 1 accepted by both lenses (fresh + correctness) at high confidence. Convertor at polars-plan/src/plans/aexpr/predicates/vortex_convertor.rs (~450 LoC including 23 tests post-cycle-1-fix) implements the 14 PR-13 foundational shapes correctly: Column, Literal::Scalar, 6 comparisons, And/Or + LogicalAnd/LogicalOr aliases, IsNull/IsNotNull/Not. Exhaustive match on Polars Operator (18 variants, 8 mapped, 10 explicit None). No must-fix items. 5 should-fix items identified; 2 addressed in 06f4f85 (bitwise-vs-logical TODO at function doc + 2 None-returning tests for EqValidity/NotEqValidity). 3 should-fix items deferred via a bundled Deferred work entry (tautological tests, eager arg conversion, stack-overflow risk on pathological inputs). Plan correction noted: convertor moved from polars-vortex to polars-plan because the dep direction is polars-plan → polars-vortex (commit 4f8750a amends PR-2.1 row). Acceptance criteria all met: module compiles; 23 unit tests pass (14 shape coverage + 4 unsupported-shape None coverage + 2 integration); no wire-up yet (PR-2.2 wires at to_graph.rs:843 with schema gate for bitwise-vs-logical).",
  "overall": "accept",
  "must_fix_count": 0,
  "should_fix_count": 5,
  "nit_count": 1,
  "review_cycles_this_invocation": 1
}
```

</details>
…ema gate

Extends aexpr_to_vortex_expression to cover the first arithmetic pushdown
(Operator::Plus → vortex::expr::checked_add) and adds the bitwise-vs-logical
schema gate that the PR-2.1 cycle-1 should-fix carved out:

- New parameter `schema: Option<&Schema>` threaded down the recursion.
- For Operator::And / Operator::Or only (NOT LogicalAnd/LogicalOr — those are
  by construction boolean-typed in the IR), check operand dtypes against the
  schema. If either operand is not Boolean, refuse pushdown.
- When schema is None (caller hasn't typecheck'd yet), conservatively refuse
  And/Or pushdown rather than dispatch Vortex boolean-only `and` on an int.

operand_is_bool helper recursively handles And/Or operands (compound
boolean expressions are still boolean-typed at the root). All 31 existing
convertor tests pass after updating callers to pass Some(&schema) / None
explicitly; 4 new tests cover the schema-gate behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…or + e2e arithmetic test

Plumbs the AExpr-direct convertor end-to-end so arithmetic-in-predicate
shapes that the legacy `SpecializedColumnPredicate` fast path cannot
represent now push down to the Vortex scan layer.

Wire-up (Option B parallel-path strategy per the plan):

- `VortexReaderBuilder` and `VortexFileReader` each gain a new
  `aexpr_filter: Option<VortexExpression>` field, threaded by
  `build_file_reader` exactly mirroring the existing `segment_cache`
  thread-through pattern from PR-2.0.
- `lower_ir::FileScanIR::Vortex` calls `aexpr_to_vortex_expression`
  while `expr_arena` is still in scope — by the time
  `VortexFileReader::begin_read` runs, only `Arc<dyn PhysicalIoExpr>`
  survives, so this is the right (and only) anchor point.
- `begin_read` prefers the AExpr-direct result over
  `polars_to_vortex_predicate`, falling back to the legacy fast path
  when the convertor returns `None` (unhandled shape). PR-13.6 (PR-2.6)
  will delete the fallback once the convertor is a strict superset.

The convertor module is referenced via the public
`polars_plan::plans::predicates::vortex_convertor::*` re-export
(`pub use aexpr::*` in `plans/mod.rs`) — the `aexpr` module itself
remains `pub(crate)`.

Test:

- `test_scan_with_arithmetic_filter` in
  `py-polars/tests/unit/io/test_vortex.py` asserts `pl.col("a") + 1 == 5`
  on a 20-row file returns exactly the one matching row. Polars reapplies
  the predicate post-decode regardless (PARTIAL_FILTER capability), so
  any drop-rows bug would surface as a wrong shape — a pushdown-not-
  applied regression would manifest as slower but still-correct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+ Plus dtype gate + structural test

Addresses the two must-fix items from the cycle-1 gauntlet (fresh + correctness
lenses, both reject high-confidence) plus several cleaner cycle-1 should-fix items.

**M1 — Hive-column refuse (lower_ir.rs)**

`file_info.schema` "Always includes all hive columns" (plans/schema.rs:43). The
wire-up was unconditionally passing it to the convertor, which would happily
emit `get_item(hive_col, root())` references to columns that do NOT exist in the
Vortex file's data. Reachable: `scan_vortex` accepts `hive_partitioning=True`.
The legacy fast path doesn't have this bug because it consumes the
already-hive-stripped `ScanIOPredicate::column_predicates` (built via
`create_scan_predicate`). Conservative fix: refuse the convertor entirely when
`hive_parts.is_some()`. Future PR can do a per-column hive-vs-file split
mirroring `create_scan_predicate`'s `hive_predicate` extraction.

**M2 — Plus dtype gate (vortex_convertor.rs)**

Vortex's `checked_add` is fallible on overflow
(`vortex-array/src/expr/analysis/fallible.rs:36
checked_add_defaults_to_fallible`) AND `Binary::coerce_args` vortex_bail!s on
non-primitive operands. Polars `Operator::Plus` is much broader than that:
String (concat), Bool, Date+Duration, etc. all map to it. Unconditionally
emitting `checked_add(lhs, rhs)` would either silently surface Vortex-side
errors at scan time (overflow) or hard-bail at scan time (non-numeric) —
violating the convertor's "always SAFE — None is the safe fallback" contract.

Added `operand_is_numeric` (parallel to `operand_is_bool`) that returns true
for Int*/UInt*/Float* dtypes only. The Plus arm now requires both operands to
be numeric AND a schema to be supplied — otherwise None.

Decimal is intentionally excluded from `is_vortex_numeric_dtype` despite being
primitive in Vortex — polars-vortex hasn't validated `checked_add` interactions
with Decimal scale/precision; deferring to a future PR.

Documented the integer-overflow semantic divergence (Vortex errors vs Polars
wraps) in the function doc-comment. Operands near MAX values will see scan-time
errors instead of wrapped results; tracked as a known limitation pending
upstream Vortex's wrapping_add public API.

**Cycle-1 should-fix items addressed inline**

- F-SF-002/003/006/C-006: module-level `## Wiring` no longer says "No call site
  yet"; updated to describe the actual wire-up at lower_ir.rs:780-791. The
  table reflects Plus shipping (15 shapes). Stale TODO at function doc
  replaced with present-tense documentation of the implemented gate.
- F-SF-005: corrected the misleading boolean.rs:49 citation for And/Or (that
  comment is about IRBooleanFunction::Not; And/Or actually go through
  aexpr/schema.rs:127-149 / get_arithmetic_field).
- F-SF-008/C-008: moved `use vortex_convertor` to the top of lower_ir.rs under
  `#[cfg(feature = "vortex")]` instead of an in-arm `use`.
- C-005: refined `operand_is_bool`'s wildcard `IRFunctionExpr::Boolean(_) =>
  true` to enumerate exactly the boolean-output variants (IsNull, IsNotNull,
  Not-with-recursion). The wildcard was over-permissive but sound by accident;
  the precise enumeration is the right contract.
- C-007: collapsed the `if let Some(s) = schema { ... } if schema.is_none() {
  return None; }` redundant pair into a single `let Some(s) = schema else {
  return None }` (matches the new Plus arm's style for consistency).

**Cycle-1 must-fix addressed via structural test (F-SF-001/C-002 escalated)**

Added `shape_plus_arithmetic_structural` which asserts the produced Vortex
`Expression`'s SQL-form Display contains the expected `checked_add` / column
ref / literal values. Addresses the cycle-1 escalation that the Python e2e test
doesn't verify pushdown engagement: a paste-swap bug
(`Operator::Plus => checked_mul(...)`) would now be caught by this unit test.
The cycle-1 should-fix item (i) — tautological tests carried forward from
PR-2.1 — is now partially addressed for the load-bearing Plus shape. Other
shapes still use `.is_some()`-only assertions, tracked as deferred.

**Test count**: 31 → 36 unit tests (+5 schema-gate tests for Plus,
+1 structural-assertion test). All pass with
`cargo test -p polars-plan --features vortex,is_between,is_in vortex_convertor`.

**Verification**: `cargo check -p polars-stream --features vortex,cloud`
clean; `cargo fmt --all -- --check` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…weep

Cycle-2 gauntlet (2-vote, fresh+correctness, both accept high-confidence) found
zero must-fix items and 7 should-fix observations. Applied inline:

**Cycle-2 C2-001 (correctness lens, should-fix → must-fix-y under fix-attention
H4 self-reinforcement)**: the cycle-1 hive guard had a sibling bug class — the
same "convertor emits get_item to a column the Vortex file doesn't contain"
failure mode applies to row_index and include_file_paths (also virtual columns
not in `file_info.schema` per plans/schema.rs:42). The reviewer traced a
user-reachable repro via `pl.scan_vortex(path, row_index_name="ri").filter(...)`.
Extended the lower_ir guard to also short-circuit when
`unified_scan_args.row_index.is_some()` or `include_file_paths.is_some()`,
falling back to the legacy `polars_to_vortex_predicate` (which is
virtual-column-safe via `create_scan_predicate`'s hive_predicate split).

**Cycle-2 F-SF-CYCLE2-003 + C2-002 (Int128 asymmetry)**: `is_vortex_numeric_dtype`
included `Int128` but excluded `UInt128`/`Float16`. None are Vortex primitives
(PType ceiling is I64/U64/F64) AND `polars_scalar_to_vortex` has no Int128 /
UInt128 / Float16 arms — so the gate would pass Int128 column refs through but
the literal-conversion side would refuse them anyway, producing the right
"refuse" outcome via `?`-propagation. The asymmetry was misleading. Removed
Int128 from the list and added a comment explaining the mirror with
`polars_scalar_to_vortex`'s supported arms.

**Cycle-2 F-SF-CYCLE2-001/002/005 + F-SF-CYCLE2-007 (doc-quality sweep)**:
- Test-module doc-block updated to reflect PR-2.2 scope (was "14 PR-2.1 shapes",
  now explicitly notes the structural-assertion exception for the Plus shape).
- "Shape coverage tests (14 shapes)" comment now reads "15 shapes: 14 foundation
  + Plus" to match the actual coverage.
- Module-level prose at line 26 reconciled with the table row count: "The 14
  shapes below — (13 from PR-2.1's foundation + 1 from PR-2.2: addition (numeric)".
  Aliases like And|LogicalAnd count as one shape row.
- Arithmetic-caveat doc-comment expanded refused-types list to include
  Datetime/Time/Duration (alongside the already-listed Date).

**Cycle-2 F-SF-CYCLE2-004 (test for nested Plus + non-Plus operand)**: added two
new unit tests:
- `shape_plus_nested_numeric_passes` — `(a + b) + c` on Int32 schema; verifies
  `operand_is_numeric` recurses correctly through Plus.
- `shape_plus_with_multiply_operand_returns_none` — `(a * b) + c` correctly
  refuses (the gate's recursion only matches Plus, so non-Plus operand types
  fall through to the wildcard `_ => false`).

**Cycle-2 F-SF-CYCLE2-006 + C2-003 (hive-refuse path needs Python e2e)**: added
`test_scan_with_hive_partitioning_and_filter` to py-polars tests. Constructs
a hive-partitioned directory `year=2024/year=2025`, runs a filtered scan, and
asserts correct results. A regression where the guard is dropped would surface
as a `ComputeError` from Vortex bailing on a missing column 'year'. Also added
`test_scan_with_row_index_and_filter` to lock the new row_index branch of the
guard (the cycle-2 C2-001 fix above).

**Test count**: 36 → 38 convertor unit tests (+2 nested Plus + Multiply
operand tests). Python e2e tests: +2 (hive_partitioning, row_index).

**Verification**: `cargo test -p polars-plan --features vortex,is_between,is_in
vortex_convertor` → 38 passed; `cargo check -p polars-stream --features
vortex,cloud` clean; `cargo fmt --all -- --check` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…redicates

Adds the CAST arm to the AExpr-direct convertor so predicates like
`col.cast(Int64) > 100` push down to Vortex's scan layer.

**Convertor extension** (vortex_convertor.rs):

- New `AExpr::Cast { expr, dtype, .. }` match arm: recursively converts the
  inner expression, materializes the target Vortex `DType` via
  `polars_dtype_to_vortex_dtype`, and wraps in `vortex::expr::cast`.
- New `polars_dtype_to_vortex_dtype` helper: maps the in-scope Polars
  `DataType` set to Vortex `DType`. Covers `Boolean`, `Int8/16/32/64`,
  `UInt8/16/32/64`, `Float32/64`, `String` → `DType::{Bool, Primitive(...),
  Utf8}`. Nullability defaults to `Nullable` (Polars's runtime allows null
  in any column unless statically proven otherwise; CAST to a nullable type
  is always safe per Vortex's nullability-coercion rules).
- Decimal, Object, Categorical/Enum, temporal Extension types, and
  collection types fall through to `None` → residual. Decimal is
  deliberately deferred (Vortex `DecimalDType` scale/precision interactions
  with `vortex::expr::cast` aren't validated at the polars-vortex layer;
  also the project BAN against `as` casts on Vortex Decimal precision/scale).

**polars-vortex re-export widening** (lib.rs):

- Added `dtype` to the narrowed `vortex::` re-export
  (`{array, dtype, error, expr, file, io, layout}`). Per the established
  invariant: re-run the actual-use audit before adding new sub-modules.
  `dtype` is needed so polars-plan's convertor can build `DType::Primitive`
  / `DType::Bool` / `DType::Utf8` directly without going through the
  polars-arrow → upstream-arrow → `FromArrowType<Schema>` bridge (overkill
  for a single dtype).

**Tests added** (vortex_convertor.rs):

- `shape_cast_to_int64_ships_in_pr_2_3` (re-purposed `unsupported_cast_returns_none`)
- `shape_cast_to_float64`, `shape_cast_to_bool`, `shape_cast_to_string` — each
  covers a distinct DType variant
- `shape_cast_to_decimal_returns_none` (gated on `dtype-decimal`) — verifies
  the deferred-decimal refuse path
- `shape_cast_then_compare` — `col.cast(Int64) > 100` matches the plan's
  PR-13.3 acceptance test shape

**Python e2e test** (test_vortex.py):

- `test_scan_with_cast_filter` — `pl.col("a").cast(pl.Int64) > 100` against
  a 4-row Int32 column; asserts the matching 2 rows come back.

**Test count**: 38 → 42 (5 new CAST tests; the 5th `shape_cast_to_decimal`
is feature-gated and runs only with `dtype-decimal`). With `dtype-decimal`:
43 tests. Python e2e: +1.

**Verification**: `cargo check -p polars-stream --features vortex,cloud`
clean; `cargo test -p polars-plan --features vortex,is_between,is_in
vortex_convertor` → 42 passed; with `dtype-decimal`: 43 passed;
`cargo fmt --all -- --check` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d gate + Strict-only + doc sweep

Both cycle-1 gauntlet reviewers (fresh + correctness) rejected with high
confidence, surfacing two soundness bugs and the same H1 doc-drift pattern
PR-2.2 cycle-2 already disciplined against.

**M1 — Source-dtype-kind gate (F-CAST-001 / FRESH-001)**

Vortex's per-array `CastKernel` impls are strictly within-kind: `Primitive::cast`
returns `Ok(None)` for non-Primitive targets (vortex-array/src/arrays/primitive/
compute/cast.rs:62-64); `Bool::cast` for non-Bool (.../bool/compute/cast.rs:41-43);
`VarBinView::cast` for non-Utf8/Binary (.../varbinview/compute/cast.rs:60-62).
When the kernel returns None, `cast/mod.rs:120` `vortex_bail!`s with "No
CastKernel" at scan-time — propagating as a hard `ComputeError`. The convertor
was emitting `cast(child, target)` for cross-kind pairs (Primitive→Bool,
Primitive→Utf8, Bool→Primitive, Utf8→Primitive, etc.) without checking, so
user predicates like `pl.col(int_col).cast(pl.String) == "101"` were crashing
the scan instead of falling back to the residual path.

**Same bug class as PR-2.2 cycle-1 must-fix M2** (Plus on non-numeric
operands → Vortex `Binary::coerce_args` bail). PR-2.2 fixed via the
`operand_is_numeric` schema gate. PR-2.3 mirrors that pattern with
`cast_kind_compatible(source, target)`.

Implementation:
- New `resolve_inner_dtype(node, arena, schema) -> Option<DataType>` —
  resolves the inner expression's output dtype for the gate. Handles Column,
  Literal::Scalar, recursive Cast, comparisons (Boolean output), and Boolean
  function variants (IsNull/IsNotNull/Not → Boolean).
- New `cast_kind_compatible(source, target) -> bool` — verifies the pair is
  in the same Vortex kind (Primitive↔Primitive via is_vortex_numeric_dtype,
  Bool↔Bool, Utf8↔Utf8). Cross-kind → false → convertor refuses → residual.
- CAST arm now requires both schema AND a same-kind source/target pair.

**M2 — `CastOptions::Strict` only (F-CAST-002)**

Polars `CastOptions::{NonStrict, Overflowing}` (overflow→null, wrap) silently
diverge from Vortex's fail-on-overflow `Primitive::CastKernel` (which
`vortex_bail!`s on out-of-range values per cast.rs:85-91). Pushing those
down would convert Polars's silent-or-null semantics into a scan-time
`ComputeError`. CAST arm now refuses non-Strict via
`if !options.is_strict() { return None; }`.

**Doc sweep (F-CAST-004/005/006 + FRESH-002 — H1 cross-reference drift)**

- Module-level header updated to `PR-2.1 foundation + PR-2.2 + PR-2.3 extensions`.
- Shape count corrected from 14 → 15 (added cast row to the table with the
  kind-gate notation).
- "What this module does NOT cover yet" — `CAST → PR-2.3` line removed (CAST
  is now covered); cross-kind CAST + non-Strict options added as explicit
  remaining-residual notes.
- "Wiring" section: replaced the brittle `lower_ir.rs:780-791` line-range
  reference (already stale from PR-2.2 cycle-2's virtual-column comment
  expansion) with a stable anchor pointing at the `FileScanIR::Vortex` arm
  of `lower_ir`.
- `polars_dtype_to_vortex_dtype` internal Decimal comment rewritten to
  explicitly note the fall-through (was misleadingly positioned as if
  describing a Decimal arm); added Int128/UInt128 exclusion rationale
  cross-referencing `is_vortex_numeric_dtype`.

**Tests**

Updated 5 existing CAST tests to pass `Some(&schema)` (the new gate refuses
without schema). Added 9 new tests:
- `shape_cast_bool_to_bool`, `shape_cast_string_to_string` — same-kind
  degenerate (validity widening) variants
- `shape_cast_int_to_bool_returns_none`, `shape_cast_int_to_string_returns_none`,
  `shape_cast_bool_to_int_returns_none`, `shape_cast_string_to_int_returns_none`
  — cross-kind refusals
- `shape_cast_without_schema_returns_none` — conservative-refuse path
- `shape_cast_non_strict_returns_none`, `shape_cast_overflowing_returns_none`
  — CastOptions refusals

Added Python e2e test `test_scan_with_cross_kind_cast_filter` to lock the
M1 fix: `pl.col(int_col).cast(pl.String) == "101"` must NOT crash the scan;
a regression dropping the source-kind gate would surface as a Vortex
scan-time error.

**Test count**: 42 → 49 (7 new — net +9 added, but 5 reused from previous
schema-less variants reshape to schema-using). With `dtype-decimal`: 50.
Python e2e: +1.

**Verification**: `cargo test -p polars-plan --features vortex,is_between,is_in
vortex_convertor` → 49 passed; `cargo check -p polars-stream --features
vortex,cloud` clean; `cargo fmt --all -- --check` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…doc-section + stale-ref sweep

Cycle-2 gauntlet (2-vote, fresh+correctness, both ACCEPT high-confidence) found
zero must-fix and 5 should-fix items. Applied inline.

- **C2-CAST-001 (correctness, should-fix)**: completed the cross-kind refusal
  matrix — added `shape_cast_bool_to_string_returns_none` and
  `shape_cast_string_to_bool_returns_none`. Previously only Int↔Bool and
  Int↔String were tested; Bool↔Utf8 cross-kind pairs are equally reachable
  (`pl.col(bool).cast(pl.String)`, `pl.col(str).cast(pl.Boolean)`).

- **C2-CAST-002 (correctness, should-fix)**: added function-level
  `# CAST semantic caveat` doc section to `aexpr_to_vortex_expression`,
  mirroring the existing `# Bitwise-vs-logical operator caveat` and
  `# Arithmetic semantic caveat` sections. The cycle-1 in-arm body comment
  is now duplicated at the function header so readers don't have to scan
  the match body to discover the two CAST gates. Updated the `schema`
  parameter doc to list all THREE gates (And/Or/Not + Plus + CAST) instead
  of only the first two.

- **FS-CYCLE2-001 (fresh, should-fix)**: stale internal cross-references in
  test-section comments:
  - line 577 comment said `15 shapes: 14 foundation + Plus` (omitted Cast);
    corrected to `13 foundation + Plus + Cast`.
  - lines 890/909 doc-strings referenced specific line numbers in
    `operand_is_numeric`'s Plus arm that shifted after cycle-1 inserted
    `resolve_inner_dtype` + `cast_kind_compatible`. Replaced with symbol
    references (the cycle-2 pattern adopted by the wiring section).

- **Deferred (NOT applied inline)**:
  - **PR22-FROM-PR23-REVIEW-001 (carry-forward)**: correctness reviewer
    surfaced a sibling bug class via H4 fix-attention — the Plus convertor's
    `operand_is_numeric` checks per-operand numeric-ness but not pairwise
    supertype existence. `Plus(Int8, Int64)` would emit `checked_add(int8,
    int64)`, and Vortex's `Binary::coerce_args` (binary/mod.rs:105) computes
    `least_supertype(I8, I64) → I64` but `coerce_expression` does not appear
    to be auto-applied to filter expressions — the convertor's emitted
    expression goes through `return_dtype` at line 119-127 which requires
    `lhs.eq_ignore_nullability(rhs)` and would bail. **Pre-existing from
    PR-2.2** (not introduced by PR-2.3). Carry-forward to PR-2.4+ for a
    direct e2e verification; if reproducible, mirror the CAST fix pattern
    with a literal-typing helper or numeric-supertype gate in Plus.
  - **PR23-FUTURE-001 (carry-forward)**: Float16 not in
    `polars_dtype_to_vortex_dtype` nor `is_vortex_numeric_dtype`. Vortex
    has `PType::F16`; perf-miss only (conservative refuse is SOUND). Defer
    until Polars's Float16 stabilization lands in this branch's pinned
    version.

**Test count**: 49 → 51 convertor unit tests (+2 Bool↔Utf8 cross-kind).
With `dtype-decimal`: 52.

**Verification**: `cargo test -p polars-plan --features vortex,is_between,is_in
vortex_convertor` → 51 passed; `cargo check -p polars-plan --features vortex`
clean; `cargo fmt --all -- --check` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…roactive Plus cross-PType gate

**Primary scope (PR-13.4)**: extend the AExpr-direct convertor with
`AExpr::Function { StructExpr(FieldByName(name)), .. }` → `vortex::expr::get_item(name, inner)`.
Mirrors vortex-duckdb's `TableFilterClass::StructExtract` precedent
(`vortex-duckdb/src/convert/table_filter.rs:71-73`).

**Hostile-input audit applied UPFRONT** (per the PR-2.3 cycle-1 process lesson):
Vortex's `GetItem.return_dtype` (`vortex-array/src/scalar_fn/fns/get_item.rs:94-96`)
`vortex_err!`s at scan-time if the requested field name isn't in the struct's
fields — same class as the PR-2.3 cycle-1 cross-kind CAST bug. The convertor
now gates on schema-membership: refuse pushdown when the schema is unavailable
OR when the resolved inner dtype isn't a `Struct(fields)` containing the
requested field.

- New `struct_field_exists(dtype, name)` helper.
- Extended `resolve_inner_dtype` to handle `StructExpr(FieldByName)` — resolves
  to the field's dtype, enabling nested-struct field-access via recursion
  (`s.field("outer").field("inner")` chains correctly).
- StructField arm is gated on `feature = "dtype-struct"`.

**5 new convertor tests** (all `cfg(feature = "dtype-struct")`):
- shape_struct_field_then_compare — happy path: `s.field("inner") == "x"`
- shape_struct_field_without_schema_returns_none — conservative refuse
- shape_struct_field_unknown_field_returns_none — schema gate fires
- shape_struct_field_on_int_column_returns_none — non-Struct dtype refuses
- shape_struct_field_nested — `s.field("outer").field("inner")` recursion

**Python e2e**: `test_scan_with_struct_field_filter` against a 4-row Struct
`{inner: String, count: Int32}` column; asserts `s.struct.field("inner") == "x"`
returns the 2 matching rows.

**Secondary scope: proactive Plus cross-PType pairwise gate** (PR-2.3 cycle-2 H4
carry-forward — verified the Plus bug class reproduces in principle by tracing
Vortex's code: `Binary::return_dtype` requires `lhs.is_primitive() &&
lhs.eq_ignore_nullability(rhs)` and `vortex_bail!`s otherwise; `coerce_expression`
is NOT called by vortex-scan/file/layout, so cross-PType operands DO reach
`return_dtype` un-coerced when Polars's TYPE_COERCION optimizer is disabled).

Added a pairwise-equal-PType gate to the Plus arm: after the numeric gate,
require `resolve_inner_dtype(lhs) == resolve_inner_dtype(rhs)`. Extended
`resolve_inner_dtype` to handle `Operator::Plus` (delegates to lhs since the
gate ensures lhs == rhs). Extended `operand_is_numeric` to handle `Cast` (a
Cast to a numeric target produces numeric output).

**2 new Plus tests**:
- shape_plus_cross_ptype_returns_none — `col_int32 + lit_i64` refused
- shape_plus_cast_then_same_ptype_passes — `Cast(col_int32, Int64) + lit_i64`
  passes (TYPE_COERCION-inserted Cast aligns dtypes)

**Module-level doc updates**: shape table now lists 16 shapes (15 PR-2.1/.2/.3
+ struct field access); "What this module does NOT cover yet" updated.

**Final test count**: 51 → 58 with `dtype-struct` (was 56 with PR-2.3; +2 Plus
tests + 5 struct tests). Without `dtype-struct`: 53 (was 51, +2 Plus tests).

**Verification**: `cargo test -p polars-plan --features vortex,is_between,is_in,dtype-struct
vortex_convertor` → 58 passed; `cargo check -p polars-stream --features vortex,cloud`
clean; `cargo fmt --all -- --check` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e + Plus self-contained dtype resolve + missing tests

Cycle-2 gauntlet (2-vote pr-2, fresh + correctness, both ACCEPT high-confidence)
surfaced zero must-fix and a SIBLING bug class via fix-attention H4. Applied the
should-fix items inline:

**F-COMPARE-CROSS-PTYPE-001 (fresh, should-fix → proactively applied)**

Same bug class as the PR-2.4 commit-1 Plus pairwise gate. Vortex's
`Binary::return_dtype` (`vortex-array/src/scalar_fn/fns/binary/mod.rs:130-136`)
`vortex_bail!`s with "Cannot compare different DTypes" for comparison ops when
`!lhs.eq_ignore_nullability(rhs) && !lhs.is_extension() && !rhs.is_extension()`.
The convertor was emitting `eq/not_eq/lt/lt_eq/gt/gt_eq` without a pairwise
check, so a TYPE_COERCION-disabled or directly-constructed AExpr like
`col_int32 == lit_i64(42)` would scan-time-error instead of falling back to
residual. Added a comparison pairwise-equal-PType gate mirroring the Plus
gate's discipline: requires schema, resolves both operand dtypes, refuses on
inequality. Extension types are exempt at the Vortex level (Date↔Datetime
comparisons are permitted), but Polars extension dtypes don't currently route
through `resolve_inner_dtype`'s shapes anyway so no special-casing needed
here.

**F-RESOLVE-PLUS-LHS-DELEGATION-001 (fresh, should-fix) + N3 (correctness, nit)**

`resolve_inner_dtype`'s Plus arm previously delegated to lhs only, relying on
the convertor's Plus arm gate having fired (lhs == rhs invariant). Hardened to
be self-contained: walks both operands, returns the common dtype only if they
agree. Future callers of `resolve_inner_dtype` can trust the result without
relying on a non-local cross-function invariant.

**S1 (correctness, should-fix)**: stale in-line comment at the `_ => None` arm
still listed `StructField → PR-2.4` as a follow-up — removed (PR-2.4 ships it).

**Missing tests added**:
- `shape_eq_without_schema_returns_none` — locks the schema-required contract
- `shape_eq_cross_ptype_returns_none` — Int32 == Int64 refused
- `shape_lt_cross_ptype_returns_none` — Int32 < Float64 refused
- `shape_plus_int_plus_float_returns_none` — Int + Float refused (F-PLUS-CROSS-FLOAT-INT-TEST-001)
- `shape_plus_uint_plus_int_returns_none` — UInt32 + Int32 refused (N2 signed/unsigned mismatch)

**Existing tests updated**: 8 comparison tests (Eq/NotEq/Lt/LtEq/Gt/GtEq +
LogicalAnd/LogicalOr) updated to pass `Some(&schema_a_b_int32())` since the
new gate requires schema. `shape_cast_then_compare` updated to use a `lit_i64`
literal to mirror the post-TYPE_COERCION shape that production AExpr produces
(the gate refuses Int64-cast on lhs vs Int32-lit on rhs).

**Deferred (NOT applied inline — cycle-2 nits)**:
- F-STRUCT-FIELD-STRUCTURAL-001 (no structural assertion on struct field
  tests) — same class as the deferred "tautological tests" carry-forward;
  defer until the broader structural-assertion cleanup pass.
- F-STRUCT-FIELD-EXISTS-IMPORT-STYLE-001 (`struct_field_exists` uses
  fully-qualified `polars_utils::pl_str::PlSmallStr` in signature) — cosmetic
  inconsistency; defer.
- F-MODULE-DOC-NON-PREDICATE-NOTE-001 (`Other struct functions` could be
  reworded) — cosmetic doc-polish; defer.

**Test count**: 58 → 63 with `dtype-struct` (+5 new tests); 53 → 58 without
(the cycle-2-added tests are not feature-gated).

**Verification**: `cargo test -p polars-plan --features
vortex,is_between,is_in,dtype-struct vortex_convertor` → 63 passed;
`cargo check -p polars-stream --features vortex,cloud` clean; `cargo fmt
--all -- --check` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…→ A cutover (delete SpecializedColumnPredicate fast path)

The final PR of Phase 2's PR-13 trajectory. The AExpr-direct convertor at
`polars_plan::plans::predicates::vortex_convertor::aexpr_to_vortex_expression`
(introduced in PR-2.1, wired at `lower_ir.rs` in PR-2.2, extended with
arithmetic / CAST / struct field access across PR-2.2-.4) is now the sole
filter-pushdown path for Vortex scans.

**Deletions** (`polars-vortex/src/read/predicate.rs`):
- `polars_to_vortex_predicate` (top-level entrypoint, ~15 LoC)
- `convert_specialized` (SpecializedColumnPredicate dispatcher with Eq /
  Between / EqualOneOf / StartsWith / EndsWith / RegexMatch arms, ~50 LoC)
- `bytes_to_like_literal` (UTF-8 + SQL-LIKE-special-char validation for the
  Starts/EndsWith arms, ~7 LoC)
- LIKE-literal unit tests (3 tests covering the deleted helper)
- Module imports for the deleted dependencies (`ScanIOPredicate`,
  `SpecializedColumnPredicate`, `Expression`, `and_collect`, `eq`,
  `get_item`, `gt_eq`, `like`, `lit`, `lt_eq`, `or_collect`, `root`,
  `PlSmallStr`)

**Preserved** (still required by the AExpr-direct convertor):
- `polars_scalar_to_vortex` (the canonical `Scalar` → `VortexScalar` mapping;
  PR-2.1 exposed as `pub` for cross-crate reuse from polars-plan's convertor)
- `temporal` submodule (Date / Datetime / Time / Decimal scalar helpers,
  feature-gated on polars-core's `dtype-*` features)
- All scalar-conversion + temporal + decimal unit tests
- Added 2 new tests: `decimal_scalar_round_trip` (verifies Vortex `DType::Decimal`
  precision/scale match) + `decimal_scalar_overflow_returns_none` (verifies the
  >38 precision / >i8::MAX scale refuse paths). The module's BAN against `as`
  casts on Vortex Decimal precision/scale is now regression-tested.

**Updates** (`polars-stream/src/nodes/io_sources/vortex/`):
- `mod.rs`: removed the `polars_to_vortex_predicate` import and the fallback
  call at `begin_read`. `filter_expr` now reads `self.aexpr_filter` directly
  when `options.push_predicate` is set, else `None` (post-decode reapply
  handles correctness via `PARTIAL_FILTER`).
- `builder.rs`: updated `aexpr_filter` field doc-comment and the inline
  `build_file_reader` comment to describe the AExpr-direct path as sole.
  `reader_capabilities` comment updated: `PARTIAL_FILTER` rationale now cites
  unhandled AExpr shapes (Sort/Gather/etc.) rather than `SpecializedColumnPredicate`.

**Documentation sweep**:
- `polars-vortex/README.md`: replaced the bullet point + ASCII layer-map line
  + §4 "Filter pushdown via `SpecializedColumnPredicate`" subsection with
  AExpr-direct equivalents. The new §4 lists the 14 shapes the convertor
  handles (PR-2.1 + PR-2.2 + PR-2.3 + PR-2.4 union) plus the gate suite
  (bitwise-vs-logical, numeric-only Plus, kind-compatible CAST,
  pairwise-equal-PType comparison, schema-membership struct field).
  Historical note preserves the Option B → A trajectory for cross-reference.
- `polars-plan/src/plans/aexpr/predicates/vortex_convertor.rs`: module-level
  doc-block and `## Wiring` section now describe the AExpr-direct path as
  sole (removed "PR-2.6 will delete" forward-references).

**Net diff**: -244 LoC (310 deletions, 66 insertions across the 4 files).
The simpler architecture removes the parallel-path divergence-checking burden
mentioned throughout PR-13.1-.5's documentation.

**Verification**:
- `cargo check -p polars --features vortex` clean
- `cargo test -p polars-vortex --features dtype-date,dtype-datetime,dtype-time,dtype-decimal predicate` → 8 passed
  (was 11 — removed 3 LIKE-literal tests + added 2 Decimal tests)
- `cargo test -p polars-plan --features vortex,is_between,is_in vortex_convertor` → 58 passed (unchanged)
- `cargo fmt --all -- --check` clean
- Architectural BAN check passes: no new direct `vortex`-internal symbol use
  outside the bridge files (predicate.rs's narrowed import set is now even
  tighter: only `vortex::array::scalar::Scalar` + `vortex::dtype::Nullability`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ment cutover-lost shapes + doc-sweep

Cycle-1 gauntlet (2-vote pr-2) both REJECTED at high-confidence on the same root
cause: PR-2.6 silently lost pushdown coverage for four predicate shapes the
deleted legacy `SpecializedColumnPredicate`-derived path handled:

  1. `is_between(lo, hi)` — IRBooleanFunction::IsBetween
  2. `is_in([...])` — IRBooleanFunction::IsIn
  3. `str.starts_with(prefix)` — IRStringFunction::StartsWith
  4. `str.ends_with(suffix)` — IRStringFunction::EndsWith

None of these have arms in the AExpr-direct convertor; they fall through to the
`_ => None` residual arm. Correctness is preserved via multi-scan post-decode
`PARTIAL_FILTER` reapply (the multi-scan layer always re-applies the original
predicate to emitted morsels), but the loss is a perf regression — these
predicates no longer benefit from Vortex zone-pruning.

The Phase 2 exit criterion (b) requires "every row in §5 pushdown coverage
table implemented + tested OR documented as deliberately deferred". Since
PR-2.6 is the deletion PR (scope: remove the parallel path), the cleanest
resolution is to formally defer the four lost shapes rather than mix in
~65 LoC of new arm-implementation work. The cycle-1 reviewer notes the
cutover itself is structurally clean — the regressions are properly
contained and reasoning-driven defer.

**Changes applied**:

1. **Formal Deferred-work entry** (`.big-plans/vortex-integration.md`):
   "PR-2.6 cutover-lost pushdown shapes" enumerates the four shapes with
   the relevant AExpr matchers, the mechanical port shape (LoC estimates),
   and the resolution path. `deferred_items_total: 13` (was 12).

2. **predicate.rs module doc-block** (MF-001 correctness + N-001 fresh):
   - Replaced internally-contradictory "handles every shape the legacy
     path handled (...StartsWith/EndsWith NOT YET...)" paragraph with a
     "Coverage parity" section that EXPLICITLY enumerates: (a) shapes the
     convertor handles; (b) the four shapes lost in cutover (with
     Deferred-work cross-reference); (c) why correctness is preserved
     (PARTIAL_FILTER reapply).
   - Removed the date-inline "(2026-05-16)" per N-002 (git blame is the
     canonical timestamp).
   - Updated `polars_scalar_to_vortex`'s doc-comment to describe the
     Nullability::Nullable choice in terms of the actual AExpr-direct
     caller, not the deleted SpecializedColumnPredicate path (MF-004).
   - Fixed the canonical module path to include `aexpr::` (N-001).

3. **lower_ir.rs FileScanIR::Vortex arm comments** (MF-005 correctness +
   fresh): rewrote the in-arm comment block to describe the current
   runtime behavior. Was: "begin_read can prefer aexpr_filter OVER the
   legacy SpecializedColumnPredicate path / falls back to the legacy path
   automatically". Now: "begin_read uses aexpr_filter directly; when
   convertor returns None or guard refuses, aexpr_filter stays None and
   no filter pushes down — multi-scan reapply handles correctness via
   PARTIAL_FILTER". The virtual-column-guard rationale now cites the
   AExpr-direct path's bare-`get_item` emission as the actual reason
   (not "the legacy path is virtual-column-safe", which is no longer
   meaningful).

4. **builder.rs** (N-001): fixed the canonical module path in the
   aexpr_filter field doc to include `aexpr::`.

5. **README.md** (MF-002, MF-004): coordinated sweep of three
   sections that contradicted PR-2.6's actual state:
   - **"Pushdown coverage at a glance" table**: rebuilt around the actual
     AExpr-direct shapes. Equal/comparisons/and/or/not/is_null/is_not_null/
     Plus arithmetic/same-kind CAST/struct-field-access are now ✅ (cited
     to the AExpr-direct convertor). is_between/is_in/starts_with/
     ends_with/temporal-extracts/non-Strict-CAST/cross-kind-CAST are now
     ❌ residual (cited to the corresponding Deferred-work entries).
   - **"What works today"**: replaced the SpecializedColumnPredicate
     bullet with an accurate enumeration of AExpr-direct coverage.
   - **"Known limits / pending follow-ups"**: added the cutover-lost
     shapes as the lead item, kept the temporal-extracts (PR-2.5 slip),
     added non-Strict / cross-kind CAST + Duration, removed the stale
     "Aggressive predicate pushdown (arithmetic, CAST, struct field access,
     temporal extracts) is not yet wired" claim (3 of those 4 ARE wired
     as of PR-2.2/.3/.4).
   - **Crate layout block**: predicate.rs's role updated to
     "polars_scalar_to_vortex (Scalar → VortexScalar)".

**Deferred (out of PR-2.6 scope, in Deferred work)**:
- The 4 cutover-lost shapes (IsIn, IsBetween, StartsWith, EndsWith) —
  port to AExpr-direct convertor with bytes_to_like_literal port for
  LIKE-safety. Tracked.
- SF-001 (decimal_scalar_overflow_returns_none test boundary-value
  comment) — cosmetic; the test is still correct.
- SF-002 (commit-body overstating coverage parity) — addressed by the
  new explicit "Coverage parity" doc-block enumeration.

**Verification**: `cargo check -p polars --features vortex` clean;
`cargo test -p polars-plan --features vortex,is_between,is_in,dtype-struct
vortex_convertor` → 63 passed; `cargo fmt --all -- --check` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…egression in doc-comments

Cycle-2 N-CYCLE2-001 (both reviewers caught): cycle-1's "canonical path fix"
actually REGRESSED accuracy. I inserted `aexpr::` into two doc-comments
expecting `polars_plan::plans::aexpr::predicates::vortex_convertor::...` to be
the canonical path — but `crates/polars-plan/src/plans/mod.rs:7` declares
`pub(crate) mod aexpr;`, so external crates can only reach `predicates` via
the glob re-export `pub use aexpr::*` at `plans/mod.rs:31`. The actual
externally-resolvable path is `polars_plan::plans::predicates::vortex_convertor`
(no `aexpr::` segment) — verified at `lower_ir.rs:24`'s import:
`use polars_plan::plans::predicates::vortex_convertor;`.

Doc-comments don't compile-check, so the build didn't break; but the textual
claim was wrong in two places. Reverted both to the externally-resolvable
form and added a one-line clarification: "the `aexpr` module is `pub(crate)`;
the externally-resolvable path goes through the `pub use aexpr::*` re-export
at `plans/mod.rs`".

Cycle-2 N-CYCLE2-002 (4 sentinel refuse-tests for the deferred shapes) is
correctly deferred per the reviewer's recommendation — they belong in the
follow-up PR that ships the actual arms.

**Verification**: `cargo check -p polars --features vortex` clean;
`cargo fmt --all -- --check` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 4-vote phase-end gauntlet (spec + correctness + maint + arch lenses)
verdict: **ACCEPT** (all 4 reviewers, zero must-fix). Applied the cheap
should-fix items inline before the Step 3.4 user checkpoint:

**CI green-up** (spec lens SF-5 / correctness lens-supplied via gh pr checks):
- ruff D205: 6 docstring summary-line lints in `py-polars/tests/unit/io/test_vortex.py`
  (lines 117/139/160/190/219/248). Each new Python e2e test's docstring summary
  now fits on one line, satisfying the "1 blank line required between summary
  line and description" rule.
- clippy::let_and_return (clippy-nightly on `cast_kind_compatible` at
  vortex_convertor.rs:561): collapsed the `let same_kind = ...; same_kind`
  return-pattern into a direct expression. Also dropped to a single `matches!`
  call covering both the (Boolean, Boolean) and (String, String) cases.

**Plan / doc cleanup** (arch lens, maint lens, spec lens findings):
- **arch lens** (architecture-vs-implementation drift): Plan §"Four settled
  architectural moves" #4 used to read "SpecializedColumnPredicate already
  extracted by the optimizer" describing the legacy path that PR-2.6 just
  deleted. Rewrote to reflect the post-PR-2.6 reality: AExpr-direct convertor
  at IR-build time is the SOLE Vortex pushdown path, with the 16 implemented
  shapes + 4 schema gates enumerated.
- **maint lens** (S-ORPHAN-STUB-001): deleted the orphaned
  `crates/polars-vortex/src/read/aexpr_predicate.rs` stub. It was never
  registered in `read/mod.rs` so it never compiled, but lingered as
  PR-2.1 architectural-correction scaffolding. The lesson (convertor lives
  in polars-plan, not polars-vortex, because dep arrow reverses) is preserved
  in three other places (the convertor's module-level doc, the plan PR-2.1
  row, the relevant commits).
- **spec lens SF-1** (Phase 2 exit (b) — missing §5 rows): added a Deferred
  work entry covering the two §5 pushdown rows the convertor didn't get arms
  for and weren't formally deferred: `Function String::Contains{literal:true}`
  (mechanical port, ~15 LoC mirroring the StartsWith/EndsWith Deferred entry)
  and `Ternary { predicate, then, otherwise }` (~15 LoC, Vortex's `case_when`
  builder is publicly exposed). Both naturally absorb into PR-2.7 with the
  rest of the cutover-lost shapes.
- **spec lens SF-3** (stale tradeoff entry): strikethrough the
  "SpecializedColumnPredicate fast path preserved during PR-13 transitional
  phases (Option B trajectory)" Accepted-tradeoffs entry. Marked as "NO LONGER
  ACCEPTED — PR-2.6 deleted the fast path entirely" mirroring the strikethrough
  pattern used elsewhere (Phase 1 path-dep, visitor-cfg-gating).
- **spec lens nit** (stale `last_user_touchpoint_what`): updated to reflect
  the autonomous Phase 2 completion + 4-vote review acceptance + pending user
  checkpoint.

**Counter update**: `deferred_items_total: 14` (was 13).

**Deferred (NOT applied inline — phase-end should-fix items)**:
- spec lens SF-2 / SF-4 (PR-2.0 Files-touched amendment): cosmetic plan-row
  update; defer to Phase 3 entry housekeeping.
- spec lens SF-6 (deferred_items_total reconciliation): the counter +1 from
  this commit (13→14) matches the actual bullet count post the new entry; the
  spec lens's reported 15/13 mismatch was based on counting bundled entries
  as individual items, which conflicts with the established convention.
- maint lens S-EXTERNAL-PATH-FRAGILITY-002 (canonical-path compile-anchor):
  defer — would require adding a compile-time `use` test in a `tests/`
  directory; non-trivial scope, save for Phase 3 housekeeping.
- maint lens S-DEFERRED-WORK-PROMOTION-003 (triage Deferred section into
  3 buckets): substantive plan-doc reorganization; defer to Phase 3 entry.
- maint lens S-STRUCT-FIELD-DOC-GAP-004 (recursive resolve_inner_dtype
  invariant): one-line doc-comment addition; defer to next struct-field-related
  change.
- maint lens S-SCHEMA-NONE-ASYMMETRY-NIT-005, all other nits: cosmetic.
- correctness lens DOC-INACCURATE-EXTENSION-EXEMPTION-001 (extension-type
  doc nit): minor doc-comment correction; defer.
- correctness lens PERF-RESOLVE-INNER-DTYPE-O-N-SQUARED-001: perf
  characterization, not a bug; defer to Phase 4 perf work.

**Verification**: `cargo check -p polars --features vortex` clean;
`cargo test -p polars-plan --features vortex,is_between,is_in,dtype-struct
vortex_convertor` → 63 passed; `cargo test -p polars-vortex --features
dtype-date,dtype-datetime,dtype-time,dtype-decimal predicate` → 8 passed;
`cargo fmt --all -- --check` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lwwmanning added 22 commits May 19, 2026 11:45
…gate (review finding)

Vortex's Like kernel (vortex-array/src/scalar_fn/fns/like/mod.rs:124-132
return_dtype) vortex_bail!s with 'Cannot apply like to non-Utf8 input' at
scan-time if input[0] isn't Utf8. The StringExpr arm fed input[0] directly
into the recursive convertor without verifying the column's dtype, so a
non-String input (struct field access producing Int32, Cast whose source
isn't String) would silently push a like(<non-utf8>, <utf8-pattern>) that
fails at scan-time, violating the always-SAFE-fallback contract. Same bug
class as PR-2.3 cycle-1 CAST cross-kind must-fix.

Add an explicit Utf8 input gate (3 lines) BEFORE any needle extraction or
pattern construction work. Update existing 4 StringExpr unit tests
(shape_starts_with_positive / _wildcard_in_needle / shape_ends_with_positive
/ shape_contains_literal_{true,false}) to pass schema (the gate now consults
schema?). Add 4 new unit tests: shape_starts_with_on_int_column_returns_none
(must-fix #3 regression), shape_starts_with_without_schema_returns_none
(no-schema refuse), shape_starts_with_underscore_in_needle_returns_none and
shape_starts_with_backslash_in_needle_returns_none (close cycle-1 should-fix
#6e and nit pola-rs#11 coverage gap).

Also adds schema_s_string() and schema_s_string_i_int32() test helpers.

(cherry picked from commit b2b9659)
… Utf8 gate) — all 3 must-fix done; should-fix sweep next

(cherry picked from commit 8397a3a)
- vortex_convertor.rs module-level doc-comment refresh (should-fix pola-rs#7):
  bumps shape count 16 -> 22, adds 6 PR-2.7 shapes to the table with their
  Vortex builder + gate notes, removes Ternary from the 'not covered yet'
  list, updates PR-2.5 reference to reflect the Deferred-work slip.

- 5 structural-assertion unit tests for the new shapes (should-fix #4 —
  acceptance criterion (c)): shape_is_between_structural,
  shape_starts_with_structural, shape_ends_with_structural,
  shape_contains_literal_true_structural, shape_ternary_structural. Each
  asserts the produced Vortex Expression's Display contains the expected
  builder anchor (and/like/case_when) plus the column/literal substrings.
  Catches paste-swap bugs the prior .is_some()-only tests would miss
  (mirrors the shape_plus_arithmetic_structural discipline from PR-2.2
  cycle-1). is_in structural omitted — its Series-literal arena
  construction needs SpecialEq<Series> infrastructure not yet wired into
  the test helpers; e2e Python test covers correctness.

- README.md is_in entry refinement (cycle-1 nit pola-rs#10): document the
  per-scalar-conversion refuse path (Duration scalars in haystack fail
  polars_scalar_to_vortex) alongside the existing nulls_equal+nulls
  refuse-path note.

(cherry picked from commit 9473106)
…Ternary arm (review finding)

The cycle-1 must-fix #2 fix added a THEN/ELSE pairwise-dtype gate to the
Ternary convertor arm, calling resolve_inner_dtype(*truthy)? to verify
match. But resolve_inner_dtype had no Ternary arm — fell to the catchall
_=>None — so any NESTED Ternary truthy/falsy would unconditionally return
None, ?-propagating to refuse pushdown even when inner dtypes match.
Fail-closed-safe but a real coverage regression vs the pre-gate baseline.

Add an AExpr::Ternary arm to resolve_inner_dtype that recurses on
truthy+falsy and returns Some(t) when t==e, mirroring the Plus arm's
self-contained verification pattern at L726-730. Walk truthy + falsy,
require equality (Vortex's case_when builder requires THEN and ELSE share
a return_dtype), unresolvable shapes fall through to None.

Add a regression test shape_ternary_nested_matching_dtypes_pushes that
constructs a 2-level nested Ternary (outer.truthy = inner Ternary) and
asserts pushdown engages. Before this fix the test would have asserted
.is_some() but produced None.

(cherry picked from commit fa89c9b)
…/is_in imports (review finding)

PR-2.7 added 5 new imports (AnyValue, VortexScalar, like, or_collect, and
the fn bytes_to_like_literal) that are used ONLY inside arms gated on
strings or is_in, but the imports themselves were ungated. Under
`cargo build -p polars-plan --features vortex --no-default-features`
this produced 5 fresh unused-import warnings; the umbrella CI build uses
--all-features so the warnings didn't fail loud.

Gate each import + the fn definition matching the existing pattern
(e.g., the cycle-1 PR-2.7 already gated IRStringFunction on
`#[cfg(feature = "strings")]` — same idiom for the underlying types
the StringExpr arm uses).

Split AnyValue from DataType in the prelude line (DataType is used by
every arm — must stay ungated; AnyValue is used only in the StringExpr
needle extraction). Split like/or_collect out of the multi-import on the
vortex::expr line and gate each on strings/is_in respectively.

Verified clean: cargo check -p polars-plan --features vortex (no
warnings on vortex_convertor.rs), with all PR-2.7 features
(--features vortex,is_between,is_in,regex,strings,dtype-struct, same),
and umbrella cargo check -p polars --features vortex,cloud,parquet,dtype-full.
88 unit tests pass.

(cherry picked from commit d457a13)
…hen structural tests (review finding)

Cycle-1 added 5 structural-assertion tests (shape_is_between_structural,
shape_starts_with_structural, shape_ends_with_structural,
shape_contains_literal_true_structural, shape_ternary_structural) to
close cycle-1 acceptance criterion (c) but cycle-2 reviewers flagged
three with loose assertions that don't catch specific paste-swap
classes:

#3 shape_is_between_structural: vacuous negative anchor
'!s.contains("eq(") && !s.contains("==")'. Vortex's binary Display
uses single-char '=' ('(x = 10)'), never 'eq(' or '=='; both substrings
are absent in correct AND eq-paste-swap outputs. Replace with
's.contains(">") && s.contains("<")' (an eq-paste-swap '(x = 10)
AND (x = 20)' has neither '>' nor '<').

#4 shape_starts_with_structural + shape_ends_with_structural: don't
catch starts↔ends paste-swap. Both produce 'like'+needle+'%' Display
strings differing only in '%' position. Strengthen with joint substring
(s.contains("prefix%")) + negative anchor (!s.contains("%prefix")).
Mirror for ends_with.

pola-rs#5 shape_ternary_structural: doesn't catch truthy↔falsy paste-swap.
case_when(cond, b, a) still satisfies s.contains('a') && s.contains('b')
because both column refs remain present. Switch truthy/falsy from cols
'a' / 'b' to distinct literals 777 / 888 + assert positional ordering
(t_pos < f_pos in the Display string).

All 88 tests pass (5 structural tests strengthened; shape_plus_arithmetic_structural
unchanged).

(cherry picked from commit a9918a2)
…+ nit pola-rs#7 — README gate notes + is_in empty-haystack comment (review finding)

- README pushdown table: add the three schema-gate notes that the cycle-1
  fix-commits added to the convertor module-doc table but didn't propagate
  to README. is_between → '(schema-gated pairwise-PType)'. starts_with /
  ends_with / contains → '(schema-gated Utf8 input)'. Ternary → '(schema-gated
  THEN/ELSE pairwise-dtype)'. Closes the H1 cross-reference drift between
  convertor source and README documented in cycle-2 finding pola-rs#6.

- vortex_convertor.rs is_in arm: add a 6-line comment before or_collect(terms)
  acknowledging that 'or_collect(vec![])' returns None for an empty haystack
  → arm returns None. Polars's post-decode residual reapply handles the
  all-false 'is_in([])' semantic correctly, so this is a missed
  optimization (we could pre-empt with lit(false)) — not a bug. Closes
  cycle-2 finding nit pola-rs#7.

All 88 unit tests still pass.

(cherry picked from commit aebb302)
…e Deferred entry

Polars's is_in uses TotalOrd (NaN==NaN); Vortex's eq uses IEEE 754 (NaN!=NaN).
This divergence affects ALL float-comparison convertor arms (foundation
eq/not_eq/lt/lt_eq/gt/gt_eq since PR-2.1 + PR-2.7's new is_in/is_between).
The legacy SpecializedColumnPredicate::EqualOneOf path had the same
divergence; PR-2.7 just amplifies the surface.

Add a Deferred work entry capturing the divergence with two mitigation
paths: (a) refuse pushdown when any float literal is NaN (~10 LoC,
conservative); (b) accept the divergence and document as known
limitation. Not blocking — residual reapply preserves correctness.
Tracked so a future read doesn't assume float pushdown is fully isomorphic
with Polars semantics.

(cherry picked from commit 5e68b4f)
Three-cycle inner-loop gauntlet history:
- Cycle 1 REJECT (3 must-fix + 6 should-fix + 2 nit) — schema-gate gaps in
  the same bug class as PR-2.3 / PR-2.4 cycle-1 pairwise-PType must-fixes.
- Cycle 2 ACCEPT (0 must-fix + 6 should-fix + 2 nit) — gates correctly
  implemented; user picked maximum-thoroughness apply-all-polish-inline.
- Cycle 3 ACCEPT (0 must-fix + 0 should-fix + 1 nit) — polish verified
  clean; only finding is pre-existing CastOptions unused-import (PR-2.3
  era, out of PR-2.7 scope).

17 PR-work commits 065df50..71540df (feat: convertor arms; docs: e2e
tests + README; plan: amend complete + awaiting + cycle 1 atomic; fix×3 +
decrement×3 must-fix gates; docs: cycle-1 should-fix sweep; fix×3 cycle-2
should-fix; docs + plan: cycle-2 polish + NaN Deferred + cycle-2 polish
landed). 25 new convertor unit tests + 7 new e2e Python tests. 88 total
convertor tests pass.

Current State advances: status: executing, current_pr: PR-2.8, pr_index: 9,
per-PR counters reset. Phase 2 phase-end review (cycle 2) will fire after
PR-2.8 completes.

Append Implementation status entry capturing scope shipped, tests added,
3-cycle review history, confidence, deferred items (1 new: Float NaN
semantic divergence), and 3 surprises during implementation (H4 self-
reinforcement validated, H2 new-prose internal edge case validated,
prior_fix_commit_sha attention block worked as designed).

(cherry picked from commit 024341d)
…olumn per-column predicate split

The PR-2.2 cycle-1 must-fix M1 conservative refuse at lower_ir.rs:801-815
gave up on convertor pushdown ENTIRELY when ANY of hive_parts /
unified_scan_args.row_index / unified_scan_args.include_file_paths was set.
Correctness held via the multi-scan PARTIAL_FILTER reapply but the
file-column parts of a mixed predicate missed Vortex zone pruning.

PR-2.8 turns the all-or-nothing guard into a per-column split mirroring
polars-mem-engine/scan_predicate/functions::create_scan_predicate's
hive_predicate extraction:

1. Add helper aexpr_file_minterms_to_vortex_expression in
   vortex_convertor.rs (~30 LoC). Walks top-level conjuncts via
   MintermIter, filters to file-only minterms (no leaf-column in
   virtual_cols), converts each via aexpr_to_vortex_expression, and
   AND-collects. Refuses pushdown if no file-only minterms remain.
   Strictly-better than aexpr_to_vortex_expression on the whole tree:
   partial conversion is now possible (a == 1 AND unsupported_op(b)
   pushes a == 1 instead of refusing the whole thing).

2. Refactor lower_ir.rs FileScanIR::Vortex arm to build the
   virtual_cols set from hive_parts.schema().iter_names() +
   row_index.name + include_file_paths, then call the new helper
   unconditionally instead of gating on hive/row_index/file_paths
   presence.

3. Add 5 unit tests covering: all file-only AND-collects all;
   all virtual returns None; partial pushes file part only;
   top-level OR with virtual operand refuses (single minterm
   references both); unsupported subtree dropped in partial push.

4. Add e2e Python test test_scan_with_hive_and_file_col_mixed_filter
   exercising (pl.col('x') > 5) & (pl.col('year') == 2024) over a
   hive-partitioned vortex dataset. Verifies correctness; a regression
   mis-classifying a hive col as file would surface as a Vortex
   ComputeError 'missing column year'.

Resolves Deferred work entry 'Virtual-column-partitioned Vortex scans
don't benefit from AExpr convertor pushdown' (PR-2.2 cycle-1 must-fix
M1 + cycle-2 C2-001).

Verification:
  cargo test -p polars-plan --features vortex,is_between,is_in,regex,strings,dtype-struct vortex_convertor
  → 93 passed (88 prior + 5 new minterms_* tests)
  cargo check -p polars-stream --features vortex,cloud → clean
  cargo check -p polars --features vortex,cloud,parquet,dtype-full → clean
(cherry picked from commit 9d9469f)
PR-2.8 cycle 1 gauntlet (2-vote pr-2 preset: fresh + correctness)
returned accept with 0 must-fix, 3 should-fix on test rigor:

1. Three new minterms_* unit tests (all_file_only_collects_all,
   partial_pushes_file_part_only, unsupported_subtree_dropped_in_
   partial_push) asserted only .is_some(). Paste-swap test: swapping
   the bodies of any two would leave both passing — a regression
   ignoring virtual_cols entirely (pushing both file + virtual
   conjuncts to Vortex) or inverting polarity (keeping virtual,
   dropping file) would not be caught.

2. e2e coverage for partial-push: only hive-col virtual was
   exercised; row_index and include_file_paths virtuals went through
   the same lower_ir.rs branch but had no e2e test.

Fixes:

1. Tighten the 3 unit tests with the joint-substring + negative-anchor
   structural assertion pattern established in PR-2.7 cycle 2
   (shape_starts_with_structural, shape_ends_with_structural).
   Changed literals to unique discriminators (42 kept-side, 99 for
   the second file conjunct, 9999 for dropped virtual, 888 for dropped
   unsupported subtree). Each tightened test now formats the produced
   Vortex Expression and asserts both (a) the kept-side literal is
   present and (b) the dropped-side literal is absent.

2. Add 2 e2e tests:
   - test_scan_with_row_index_and_file_col_mixed_filter — row_index
     virtual + (x > 5) & (ri > 10) filter
   - test_scan_with_include_file_paths_and_file_col_mixed_filter —
     include_file_paths virtual + (x > 5) & src.str.contains('a')
     filter across two files

Each new e2e test verifies result correctness; a regression
mis-classifying the virtual col as a file column would surface as
a Vortex ComputeError on missing column.

Carry-forward (cycle 1 deferred / nits):
- e2e pushdown-engagement verification (.explain() / metrics
  inspection) remains tied to the POLARS_VORTEX_VERIFY_PUSHDOWN
  Deferred entry (no change in this commit).
- virtual_cols hoist outside the and_then closure at lower_ir.rs:807
  — true nit, deferred.

Verification:
  cargo test -p polars-plan --features vortex,is_between,is_in,regex,strings,dtype-struct minterms_
  → 5 passed (all five PR-2.8 minterms_* tests)
  cargo check -p polars --features vortex,cloud,parquet,dtype-full,strings → clean
(cherry picked from commit f55e00b)
…criminator

PR-2.8 cycle 2 gauntlet (2-vote pr-2 preset) rejected with 1 must-fix
caught by BOTH reviewers (fresh + correctness):

  test_scan_with_include_file_paths_and_file_col_mixed_filter used
  pl.col("src").str.contains("a") as the discriminator between
  a.vortex and b.vortex. include_file_paths populates `src` with the
  FULL path (per ScanSourceRef::to_include_path_name at
  crates/polars-plan/src/dsl/scan_sources.rs, no basename
  normalization). pytest's tmp_path fixture generates a path like
  /var/folders/.../pytest-of-USER/pytest-N/
  test_scan_with_include_file_paths_and_file_col_mixed_filter0/
  {a,b}.vortex — the test name itself contains multiple 'a' chars,
  so str.contains("a") evaluates TRUE for BOTH a.vortex and b.vortex.
  The predicate degenerates to `x > 5` alone, returning 5 rows
  instead of the expected 2; assertions fail in CI.

Fix: anchor the discriminator on the basename suffix via
str.ends_with("a.vortex"). Cross-platform, robust against any
tmp_path naming, and clearly intentional. Updated docstring to note
that `src` is the FULL path. Updated final assertion to use
.endswith() (matches the new discriminator's mental model).

The remaining structural assertions on the 3 minterms_* unit tests
(literals 42 / 99 / 9999 / 888 as unique discriminators) are validated
against Vortex's display format (cycle 2 correctness reviewer
confirmed: i32 literals render as `42i32`, no operator or connector
contains digits, no false-positive risk).

Verification:
  cargo check -p polars --features vortex,cloud,parquet,dtype-full,strings → clean
  cargo test -p polars-plan --features vortex,is_between,is_in,regex,strings,dtype-struct minterms_
  → 5 passed
  python3 -c "import ast; ast.parse(open('py-polars/tests/unit/io/test_vortex.py').read())" → OK
(cherry picked from commit c9fd818)
…phase-end review

PR-2.8 cycle 3 ACCEPTED by 2-vote pr-2 gauntlet (0 must-fix, 0 should-fix, 0 nit; both lenses clean).

Plan updates:
1. Current State: status → phase-boundary; current_pr → null; phase_end_cycle → 2;
   last_commit → c9fd818; subagent_invocations_this_pr → 6; review_cycles_this_pr → 3
2. Implementation status: added PR-2.8 entry with 4 commits, 3-cycle review history,
   2 lessons-learned bullets (e2e discriminator anti-pattern + paste-swap-vulnerability
   spans both unit + e2e layers)
3. Deferred work: 'Virtual-column-partitioned Vortex scans don't benefit from AExpr
   convertor pushdown' RESOLVED-in-PR-2.8 with strikethrough convention

Phase 2 amend (PR-2.7 + PR-2.8) complete. Phase 2 exit criteria status:
- (a) 4-vote phase-end review accepts cycle 2 — PENDING (next: this firing)
- (b) §5 pushdown coverage rows implemented/deferred — SATISFIED (cycle 1 ACCEPT, plus
  PR-2.7 ports 6 cutover-lost shapes + PR-2.8 closes virtual-column gap)
- (c) Pushdown engagement via display_tree / POLARS_VERBOSE — PARTIAL (structural
  assertions on unit tests; e2e engagement-verification via POLARS_VORTEX_VERIFY_PUSHDOWN
  remains deferred)
- (d) Build/test suite green — SATISFIED locally; CI verification pending push
- (e) 10 cycle-3 should-fix carry-forward items — SATISFIED in PR-2.0
- (f) Cutover-lost shapes push down — SATISFIED in PR-2.7
- (g) Virtual-column scans push down file-col part — SATISFIED in PR-2.8

Ready for cumulative Phase 2 cycle-2 4-vote phase-end gauntlet (phase-4 preset:
spec + correctness + maint + arch lenses) on the cumulative Phase 2 diff
(vortex-integration-phase-1..HEAD).

(cherry picked from commit 78732e5)
Phase 2 cycle 2 4-vote phase-end gauntlet (phase-4 preset) returned reject
with 3 must-fix (1 spec + 2 maint) + 4 should-fix (1 each across spec /
correctness / maint / arch lenses).

Must-fix items addressed:

1. (spec) Two Deferred work entries 'PR-2.6 cutover-lost pushdown shapes'
   and 'Two additional §5-row shapes not yet in the AExpr-direct convertor'
   were not strikethrough/RESOLVED per PR-2.7 plan-row acceptance criterion
   (f). Both entries now marked with strikethrough + 'RESOLVED in PR-2.7
   (commits: ...)' annotation, mirroring the PR-2.8 entry's format.

2. (maint) test_scan_with_hive_partitioning_and_filter docstring described
   the all-or-nothing virtual-column refuse guard at lower_ir.rs:780-791
   (deleted in PR-2.8) and the legacy polars_to_vortex_predicate fallback
   (deleted in PR-2.6). Rewrote docstring to describe the current PR-2.8
   per-minterm split mechanism via aexpr_file_minterms_to_vortex_expression.

3. (maint) test_scan_with_row_index_and_filter docstring similarly stale —
   described the 'cycle-2 hive-only guard extended to row_index' refuse path
   that no longer exists. Rewrote to describe the per-minterm split.

Should-fix items addressed:

4. (maint SF1) polars-vortex/src/read/predicate.rs module header listed 4
   shapes as 'Deferred work items the AExpr-direct convertor does NOT yet
   handle' — but PR-2.7 ported all 4 plus 2 more. Updated the 'Coverage
   parity' section to enumerate the 22 shapes now supported as of PR-2.7 +
   PR-2.8, plus the residual list (non-Strict CAST, cross-kind CAST, regex
   contains, temporal extracts, Float16).

5. (maint SF2) polars-vortex/README.md inner architecture table called
   Column(name) gate 'virtual-column refuse in lower_ir' (now per-minterm
   drop in helper), listed Ternary in the refused-shapes prose
   (contradicted the outer Pushdown coverage table at line 242 which shows
   Ternary shipped), and described the lower_ir.rs guard as still
   refusing-whole-predicate. Updated the table to enumerate the 6 PR-2.7
   shapes + the PR-2.8 per-minterm-drop convention; rewrote the prose
   below the table to describe the strictly-better-than-all-or-nothing
   behavior.

6. (correctness SF) No cross-PR unit test combined a PR-2.7 shape with a
   PR-2.8 virtual_col minterm. Added
   minterms_is_between_with_virtual_col_pushes_is_between_only — exercises
   the leaf-walk-through-Function-args invariant
   (aexpr_to_leaf_names_iter must descend through Function args via
   children_rev to find file vs virtual leaves) and asserts the kept-side
   bounds (10, 20) are present while the dropped-side literal (9999) is
   absent. Defends against a regression in leaf-walking.

7. (arch SF) Added 8 single-line `// SCHEMA-GATE: <kind>` markers at each
   pattern-match arm that directly constructs a Vortex builder (BinaryExpr,
   is_between, is_in, Function::Boolean (unary), StructField, StringExpr,
   Cast, Ternary). Greppable via `rg '// SCHEMA-GATE:'`. Module-level
   docstring extended with a 'SCHEMA-GATE convention' section listing all
   current gates — codifies the recurring discipline so a fresh arm-adder
   either replicates the gate or explicitly justifies why no gate is
   needed.

Should-fix items deferred (per spec reviewer's Option (b) recommendation):

8. (spec SF) is_in arena-level structural unit test deferred — adding the
   test requires constructing a LiteralValue::Scalar(Scalar::new(
   DataType::List(Box::new(Int32)), AnyValue::List(series))) haystack node,
   which needs spinning up polars-core Series + AnyValue::List shell
   (~20-30 LoC of test setup vs ~10 LoC for other shapes). e2e coverage at
   test_scan_with_is_in_filter protects user-visible correctness.
   Documented as a Deferred work entry tracking this defense-in-depth gap.
   Also fixed the misleading comment at vortex_convertor.rs:2046-2058 that
   claimed is_between also lacked unit coverage (it does have
   shape_is_between_structural).

Plan updates:
- 2 Deferred entries strikethrough/RESOLVED in PR-2.7
- 1 new Deferred entry: 'is_in arena-level structural unit test'

Verification:
  cargo check -p polars --features vortex,cloud,parquet,dtype-full,strings → clean
  cargo test -p polars-plan --lib --features vortex,is_between,is_in,regex,strings,dtype-struct vortex_convertor
  → 94 passed (was 93; +1 cross-PR test)
  python3 -c "import ast; ast.parse(open(...test_vortex.py))" → OK
(cherry picked from commit 0a1475b)
Phase 2 cycle 3 4-vote phase-end gauntlet (phase-4 preset: spec +
correctness + maint + arch lenses) ACCEPTED with 0 must-fix, 0
should-fix, 0 nit across all 4 lenses.

Cycle 2 reject had 3 must-fix (1 spec Deferred-entry markup + 2 maint
stale e2e docstrings) + 4 should-fix; the sweep commit 0a1475b
applied all 7 (3 MF + 4 SF) in a single 5-file diff:
- 2 Deferred entries strikethrough/RESOLVED in PR-2.7 with commit refs
- 2 stale e2e test docstrings rewritten to describe PR-2.8 mechanism
- polars-vortex/src/read/predicate.rs module header refreshed
- polars-vortex/README.md inner architecture table refreshed
- New cross-PR unit test (PR-2.7 shape × PR-2.8 minterm helper)
- 8 SCHEMA-GATE markers + module-doc convention section codifying
  the recurring schema-gate discipline across PR-2.2/.3/.4/.7
- 1 new Deferred entry: is_in arena-level structural test
  (per spec reviewer Option (b) — e2e coverage protects user-visible
  correctness; defense-in-depth gap tracked for Phase 3)

Current State updates:
- status: phase-boundary (was: same, but now ready for Step 3.4 gate)
- current_phase: Phase 2 COMPLETE — awaiting Step 3.4 user gate
- phase_end_cycle: 3, phase_end_reject_cycles: 1
- last_phase_end_verdict: accept
- last_commit: 0a1475b
- subagent_invocations_total: 60 (+9 since cycle 2: 4 cycle 2
  reviewers + 4 cycle 3 reviewers + 1 cycle 2 implicit synthesis-skip)

Phase 2 exit criteria status (all SATISFIED):
- (a) 4-vote phase-end accepts cycle 3 (cycle 1 + 3 both accepted;
  cycle 2 rejected and was resolved in 0a1475b)
- (b) §5 pushdown coverage rows implemented or documented as deferred
- (c) Pushdown engagement via structural assertion at unit-test layer
  (e2e engagement-verification via POLARS_VORTEX_VERIFY_PUSHDOWN
  remains in Deferred work)
- (d) Build/test suite still green (94 unit + 13 e2e tests)
- (e) PR-2.0 cycle-3 should-fix carry-forward items resolved
- (f) Cutover-lost shapes (PR-2.7) push down (6 shapes shipped)
- (g) Virtual-column-partitioned scans (PR-2.8) push down file-col part

Phase 2 cumulative: 78 commits on vortex-integration branch since
the vortex-integration-phase-1 base (Phase 1 stack tip). Phase 2
amend (PR-2.7 + PR-2.8 + cycle 2 sweep) totals 30 commits.

Awaiting Step 3.4 user gate before proceeding to Phase 3 (PR-3.1
file-stats + PR-3.2 multi-file + PR-3.3 nested-type coverage).

(cherry picked from commit aa701b9)
…ase 2 cycle 3

The cycle 3 sweep commit 0a1475b introduced new docstrings (3 e2e
test rewrites, 2 PR-2.8 e2e tests) + module-level convention doc + 8
SCHEMA-GATE markers + 1 cross-PR test that triggered:

1. ruff: 17 errors (14 D205 'blank line required between summary and
   description' + 2 W505 'doc line too long' + 1 D301 'use r""" for
   backslashes' + 1 D209 + 1 RUF100). Of these, 3 were auto-fixable
   via ruff --fix; 11 required manual docstring restructuring
   (collapse multi-line summary to single line + blank-line separator
   + add r""" prefix for `\` in docstring at the wildcard-negative-path
   test). Many were pre-existing in test_vortex.py from earlier PRs;
   accumulated unfixed across prior cycles.

2. rustfmt: import-order normalization in vortex_convertor.rs after
   the new use crate::plans::aexpr::MintermIter line (cycle 1
   PR-2.8) shifted neighboring import ordering. cargo fmt applied
   cleanly.

Verification:
  uv tool run ruff check py-polars/ → All checks passed
  cargo fmt --check → clean
  cargo check -p polars --features vortex,cloud,parquet,dtype-full,strings → clean
  cargo test -p polars-plan --lib --features ... vortex_convertor → 94 passed
(cherry picked from commit 6a8dc83)
…e 1++

2026-05-19 PR-stack restructure: this branch (vortex-integration) is now
**Phase 2 only** — AExpr-direct convertor + Option B→A cutover (PR-2.1
through PR-2.8 amend, 60 commits). PR-2.0 cleanups and Phase 3 work
(file-stats, multi-file tests) moved to the Phase 1++ branch
(vortex-integration-phase-1).

User-facing PR split:
- PR #2 (Phase 1++): complete robust Vortex integration foundation
  (read/write + legacy pushdown + file-stats + multi-file tests +
  PR-2.0 cleanups + bench harness + remaining Phase 1++ work yet to
  ship: nested-type + small-int + engagement infra + extended benches)
- PR #1 (Phase 2): pure perf follow-on — replaces legacy
  SpecializedColumnPredicate pushdown with AExpr-direct convertor;
  measured speedup visible against the Phase 1++ baseline benches

Backup at backup-vortex-integration-pre-restack preserves the
pre-restack tip (7b0f670) for audit.

Phase 2 cycle-3 4-vote phase-end ACCEPTED verdict is preserved through
the rebase. PR #1 base needs to be retargeted to vortex-integration-phase-1
(currently points at the old tip). Plan rewrite + full Implementation
status update deferred to next session.
@lwwmanning
Copy link
Copy Markdown
Member Author

Closed and reopened as #4 with branch renamed to vortex-integration-phase-2 so the branch-name pair matches Phase 1++'s vortex-integration-phase-1. Same content, same SHA, no other changes.

@lwwmanning lwwmanning closed this May 19, 2026
@lwwmanning lwwmanning deleted the vortex-integration branch May 19, 2026 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request rust

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant