Skip to content

feat(rust): Vortex Phase 2 — AExpr-direct filter pushdown (stacked on #2)#1

Closed
lwwmanning wants to merge 62 commits into
vortex-integration-phase-1from
vortex-integration
Closed

feat(rust): Vortex Phase 2 — AExpr-direct filter pushdown (stacked on #2)#1
lwwmanning wants to merge 62 commits into
vortex-integration-phase-1from
vortex-integration

Conversation

@lwwmanning
Copy link
Copy Markdown
Member

@lwwmanning lwwmanning commented May 14, 2026

Replaces the SpecializedColumnPredicate-based filter pushdown shipped in #2 with an AExpr-direct convertor that walks Polars' Arena<AExpr> at IR-build time in lower_ir.rs::FileScanIR::Vortex, where the arena is still in scope. The convertor handles a strictly broader shape set: scalar comparisons, boolean combinators, null checks, struct field access, numeric col + 1, same-kind CAST under Strict options, is_between / is_in / str.starts_with / str.ends_with / str.contains(literal=True) / Ternary, and arbitrary multi-column predicates composed from the above. Schema gates on every direct-builder arm protect the always-SAFE-fallback contract — Vortex's checked_add, cast, comparison, and struct-field builders each error at scan-time on certain inputs (cross-kind CAST, cross-PType arithmetic, missing struct fields), and the gates refuse pushdown so the residual filter handles those cases. When pushdown is refused for any shape, the multi-scan layer reapplies the original predicate post-decode via PARTIAL_FILTER — correctness is invariant, the visible effect is wall-clock.

Stacked on #2; review that PR first. The vortex_scan/filter_arithmetic Criterion bench in #2 is the load-bearing measurement: col + 1 == N falls back to "decode everything then filter" on #2 alone and pushes down to Vortex's zone pruner here. Capture a phase-1 baseline on #2's tip with cargo bench -p polars --features vortex,cloud,parquet,dtype-full,strings --bench io_vortex -- --save-baseline phase-1, then re-run with --baseline phase-1 on this branch to see the delta.

🤖 Generated with Claude Code

@github-actions
Copy link
Copy Markdown

The uncompressed lib size after this PR is 59.4950 MB.

1 similar comment
@github-actions
Copy link
Copy Markdown

The uncompressed lib size after this PR is 59.4950 MB.

@github-actions
Copy link
Copy Markdown

The uncompressed lib size after this PR is 59.5098 MB.

4 similar comments
@github-actions
Copy link
Copy Markdown

The uncompressed lib size after this PR is 59.5098 MB.

@github-actions
Copy link
Copy Markdown

The uncompressed lib size after this PR is 59.5098 MB.

@github-actions
Copy link
Copy Markdown

The uncompressed lib size after this PR is 59.5098 MB.

@github-actions
Copy link
Copy Markdown

The uncompressed lib size after this PR is 59.5098 MB.

@tl-dr-review tl-dr-review Bot added the tldr label May 16, 2026
@github-actions
Copy link
Copy Markdown

The uncompressed lib size after this PR is 59.3440 MB.

@github-actions
Copy link
Copy Markdown

The uncompressed lib size after this PR is 59.5096 MB.

1 similar comment
@github-actions
Copy link
Copy Markdown

The uncompressed lib size after this PR is 59.5096 MB.

@gatesn gatesn removed the tldr label May 16, 2026
@github-actions
Copy link
Copy Markdown

The uncompressed lib size after this PR is 59.5094 MB.

@github-actions
Copy link
Copy Markdown

The uncompressed lib size after this PR is 59.5093 MB.

1 similar comment
@github-actions
Copy link
Copy Markdown

The uncompressed lib size after this PR is 59.5093 MB.

@github-actions
Copy link
Copy Markdown

The uncompressed lib size after this PR is 59.5095 MB.

4 similar comments
@github-actions
Copy link
Copy Markdown

The uncompressed lib size after this PR is 59.5095 MB.

@github-actions
Copy link
Copy Markdown

The uncompressed lib size after this PR is 59.5095 MB.

@github-actions
Copy link
Copy Markdown

The uncompressed lib size after this PR is 59.5095 MB.

@github-actions
Copy link
Copy Markdown

The uncompressed lib size after this PR is 59.5095 MB.

@tl-dr-review tl-dr-review Bot added the tldr label May 17, 2026
lwwmanning added a commit that referenced this pull request May 18, 2026
…O + EqValidity None tests

Cycle-1 gauntlet (preset=pr-2, fresh+correctness) accepted PR-2.1 with
no must-fix but flagged 5 should-fix items. This commit addresses 2 of
them; the others (3 deferred) are tracked in the Deferred work section
in the follow-up plan commit.

1. **Bitwise-vs-logical operator caveat** (cycle-1 correctness should-fix
   #1+#2; cycle-1 fresh should-fix F-003): Polars `Operator::And`/`Or`
   and `IRBooleanFunction::Not` are bitwise-OR-logical (work on integer
   columns too — see boolean.rs:49 `// Also bitwise negate`). Vortex's
   `and`/`or`/`not` are boolean-only. For embedded integer-bitwise
   sub-trees like `(col_int & 1) > 0`, the convertor would emit
   semantically wrong Vortex expressions. Polars's own column_expr.rs:
   245-247 guards with `dtype.is_bool()`.

   Added a load-bearing TODO comment at the doc-block of
   `aexpr_to_vortex_expression` explaining the constraint + naming
   PR-2.2's wire-up site as the schema-gated mitigation point. Option (a)
   at the call site (skip pushdown if any And/Or/Not operand is non-bool)
   is cheaper than threading `&Schema` through the convertor.

2. **Missing None-returning tests for `EqValidity`/`NotEqValidity`**
   (cycle-1 fresh should-fix F-002): the explicit `return None` arms at
   the match for these operators had no test coverage. Added two ~10-LoC
   tests parallel to `unsupported_xor_returns_none`. Test count 21 → 23.

Verified: `cargo test -p polars-plan --features vortex,is_between,is_in
vortex_convertor` → 23 passed, 0 failed.
@lwwmanning lwwmanning force-pushed the vortex-integration branch from 3b7260d to d89b7a5 Compare May 18, 2026 01:47
@lwwmanning lwwmanning changed the base branch from main to vortex-integration-phase-1 May 18, 2026 01:49
lwwmanning added 26 commits May 19, 2026 11:45
Consolidates the amend-completion + Step 2.3-step-2 awaiting-review transitions
for PR-2.7 (implementation commits 065df50 + 0e2c26f landed ahead of this
plan update — minor ordering drift from the canonical inner-loop sequence;
substance unchanged).

Current State now reflects:
- status: awaiting-review
- current_pr: PR-2.7, pr_index: 8 (post-amend, PR-2.7 is the 8th in Phase 2)
- planning_sub_flow: null (amend sub-flow closed)
- last_phase_end_verdict: null (reset per amend semantics; next phase-end
  review fires as cycle 2 of Phase 2 after PR-2.7 + PR-2.8 complete)
- phase_entry_sha unchanged (93643dd) so cumulative-phase reviews still
  span original PR-2.0..PR-2.6 + amend PR-2.7 + PR-2.8

(cherry picked from commit 8639dd5)
Both reviewers (fresh + correctness) REJECT with high confidence. Three must-fix correctness gates of the same bug class as PR-2.3 cycle-1 CAST cross-kind and PR-2.4 cycle-2 comparison pairwise-PType: (1) is_between arm lacks pairwise-PType gate (vortex_convertor.rs:319-340); (2) Ternary arm lacks THEN/ELSE pairwise-dtype gate (vortex_convertor.rs:563-572); (3) StringExpr arm lacks Utf8 input gate (vortex_convertor.rs:471-510). All three violate always-SAFE-fallback contract. Severity dedupe on Ternary disagreement = HIGHEST per spec (fresh: must-fix, correctness: should-fix with reduced-reach rationale). 6 should-fix + 2 nit also flagged. Entering Step 2.4 fix-application loop.

<details><summary>gauntlet Synthesizer Output (schema_version: 1)</summary>

```json
{
  "schema_version": 1,
  "preset": "pr-2",
  "lenses_used": ["fresh", "correctness"],
  "review_count": 2,
  "unified_findings": [
    {"severity": "must-fix", "kind": "bug", "file_line": "crates/polars-plan/src/plans/aexpr/predicates/vortex_convertor.rs:319-340", "description": "is_between arm lacks pairwise-PType gate between col and bounds. The arm calls Vortex's gt/gt_eq/lt/lt_eq builders DIRECTLY (not via AExpr::BinaryExpr recursion), so the BinaryExpr arm's pairwise-PType gate is NEVER invoked. The comment at L312-318 falsely claims 'schema-gated only transitively' via the BinaryExpr arm. Cross-PType bounds (e.g., Int32 col with Int64 literal bounds when TYPE_COERCION is off) will vortex_bail! at scan-time in Vortex's Binary::return_dtype (vortex-array/src/scalar_fn/fns/binary/mod.rs:130-136) with 'Cannot compare different DTypes', violating the always-SAFE-fallback contract. Same bug class as PR-2.3 cycle-1 CAST cross-kind and PR-2.4 cycle-2 comparison pairwise-PType must-fixes. Highly reachable: column dtypes are diverse and Rust literal types easily mismatch.", "recommended_fix": "Add pairwise gate BEFORE building comparisons: `let s = schema?; let col_dt = resolve_inner_dtype(input[0].node(), arena, s)?; let lo_dt = resolve_inner_dtype(input[1].node(), arena, s)?; let hi_dt = resolve_inner_dtype(input[2].node(), arena, s)?; if col_dt != lo_dt || col_dt != hi_dt { return None; }`. Mirror the precedent in Plus / comparison gates for the type_coercion-off path. Add a unit test `shape_is_between_cross_ptype_returns_none` (col=Int32, lo/hi=Int64). After the gate lands, rewrite or delete the misleading transitivity comment.", "found_by": ["fresh", "correctness"]},
    {"severity": "must-fix", "kind": "bug", "file_line": "crates/polars-plan/src/plans/aexpr/predicates/vortex_convertor.rs:563-572", "description": "Ternary arm lacks THEN/ELSE pairwise-dtype gate. Comment at L556-562 falsely claims 'Vortex's CaseWhen is type-checked at builder time' — inspection of vortex-array/src/expr/expression.rs:45-62 shows try_new only validates arity; the dtype unification fires in CaseWhen::return_dtype at scan/pruning time (vortex-array/src/scalar_fn/fns/case_when.rs:185-191) with 'CaseWhen THEN and ELSE dtypes must match'. Cross-dtype truthy/falsy (e.g., nested Ternary with col_int32 vs col_int64) recurse independently and bypass the gate. Same bug class as PR-2.2 cycle-1 Plus must-fix; violates always-SAFE-fallback contract. Practical reach reduced for a Ternary at the filter root (Polars validates it must be Boolean-typed), but nested Ternary inside a complex predicate when TYPE_COERCION is off remains reachable.", "recommended_fix": "Mirror Plus arm gate BEFORE recursing: `let s = schema?; let then_dt = resolve_inner_dtype(*truthy, arena, s)?; let else_dt = resolve_inner_dtype(*falsy, arena, s)?; if then_dt != else_dt { return None; }`. resolve_inner_dtype currently has no Ternary arm — recursion through it returns None which is the safe fallback. Fix the false comment to accurately describe the explicit gate. Add unit test `shape_ternary_cross_dtype_returns_none`.", "found_by": ["fresh", "correctness"]},
    {"severity": "must-fix", "kind": "bug", "file_line": "crates/polars-plan/src/plans/aexpr/predicates/vortex_convertor.rs:471-510", "description": "StringExpr arm lacks Utf8 input gate on input[0]. Vortex's Like.return_dtype vortex_bail!s at scan-time if input dtype isn't Utf8 (vortex-array/src/scalar_fn/fns/like/mod.rs:124-132). If input[0] is non-String (e.g., a struct field access producing Int32, or a Cast whose source isn't String), pushdown emits a Vortex `like(<Int32-typed-expr>, <utf8-pattern>)` that fails at scan-time. Violates always-SAFE-fallback contract — same class as PR-2.3 CAST gate.", "recommended_fix": "Add Utf8 gate: `let s = schema?; let col_dt = resolve_inner_dtype(input[0].node(), arena, s)?; if col_dt != DataType::String { return None; }` BEFORE building the `like` call. Add unit test `shape_starts_with_on_int_column_returns_none`.", "found_by": ["fresh"]},
    {"severity": "should-fix", "kind": "missing-acceptance", "file_line": "py-polars/tests/unit/io/test_vortex.py:330-489", "description": "Acceptance criterion (c) NOT MET: no test verifies pushdown actually engages. All 7 e2e tests assert only post-decode correctness, which the residual-reapply preserves regardless. A regression where the convertor returns None for all 6 new shapes would silently fall back to residual and still pass every test.", "recommended_fix": "Add at least one engagement assertion per shape — either inspect `pl.scan_vortex(...).filter(...).explain()` for the pushed Vortex expression, use display_tree() to verify the pushdown path, or POLARS_VERBOSE + caplog. Alternative: Rust integration test mirroring shape_plus_arithmetic_structural's structural-assertion discipline.", "found_by": ["fresh", "correctness"]},
    {"severity": "should-fix", "kind": "coverage", "file_line": "crates/polars-plan/src/plans/aexpr/predicates/vortex_convertor.rs:1889-1947", "description": "is_in arm has ZERO unit tests. Test-module comment claims 'covered by e2e Python tests; arena-level Series literal construction is awkward' but the e2e tests don't cover: (a) nulls_equal=true + haystack-contains-null (regression of this gate would silently push a narrower filter); (b) per-scalar conversion failure; (c) empty haystack. is_between IS unit-tested, contradicting the 'awkward' rationale.", "recommended_fix": "Add unit tests: shape_is_in_positive (using Series::from), shape_is_in_nulls_equal_with_null_returns_none, shape_is_in_empty_haystack_returns_none, shape_is_in_per_scalar_conversion_failure_returns_none.", "found_by": ["fresh", "correctness"]},
    {"severity": "should-fix", "kind": "coverage", "file_line": "crates/polars-plan/src/plans/aexpr/predicates/vortex_convertor.rs:1898-2089", "description": "Coverage gaps that would catch the must-fix bugs: (a) is_between cross-PType bounds; (b) Ternary cross-dtype; (c) StringExpr on non-String column; (d) is_between Right/None ClosedInterval variants (only Both/Left covered both at unit and e2e level); (e) bytes_to_like_literal wildcard test only covers '%', missing '_' and '\\\\'.", "recommended_fix": "Add tests alongside the must-fix gate work: shape_is_between_cross_ptype_returns_none, shape_ternary_cross_dtype_returns_none, shape_starts_with_on_int_column_returns_none, shape_is_between_right_inclusive, shape_is_between_none_inclusive, bytes_to_like_literal arm tests for '_' and '\\\\'.", "found_by": ["fresh", "correctness"]},
    {"severity": "should-fix", "kind": "doc-quality", "file_line": "crates/polars-plan/src/plans/aexpr/predicates/vortex_convertor.rs:27-64", "description": "Module-level doc-comment stale post-PR-2.7. L27 still says '(PR-2.1 foundation + PR-2.2 + PR-2.3 + PR-2.4 extensions)'. L29 still says '16 shapes' — should be 22. Shape table (L33-50) missing 6 new shapes. 'What this module does NOT cover yet' paragraph (L52-64) explicitly lists 'Ternary' (L61) — now covered.", "recommended_fix": "Update doc to include PR-2.7 shapes in the table, bump '16 shapes' to '22 shapes', remove 'Ternary' from the not-covered list, remove 'is_between'/'is_in'/'str.starts_with'/'str.ends_with' caveats.", "found_by": ["fresh"]},
    {"severity": "should-fix", "kind": "convention", "file_line": "crates/polars-plan/src/plans/aexpr/predicates/vortex_convertor.rs:311-318", "description": "Factually wrong comment: 'Schema-gated only transitively: the recursive aexpr_to_vortex_expression calls inherit the existing comparison pairwise-PType gate (BinaryExpr arm)'. False — the arm builds gt/gt_eq DIRECTLY via Vortex builders, never re-entering the AExpr::BinaryExpr arm. Misleads readers and is design root cause of must-fix #1.", "recommended_fix": "After adding the explicit pairwise gate, rewrite the comment to describe the explicit gate (not a transitive one). Or delete the misleading claim.", "found_by": ["fresh", "correctness"]},
    {"severity": "should-fix", "kind": "convention", "file_line": "crates/polars-plan/src/plans/aexpr/predicates/vortex_convertor.rs:556-562", "description": "Comment falsely claims 'Vortex's CaseWhen is type-checked at builder time' AND 'recursion's existing CAST/Plus pairwise-PType gates protect against mismatched arms'. Both FALSE: case_when() builder only validates arity; inner Plus/CAST gates protect their own subtrees, not Ternary truthy-vs-falsy mismatch. Root cause of must-fix #2.", "recommended_fix": "Rewrite to describe the added gate. Cite case_when.rs:185-191 as the bail-at-scan-time site.", "found_by": ["fresh", "correctness"]},
    {"severity": "nit", "kind": "doc-quality", "file_line": "crates/polars-vortex/README.md:240", "description": "is_in README entry mentions only the nulls_equal=true+nulls refuse path; doesn't mention the per-scalar conversion refuse (e.g., Duration scalars in haystack fail polars_scalar_to_vortex).", "recommended_fix": "Append 'or any haystack scalar fails Polars→Vortex conversion' to the is_in row's path-description cell.", "found_by": ["fresh"]},
    {"severity": "nit", "kind": "encapsulation", "file_line": "crates/polars-plan/src/plans/aexpr/predicates/vortex_convertor.rs:855-875", "description": "bytes_to_like_literal is private; only '%' branch tested indirectly. Other branches ('_', '\\\\', invalid UTF-8) are dead from a test standpoint.", "recommended_fix": "Either make pub(super) for direct unit testing, or add three more arm-level negative tests using each unsafe byte: '_', '\\\\', and a needle that's not valid UTF-8.", "found_by": ["fresh"]}
  ],
  "disagreements": [
    {"topic": "Ternary cross-dtype gate severity (must-fix vs should-fix)", "positions": [{"lens": "fresh", "position": "must-fix — same bug class as PR-2.3 cycle-1 CAST cross-kind and PR-2.2 cycle-1 Plus must-fixes. Violates always-SAFE-fallback contract. Should be held to the same bar as the other pairwise-dtype gates."}, {"lens": "correctness", "position": "should-fix — same bug class but with reduced practical reach. A Ternary at the filter root must be Boolean-typed (Polars validates) so the cross-dtype hazard is gated to nested Ternary inside a complex predicate when TYPE_COERCION is off."}], "synthesizer_call": "Hold at must-fix per spec rule (severity dedupe = HIGHEST) and per the existing precedent pattern (Plus/comparison gates were must-fix even though TYPE_COERCION often handles cross-PType in practice). Reduced reach is a mitigation argument, not a category change. Correctness's reachability rationale captured for implementer awareness."}
  ],
  "dropped_re_flags": [],
  "executive_summary": "Both reviewers independently converged on REJECT with high confidence. Three must-fix correctness gates of the same bug class as PR-2.3 cycle-1 CAST cross-kind and PR-2.4 cycle-2 comparison pairwise-PType: (1) is_between arm lacks pairwise-PType gate; (2) Ternary arm lacks THEN/ELSE pairwise-dtype gate; (3) StringExpr arm lacks Utf8 input gate. All three violate the always-SAFE-fallback contract that PR-2.6's Option B->A cutover rests on. Fixes are 4-6 lines each, mirroring the Plus arm gate precedent. Two of the three are accompanied by misleading comments that mask the design errors and need rewriting alongside the gate fixes. Severity disagreement on Ternary resolved per spec to HIGHEST = must-fix. Coverage and engagement-assertion gaps reinforce the must-fix work: acceptance criterion (c) is NOT met; is_in arm has ZERO unit tests despite the 'awkward' rationale; is_between Right/None ClosedInterval variants are unexercised; cross-PType refusal tests for the three must-fix gates are missing. Verified design strengths: bytes_to_like_literal escape logic preserves legacy parity; ClosedInterval mapping correct; match arm order correct; cfg-gating consistent; is_in's haystack-null refuse decision sound; multi-byte UTF-8 needles handled cleanly. Total estimated work: 12-25 LoC + ~10 unit tests + doc updates.",
  "overall": "reject",
  "must_fix_count": 3,
  "should_fix_count": 6,
  "nit_count": 2,
  "review_cycles_this_invocation": 1
}

```

</details>

(cherry picked from commit a094497)
…ype gate (review finding)

The is_between arm constructs Vortex gt/gt_eq/lt/lt_eq builders DIRECTLY rather
than re-entering AExpr::BinaryExpr, so the BinaryExpr arm's pairwise-PType gate
would NEVER fire transitively. Cross-PType bounds (Int32 col with Int64 literal
bounds when TYPE_COERCION is off) would vortex_bail! at scan-time in Vortex's
Binary::return_dtype with 'Cannot compare different DTypes', violating the
always-SAFE-fallback contract. Same bug class as PR-2.3 cycle-1 CAST cross-kind
and PR-2.4 cycle-2 comparison pairwise-PType must-fixes.

Add an explicit pre-recursion pairwise-PType gate (4 lines) mirroring the
BinaryExpr arm's pattern. Rewrite the misleading 'Schema-gated only
transitively' comment to describe the explicit gate. Add 4 unit tests
(shape_is_between_right_inclusive, shape_is_between_none_exclusive,
shape_is_between_cross_ptype_returns_none, shape_is_between_without_schema_returns_none)
which together close the cycle-1 should-fix #6d ClosedInterval coverage gap
AND the cycle-1 should-fix #6a cross-PType test gap.

(cherry picked from commit 9098452)
…wise-dtype gate (review finding)

The Ternary arm fed truthy/falsy into Vortex's case_when builder without
verifying they share a dtype. case_when's try_new builder only validates arity
(vortex-array/src/expr/expression.rs:45-62); the dtype unification check fires
later in CaseWhen::return_dtype (vortex-array/src/scalar_fn/fns/case_when.rs:185-191)
which vortex_bail!s with 'CaseWhen THEN and ELSE dtypes must match' at
scan/pruning time. Cross-dtype truthy/falsy (nested Ternary with col_int32 vs
col_int64 when TYPE_COERCION is off) bypassed the inner arms' CAST/Plus gates,
violating the always-SAFE-fallback contract. Same bug class as PR-2.2 cycle-1
Plus must-fix.

Reachability: a Ternary at the filter root must be Boolean-typed (Polars
validates), so the cross-dtype hazard is gated to nested Ternary inside a
complex predicate. Held to must-fix per existing precedent (Plus / comparison
gates) — always-SAFE-fallback contract is binary.

Add an explicit pre-recursion THEN/ELSE pairwise-dtype gate (4 lines) mirroring
the Plus arm pattern. Rewrite the misleading 'CaseWhen is type-checked at
builder time' + 'CAST/Plus gates protect mismatched arms' comments to
accurately describe the explicit gate. Add 2 unit tests
(shape_ternary_cross_dtype_returns_none, shape_ternary_without_schema_returns_none).

(cherry picked from commit 85ca567)
…gate (review finding)

Vortex's Like kernel (vortex-array/src/scalar_fn/fns/like/mod.rs:124-132
return_dtype) vortex_bail!s with 'Cannot apply like to non-Utf8 input' at
scan-time if input[0] isn't Utf8. The StringExpr arm fed input[0] directly
into the recursive convertor without verifying the column's dtype, so a
non-String input (struct field access producing Int32, Cast whose source
isn't String) would silently push a like(<non-utf8>, <utf8-pattern>) that
fails at scan-time, violating the always-SAFE-fallback contract. Same bug
class as PR-2.3 cycle-1 CAST cross-kind must-fix.

Add an explicit Utf8 input gate (3 lines) BEFORE any needle extraction or
pattern construction work. Update existing 4 StringExpr unit tests
(shape_starts_with_positive / _wildcard_in_needle / shape_ends_with_positive
/ shape_contains_literal_{true,false}) to pass schema (the gate now consults
schema?). Add 4 new unit tests: shape_starts_with_on_int_column_returns_none
(must-fix #3 regression), shape_starts_with_without_schema_returns_none
(no-schema refuse), shape_starts_with_underscore_in_needle_returns_none and
shape_starts_with_backslash_in_needle_returns_none (close cycle-1 should-fix
#6e and nit pola-rs#11 coverage gap).

Also adds schema_s_string() and schema_s_string_i_int32() test helpers.

(cherry picked from commit b2b9659)
… Utf8 gate) — all 3 must-fix done; should-fix sweep next

(cherry picked from commit 8397a3a)
- vortex_convertor.rs module-level doc-comment refresh (should-fix pola-rs#7):
  bumps shape count 16 -> 22, adds 6 PR-2.7 shapes to the table with their
  Vortex builder + gate notes, removes Ternary from the 'not covered yet'
  list, updates PR-2.5 reference to reflect the Deferred-work slip.

- 5 structural-assertion unit tests for the new shapes (should-fix #4 —
  acceptance criterion (c)): shape_is_between_structural,
  shape_starts_with_structural, shape_ends_with_structural,
  shape_contains_literal_true_structural, shape_ternary_structural. Each
  asserts the produced Vortex Expression's Display contains the expected
  builder anchor (and/like/case_when) plus the column/literal substrings.
  Catches paste-swap bugs the prior .is_some()-only tests would miss
  (mirrors the shape_plus_arithmetic_structural discipline from PR-2.2
  cycle-1). is_in structural omitted — its Series-literal arena
  construction needs SpecialEq<Series> infrastructure not yet wired into
  the test helpers; e2e Python test covers correctness.

- README.md is_in entry refinement (cycle-1 nit pola-rs#10): document the
  per-scalar-conversion refuse path (Duration scalars in haystack fail
  polars_scalar_to_vortex) alongside the existing nulls_equal+nulls
  refuse-path note.

(cherry picked from commit 9473106)
…Ternary arm (review finding)

The cycle-1 must-fix #2 fix added a THEN/ELSE pairwise-dtype gate to the
Ternary convertor arm, calling resolve_inner_dtype(*truthy)? to verify
match. But resolve_inner_dtype had no Ternary arm — fell to the catchall
_=>None — so any NESTED Ternary truthy/falsy would unconditionally return
None, ?-propagating to refuse pushdown even when inner dtypes match.
Fail-closed-safe but a real coverage regression vs the pre-gate baseline.

Add an AExpr::Ternary arm to resolve_inner_dtype that recurses on
truthy+falsy and returns Some(t) when t==e, mirroring the Plus arm's
self-contained verification pattern at L726-730. Walk truthy + falsy,
require equality (Vortex's case_when builder requires THEN and ELSE share
a return_dtype), unresolvable shapes fall through to None.

Add a regression test shape_ternary_nested_matching_dtypes_pushes that
constructs a 2-level nested Ternary (outer.truthy = inner Ternary) and
asserts pushdown engages. Before this fix the test would have asserted
.is_some() but produced None.

(cherry picked from commit fa89c9b)
…/is_in imports (review finding)

PR-2.7 added 5 new imports (AnyValue, VortexScalar, like, or_collect, and
the fn bytes_to_like_literal) that are used ONLY inside arms gated on
strings or is_in, but the imports themselves were ungated. Under
`cargo build -p polars-plan --features vortex --no-default-features`
this produced 5 fresh unused-import warnings; the umbrella CI build uses
--all-features so the warnings didn't fail loud.

Gate each import + the fn definition matching the existing pattern
(e.g., the cycle-1 PR-2.7 already gated IRStringFunction on
`#[cfg(feature = "strings")]` — same idiom for the underlying types
the StringExpr arm uses).

Split AnyValue from DataType in the prelude line (DataType is used by
every arm — must stay ungated; AnyValue is used only in the StringExpr
needle extraction). Split like/or_collect out of the multi-import on the
vortex::expr line and gate each on strings/is_in respectively.

Verified clean: cargo check -p polars-plan --features vortex (no
warnings on vortex_convertor.rs), with all PR-2.7 features
(--features vortex,is_between,is_in,regex,strings,dtype-struct, same),
and umbrella cargo check -p polars --features vortex,cloud,parquet,dtype-full.
88 unit tests pass.

(cherry picked from commit d457a13)
…hen structural tests (review finding)

Cycle-1 added 5 structural-assertion tests (shape_is_between_structural,
shape_starts_with_structural, shape_ends_with_structural,
shape_contains_literal_true_structural, shape_ternary_structural) to
close cycle-1 acceptance criterion (c) but cycle-2 reviewers flagged
three with loose assertions that don't catch specific paste-swap
classes:

#3 shape_is_between_structural: vacuous negative anchor
'!s.contains("eq(") && !s.contains("==")'. Vortex's binary Display
uses single-char '=' ('(x = 10)'), never 'eq(' or '=='; both substrings
are absent in correct AND eq-paste-swap outputs. Replace with
's.contains(">") && s.contains("<")' (an eq-paste-swap '(x = 10)
AND (x = 20)' has neither '>' nor '<').

#4 shape_starts_with_structural + shape_ends_with_structural: don't
catch starts↔ends paste-swap. Both produce 'like'+needle+'%' Display
strings differing only in '%' position. Strengthen with joint substring
(s.contains("prefix%")) + negative anchor (!s.contains("%prefix")).
Mirror for ends_with.

pola-rs#5 shape_ternary_structural: doesn't catch truthy↔falsy paste-swap.
case_when(cond, b, a) still satisfies s.contains('a') && s.contains('b')
because both column refs remain present. Switch truthy/falsy from cols
'a' / 'b' to distinct literals 777 / 888 + assert positional ordering
(t_pos < f_pos in the Display string).

All 88 tests pass (5 structural tests strengthened; shape_plus_arithmetic_structural
unchanged).

(cherry picked from commit a9918a2)
…+ nit pola-rs#7 — README gate notes + is_in empty-haystack comment (review finding)

- README pushdown table: add the three schema-gate notes that the cycle-1
  fix-commits added to the convertor module-doc table but didn't propagate
  to README. is_between → '(schema-gated pairwise-PType)'. starts_with /
  ends_with / contains → '(schema-gated Utf8 input)'. Ternary → '(schema-gated
  THEN/ELSE pairwise-dtype)'. Closes the H1 cross-reference drift between
  convertor source and README documented in cycle-2 finding pola-rs#6.

- vortex_convertor.rs is_in arm: add a 6-line comment before or_collect(terms)
  acknowledging that 'or_collect(vec![])' returns None for an empty haystack
  → arm returns None. Polars's post-decode residual reapply handles the
  all-false 'is_in([])' semantic correctly, so this is a missed
  optimization (we could pre-empt with lit(false)) — not a bug. Closes
  cycle-2 finding nit pola-rs#7.

All 88 unit tests still pass.

(cherry picked from commit aebb302)
…e Deferred entry

Polars's is_in uses TotalOrd (NaN==NaN); Vortex's eq uses IEEE 754 (NaN!=NaN).
This divergence affects ALL float-comparison convertor arms (foundation
eq/not_eq/lt/lt_eq/gt/gt_eq since PR-2.1 + PR-2.7's new is_in/is_between).
The legacy SpecializedColumnPredicate::EqualOneOf path had the same
divergence; PR-2.7 just amplifies the surface.

Add a Deferred work entry capturing the divergence with two mitigation
paths: (a) refuse pushdown when any float literal is NaN (~10 LoC,
conservative); (b) accept the divergence and document as known
limitation. Not blocking — residual reapply preserves correctness.
Tracked so a future read doesn't assume float pushdown is fully isomorphic
with Polars semantics.

(cherry picked from commit 5e68b4f)
Three-cycle inner-loop gauntlet history:
- Cycle 1 REJECT (3 must-fix + 6 should-fix + 2 nit) — schema-gate gaps in
  the same bug class as PR-2.3 / PR-2.4 cycle-1 pairwise-PType must-fixes.
- Cycle 2 ACCEPT (0 must-fix + 6 should-fix + 2 nit) — gates correctly
  implemented; user picked maximum-thoroughness apply-all-polish-inline.
- Cycle 3 ACCEPT (0 must-fix + 0 should-fix + 1 nit) — polish verified
  clean; only finding is pre-existing CastOptions unused-import (PR-2.3
  era, out of PR-2.7 scope).

17 PR-work commits 065df50..71540df (feat: convertor arms; docs: e2e
tests + README; plan: amend complete + awaiting + cycle 1 atomic; fix×3 +
decrement×3 must-fix gates; docs: cycle-1 should-fix sweep; fix×3 cycle-2
should-fix; docs + plan: cycle-2 polish + NaN Deferred + cycle-2 polish
landed). 25 new convertor unit tests + 7 new e2e Python tests. 88 total
convertor tests pass.

Current State advances: status: executing, current_pr: PR-2.8, pr_index: 9,
per-PR counters reset. Phase 2 phase-end review (cycle 2) will fire after
PR-2.8 completes.

Append Implementation status entry capturing scope shipped, tests added,
3-cycle review history, confidence, deferred items (1 new: Float NaN
semantic divergence), and 3 surprises during implementation (H4 self-
reinforcement validated, H2 new-prose internal edge case validated,
prior_fix_commit_sha attention block worked as designed).

(cherry picked from commit 024341d)
…olumn per-column predicate split

The PR-2.2 cycle-1 must-fix M1 conservative refuse at lower_ir.rs:801-815
gave up on convertor pushdown ENTIRELY when ANY of hive_parts /
unified_scan_args.row_index / unified_scan_args.include_file_paths was set.
Correctness held via the multi-scan PARTIAL_FILTER reapply but the
file-column parts of a mixed predicate missed Vortex zone pruning.

PR-2.8 turns the all-or-nothing guard into a per-column split mirroring
polars-mem-engine/scan_predicate/functions::create_scan_predicate's
hive_predicate extraction:

1. Add helper aexpr_file_minterms_to_vortex_expression in
   vortex_convertor.rs (~30 LoC). Walks top-level conjuncts via
   MintermIter, filters to file-only minterms (no leaf-column in
   virtual_cols), converts each via aexpr_to_vortex_expression, and
   AND-collects. Refuses pushdown if no file-only minterms remain.
   Strictly-better than aexpr_to_vortex_expression on the whole tree:
   partial conversion is now possible (a == 1 AND unsupported_op(b)
   pushes a == 1 instead of refusing the whole thing).

2. Refactor lower_ir.rs FileScanIR::Vortex arm to build the
   virtual_cols set from hive_parts.schema().iter_names() +
   row_index.name + include_file_paths, then call the new helper
   unconditionally instead of gating on hive/row_index/file_paths
   presence.

3. Add 5 unit tests covering: all file-only AND-collects all;
   all virtual returns None; partial pushes file part only;
   top-level OR with virtual operand refuses (single minterm
   references both); unsupported subtree dropped in partial push.

4. Add e2e Python test test_scan_with_hive_and_file_col_mixed_filter
   exercising (pl.col('x') > 5) & (pl.col('year') == 2024) over a
   hive-partitioned vortex dataset. Verifies correctness; a regression
   mis-classifying a hive col as file would surface as a Vortex
   ComputeError 'missing column year'.

Resolves Deferred work entry 'Virtual-column-partitioned Vortex scans
don't benefit from AExpr convertor pushdown' (PR-2.2 cycle-1 must-fix
M1 + cycle-2 C2-001).

Verification:
  cargo test -p polars-plan --features vortex,is_between,is_in,regex,strings,dtype-struct vortex_convertor
  → 93 passed (88 prior + 5 new minterms_* tests)
  cargo check -p polars-stream --features vortex,cloud → clean
  cargo check -p polars --features vortex,cloud,parquet,dtype-full → clean
(cherry picked from commit 9d9469f)
PR-2.8 cycle 1 gauntlet (2-vote pr-2 preset: fresh + correctness)
returned accept with 0 must-fix, 3 should-fix on test rigor:

1. Three new minterms_* unit tests (all_file_only_collects_all,
   partial_pushes_file_part_only, unsupported_subtree_dropped_in_
   partial_push) asserted only .is_some(). Paste-swap test: swapping
   the bodies of any two would leave both passing — a regression
   ignoring virtual_cols entirely (pushing both file + virtual
   conjuncts to Vortex) or inverting polarity (keeping virtual,
   dropping file) would not be caught.

2. e2e coverage for partial-push: only hive-col virtual was
   exercised; row_index and include_file_paths virtuals went through
   the same lower_ir.rs branch but had no e2e test.

Fixes:

1. Tighten the 3 unit tests with the joint-substring + negative-anchor
   structural assertion pattern established in PR-2.7 cycle 2
   (shape_starts_with_structural, shape_ends_with_structural).
   Changed literals to unique discriminators (42 kept-side, 99 for
   the second file conjunct, 9999 for dropped virtual, 888 for dropped
   unsupported subtree). Each tightened test now formats the produced
   Vortex Expression and asserts both (a) the kept-side literal is
   present and (b) the dropped-side literal is absent.

2. Add 2 e2e tests:
   - test_scan_with_row_index_and_file_col_mixed_filter — row_index
     virtual + (x > 5) & (ri > 10) filter
   - test_scan_with_include_file_paths_and_file_col_mixed_filter —
     include_file_paths virtual + (x > 5) & src.str.contains('a')
     filter across two files

Each new e2e test verifies result correctness; a regression
mis-classifying the virtual col as a file column would surface as
a Vortex ComputeError on missing column.

Carry-forward (cycle 1 deferred / nits):
- e2e pushdown-engagement verification (.explain() / metrics
  inspection) remains tied to the POLARS_VORTEX_VERIFY_PUSHDOWN
  Deferred entry (no change in this commit).
- virtual_cols hoist outside the and_then closure at lower_ir.rs:807
  — true nit, deferred.

Verification:
  cargo test -p polars-plan --features vortex,is_between,is_in,regex,strings,dtype-struct minterms_
  → 5 passed (all five PR-2.8 minterms_* tests)
  cargo check -p polars --features vortex,cloud,parquet,dtype-full,strings → clean
(cherry picked from commit f55e00b)
…criminator

PR-2.8 cycle 2 gauntlet (2-vote pr-2 preset) rejected with 1 must-fix
caught by BOTH reviewers (fresh + correctness):

  test_scan_with_include_file_paths_and_file_col_mixed_filter used
  pl.col("src").str.contains("a") as the discriminator between
  a.vortex and b.vortex. include_file_paths populates `src` with the
  FULL path (per ScanSourceRef::to_include_path_name at
  crates/polars-plan/src/dsl/scan_sources.rs, no basename
  normalization). pytest's tmp_path fixture generates a path like
  /var/folders/.../pytest-of-USER/pytest-N/
  test_scan_with_include_file_paths_and_file_col_mixed_filter0/
  {a,b}.vortex — the test name itself contains multiple 'a' chars,
  so str.contains("a") evaluates TRUE for BOTH a.vortex and b.vortex.
  The predicate degenerates to `x > 5` alone, returning 5 rows
  instead of the expected 2; assertions fail in CI.

Fix: anchor the discriminator on the basename suffix via
str.ends_with("a.vortex"). Cross-platform, robust against any
tmp_path naming, and clearly intentional. Updated docstring to note
that `src` is the FULL path. Updated final assertion to use
.endswith() (matches the new discriminator's mental model).

The remaining structural assertions on the 3 minterms_* unit tests
(literals 42 / 99 / 9999 / 888 as unique discriminators) are validated
against Vortex's display format (cycle 2 correctness reviewer
confirmed: i32 literals render as `42i32`, no operator or connector
contains digits, no false-positive risk).

Verification:
  cargo check -p polars --features vortex,cloud,parquet,dtype-full,strings → clean
  cargo test -p polars-plan --features vortex,is_between,is_in,regex,strings,dtype-struct minterms_
  → 5 passed
  python3 -c "import ast; ast.parse(open('py-polars/tests/unit/io/test_vortex.py').read())" → OK
(cherry picked from commit c9fd818)
…phase-end review

PR-2.8 cycle 3 ACCEPTED by 2-vote pr-2 gauntlet (0 must-fix, 0 should-fix, 0 nit; both lenses clean).

Plan updates:
1. Current State: status → phase-boundary; current_pr → null; phase_end_cycle → 2;
   last_commit → c9fd818; subagent_invocations_this_pr → 6; review_cycles_this_pr → 3
2. Implementation status: added PR-2.8 entry with 4 commits, 3-cycle review history,
   2 lessons-learned bullets (e2e discriminator anti-pattern + paste-swap-vulnerability
   spans both unit + e2e layers)
3. Deferred work: 'Virtual-column-partitioned Vortex scans don't benefit from AExpr
   convertor pushdown' RESOLVED-in-PR-2.8 with strikethrough convention

Phase 2 amend (PR-2.7 + PR-2.8) complete. Phase 2 exit criteria status:
- (a) 4-vote phase-end review accepts cycle 2 — PENDING (next: this firing)
- (b) §5 pushdown coverage rows implemented/deferred — SATISFIED (cycle 1 ACCEPT, plus
  PR-2.7 ports 6 cutover-lost shapes + PR-2.8 closes virtual-column gap)
- (c) Pushdown engagement via display_tree / POLARS_VERBOSE — PARTIAL (structural
  assertions on unit tests; e2e engagement-verification via POLARS_VORTEX_VERIFY_PUSHDOWN
  remains deferred)
- (d) Build/test suite green — SATISFIED locally; CI verification pending push
- (e) 10 cycle-3 should-fix carry-forward items — SATISFIED in PR-2.0
- (f) Cutover-lost shapes push down — SATISFIED in PR-2.7
- (g) Virtual-column scans push down file-col part — SATISFIED in PR-2.8

Ready for cumulative Phase 2 cycle-2 4-vote phase-end gauntlet (phase-4 preset:
spec + correctness + maint + arch lenses) on the cumulative Phase 2 diff
(vortex-integration-phase-1..HEAD).

(cherry picked from commit 78732e5)
Phase 2 cycle 2 4-vote phase-end gauntlet (phase-4 preset) returned reject
with 3 must-fix (1 spec + 2 maint) + 4 should-fix (1 each across spec /
correctness / maint / arch lenses).

Must-fix items addressed:

1. (spec) Two Deferred work entries 'PR-2.6 cutover-lost pushdown shapes'
   and 'Two additional §5-row shapes not yet in the AExpr-direct convertor'
   were not strikethrough/RESOLVED per PR-2.7 plan-row acceptance criterion
   (f). Both entries now marked with strikethrough + 'RESOLVED in PR-2.7
   (commits: ...)' annotation, mirroring the PR-2.8 entry's format.

2. (maint) test_scan_with_hive_partitioning_and_filter docstring described
   the all-or-nothing virtual-column refuse guard at lower_ir.rs:780-791
   (deleted in PR-2.8) and the legacy polars_to_vortex_predicate fallback
   (deleted in PR-2.6). Rewrote docstring to describe the current PR-2.8
   per-minterm split mechanism via aexpr_file_minterms_to_vortex_expression.

3. (maint) test_scan_with_row_index_and_filter docstring similarly stale —
   described the 'cycle-2 hive-only guard extended to row_index' refuse path
   that no longer exists. Rewrote to describe the per-minterm split.

Should-fix items addressed:

4. (maint SF1) polars-vortex/src/read/predicate.rs module header listed 4
   shapes as 'Deferred work items the AExpr-direct convertor does NOT yet
   handle' — but PR-2.7 ported all 4 plus 2 more. Updated the 'Coverage
   parity' section to enumerate the 22 shapes now supported as of PR-2.7 +
   PR-2.8, plus the residual list (non-Strict CAST, cross-kind CAST, regex
   contains, temporal extracts, Float16).

5. (maint SF2) polars-vortex/README.md inner architecture table called
   Column(name) gate 'virtual-column refuse in lower_ir' (now per-minterm
   drop in helper), listed Ternary in the refused-shapes prose
   (contradicted the outer Pushdown coverage table at line 242 which shows
   Ternary shipped), and described the lower_ir.rs guard as still
   refusing-whole-predicate. Updated the table to enumerate the 6 PR-2.7
   shapes + the PR-2.8 per-minterm-drop convention; rewrote the prose
   below the table to describe the strictly-better-than-all-or-nothing
   behavior.

6. (correctness SF) No cross-PR unit test combined a PR-2.7 shape with a
   PR-2.8 virtual_col minterm. Added
   minterms_is_between_with_virtual_col_pushes_is_between_only — exercises
   the leaf-walk-through-Function-args invariant
   (aexpr_to_leaf_names_iter must descend through Function args via
   children_rev to find file vs virtual leaves) and asserts the kept-side
   bounds (10, 20) are present while the dropped-side literal (9999) is
   absent. Defends against a regression in leaf-walking.

7. (arch SF) Added 8 single-line `// SCHEMA-GATE: <kind>` markers at each
   pattern-match arm that directly constructs a Vortex builder (BinaryExpr,
   is_between, is_in, Function::Boolean (unary), StructField, StringExpr,
   Cast, Ternary). Greppable via `rg '// SCHEMA-GATE:'`. Module-level
   docstring extended with a 'SCHEMA-GATE convention' section listing all
   current gates — codifies the recurring discipline so a fresh arm-adder
   either replicates the gate or explicitly justifies why no gate is
   needed.

Should-fix items deferred (per spec reviewer's Option (b) recommendation):

8. (spec SF) is_in arena-level structural unit test deferred — adding the
   test requires constructing a LiteralValue::Scalar(Scalar::new(
   DataType::List(Box::new(Int32)), AnyValue::List(series))) haystack node,
   which needs spinning up polars-core Series + AnyValue::List shell
   (~20-30 LoC of test setup vs ~10 LoC for other shapes). e2e coverage at
   test_scan_with_is_in_filter protects user-visible correctness.
   Documented as a Deferred work entry tracking this defense-in-depth gap.
   Also fixed the misleading comment at vortex_convertor.rs:2046-2058 that
   claimed is_between also lacked unit coverage (it does have
   shape_is_between_structural).

Plan updates:
- 2 Deferred entries strikethrough/RESOLVED in PR-2.7
- 1 new Deferred entry: 'is_in arena-level structural unit test'

Verification:
  cargo check -p polars --features vortex,cloud,parquet,dtype-full,strings → clean
  cargo test -p polars-plan --lib --features vortex,is_between,is_in,regex,strings,dtype-struct vortex_convertor
  → 94 passed (was 93; +1 cross-PR test)
  python3 -c "import ast; ast.parse(open(...test_vortex.py))" → OK
(cherry picked from commit 0a1475b)
Phase 2 cycle 3 4-vote phase-end gauntlet (phase-4 preset: spec +
correctness + maint + arch lenses) ACCEPTED with 0 must-fix, 0
should-fix, 0 nit across all 4 lenses.

Cycle 2 reject had 3 must-fix (1 spec Deferred-entry markup + 2 maint
stale e2e docstrings) + 4 should-fix; the sweep commit 0a1475b
applied all 7 (3 MF + 4 SF) in a single 5-file diff:
- 2 Deferred entries strikethrough/RESOLVED in PR-2.7 with commit refs
- 2 stale e2e test docstrings rewritten to describe PR-2.8 mechanism
- polars-vortex/src/read/predicate.rs module header refreshed
- polars-vortex/README.md inner architecture table refreshed
- New cross-PR unit test (PR-2.7 shape × PR-2.8 minterm helper)
- 8 SCHEMA-GATE markers + module-doc convention section codifying
  the recurring schema-gate discipline across PR-2.2/.3/.4/.7
- 1 new Deferred entry: is_in arena-level structural test
  (per spec reviewer Option (b) — e2e coverage protects user-visible
  correctness; defense-in-depth gap tracked for Phase 3)

Current State updates:
- status: phase-boundary (was: same, but now ready for Step 3.4 gate)
- current_phase: Phase 2 COMPLETE — awaiting Step 3.4 user gate
- phase_end_cycle: 3, phase_end_reject_cycles: 1
- last_phase_end_verdict: accept
- last_commit: 0a1475b
- subagent_invocations_total: 60 (+9 since cycle 2: 4 cycle 2
  reviewers + 4 cycle 3 reviewers + 1 cycle 2 implicit synthesis-skip)

Phase 2 exit criteria status (all SATISFIED):
- (a) 4-vote phase-end accepts cycle 3 (cycle 1 + 3 both accepted;
  cycle 2 rejected and was resolved in 0a1475b)
- (b) §5 pushdown coverage rows implemented or documented as deferred
- (c) Pushdown engagement via structural assertion at unit-test layer
  (e2e engagement-verification via POLARS_VORTEX_VERIFY_PUSHDOWN
  remains in Deferred work)
- (d) Build/test suite still green (94 unit + 13 e2e tests)
- (e) PR-2.0 cycle-3 should-fix carry-forward items resolved
- (f) Cutover-lost shapes (PR-2.7) push down (6 shapes shipped)
- (g) Virtual-column-partitioned scans (PR-2.8) push down file-col part

Phase 2 cumulative: 78 commits on vortex-integration branch since
the vortex-integration-phase-1 base (Phase 1 stack tip). Phase 2
amend (PR-2.7 + PR-2.8 + cycle 2 sweep) totals 30 commits.

Awaiting Step 3.4 user gate before proceeding to Phase 3 (PR-3.1
file-stats + PR-3.2 multi-file + PR-3.3 nested-type coverage).

(cherry picked from commit aa701b9)
…ase 2 cycle 3

The cycle 3 sweep commit 0a1475b introduced new docstrings (3 e2e
test rewrites, 2 PR-2.8 e2e tests) + module-level convention doc + 8
SCHEMA-GATE markers + 1 cross-PR test that triggered:

1. ruff: 17 errors (14 D205 'blank line required between summary and
   description' + 2 W505 'doc line too long' + 1 D301 'use r""" for
   backslashes' + 1 D209 + 1 RUF100). Of these, 3 were auto-fixable
   via ruff --fix; 11 required manual docstring restructuring
   (collapse multi-line summary to single line + blank-line separator
   + add r""" prefix for `\` in docstring at the wildcard-negative-path
   test). Many were pre-existing in test_vortex.py from earlier PRs;
   accumulated unfixed across prior cycles.

2. rustfmt: import-order normalization in vortex_convertor.rs after
   the new use crate::plans::aexpr::MintermIter line (cycle 1
   PR-2.8) shifted neighboring import ordering. cargo fmt applied
   cleanly.

Verification:
  uv tool run ruff check py-polars/ → All checks passed
  cargo fmt --check → clean
  cargo check -p polars --features vortex,cloud,parquet,dtype-full,strings → clean
  cargo test -p polars-plan --lib --features ... vortex_convertor → 94 passed
(cherry picked from commit 6a8dc83)
…e 1++

2026-05-19 PR-stack restructure: this branch (vortex-integration) is now
**Phase 2 only** — AExpr-direct convertor + Option B→A cutover (PR-2.1
through PR-2.8 amend, 60 commits). PR-2.0 cleanups and Phase 3 work
(file-stats, multi-file tests) moved to the Phase 1++ branch
(vortex-integration-phase-1).

User-facing PR split:
- PR #2 (Phase 1++): complete robust Vortex integration foundation
  (read/write + legacy pushdown + file-stats + multi-file tests +
  PR-2.0 cleanups + bench harness + remaining Phase 1++ work yet to
  ship: nested-type + small-int + engagement infra + extended benches)
- PR #1 (Phase 2): pure perf follow-on — replaces legacy
  SpecializedColumnPredicate pushdown with AExpr-direct convertor;
  measured speedup visible against the Phase 1++ baseline benches

Backup at backup-vortex-integration-pre-restack preserves the
pre-restack tip (7b0f670) for audit.

Phase 2 cycle-3 4-vote phase-end ACCEPTED verdict is preserved through
the rebase. PR #1 base needs to be retargeted to vortex-integration-phase-1
(currently points at the old tip). Plan rewrite + full Implementation
status update deferred to next session.
@lwwmanning lwwmanning force-pushed the vortex-integration branch from 4c0f5b1 to 1399146 Compare May 19, 2026 15:55
@lwwmanning
Copy link
Copy Markdown
Member Author

Closed and reopened as a fresh PR to match phase ordering (foundation = lower-numbered PR). Same branch (vortex-integration), same content, no other changes — see the new PR for the active Phase 2 review.

@lwwmanning lwwmanning closed this May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants