Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
62 commits
Select commit Hold shift + click to select a range
b4c21d5
plan: start PR-2.1
lwwmanning May 16, 2026
70b231e
feat(polars-plan,polars-vortex): PR-2.1 — AExpr-direct convertor foun…
lwwmanning May 16, 2026
cf86737
plan: amend PR-2.1 row — correct file location to polars-plan (archit…
lwwmanning May 16, 2026
6ed4d5f
plan: PR-2.1 awaiting review
lwwmanning May 16, 2026
3cb61e1
docs(polars-plan): PR-2.1 cycle-1 should-fix — bitwise-vs-logical TOD…
lwwmanning May 16, 2026
998b191
plan: PR-2.1 gauntlet cycle 1 accepted (zero must-fix); 3 should-fix …
lwwmanning May 16, 2026
a8dc66f
plan: PR-2.1 complete (confidence: high, deferred: 1)
lwwmanning May 16, 2026
ba4e5de
plan: start PR-2.2
lwwmanning May 16, 2026
48517b4
feat(polars-plan): PR-2.2 commit 1 — extend convertor with Plus + sch…
lwwmanning May 16, 2026
f6da52b
feat(polars-stream,polars-plan): PR-2.2 commit 2 — wire AExpr convert…
lwwmanning May 16, 2026
1446c7b
plan: PR-2.2 awaiting review (2 commits — convertor extension + wire-up)
lwwmanning May 16, 2026
f3ffdb1
fix(polars-plan,polars-stream): PR-2.2 cycle-1 must-fix — hive guard …
lwwmanning May 16, 2026
27807a5
plan: PR-2.2 cycle-1 disposition + 3 new Deferred-work entries
lwwmanning May 16, 2026
057164e
fix(polars-plan,polars-stream,py-polars): PR-2.2 cycle-2 should-fix s…
lwwmanning May 16, 2026
68057ce
plan: PR-2.2 complete (cycle 2 accept, confidence: high, deferred: 10)
lwwmanning May 16, 2026
42c0d68
feat(polars-plan,polars-vortex,py-polars): PR-2.3 — PR-13.3 CAST in p…
lwwmanning May 16, 2026
b41818d
fix(polars-plan,py-polars): PR-2.3 cycle-1 must-fix — CAST source-kin…
lwwmanning May 17, 2026
69917e0
fix(polars-plan): PR-2.3 cycle-2 should-fix — Bool↔Utf8 tests + CAST …
lwwmanning May 17, 2026
f9675e9
plan: PR-2.3 complete (cycle 2 accept, confidence: high, deferred: 12)
lwwmanning May 17, 2026
ff5d571
feat(polars-plan,py-polars): PR-2.4 — PR-13.4 Struct field access + p…
lwwmanning May 17, 2026
063f1bf
fix(polars-plan): PR-2.4 cycle-2 should-fix — comparison pairwise gat…
lwwmanning May 17, 2026
eb5e58e
plan: PR-2.4 complete (cycle 2 accept, confidence: high, deferred: 11)
lwwmanning May 17, 2026
666ea33
plan: PR-2.5 SLIPPED to Deferred work (Vortex datetime_parts unavaila…
lwwmanning May 17, 2026
a7f468c
refactor(polars-vortex,polars-stream,polars-plan): PR-2.6 — Option B …
lwwmanning May 17, 2026
c920558
fix(polars-vortex,polars-stream,plan): PR-2.6 cycle-1 must-fix — docu…
lwwmanning May 17, 2026
2a467ba
docs(polars-vortex,polars-stream): PR-2.6 cycle-2 nit — revert path-r…
lwwmanning May 17, 2026
a5989d0
plan: PR-2.6 complete (cycle 2 accept, confidence: high, deferred: 13…
lwwmanning May 17, 2026
1b94e01
fix: Phase 2 phase-end should-fix sweep — CI green-up + plan/doc cleanup
lwwmanning May 17, 2026
71203fa
plan: PAUSED at Phase 2 → 3 boundary per user direction (Phase 2 COMP…
lwwmanning May 17, 2026
4669747
plan: PR-1.5 + 2-branch GitHub stacking strategy (benches in Phase 1 …
lwwmanning May 18, 2026
147e15e
plan: refresh header + Current State after PR-stack restructure (last…
lwwmanning May 18, 2026
553b7b5
style(polars-plan,polars-vortex,py-polars): CI lint sweep (clippy-nig…
lwwmanning May 18, 2026
2e8d6e9
plan: begin amend phase 2: add PR-2.7 (cutover-lost shapes) + PR-2.8 …
lwwmanning May 18, 2026
7cca9ec
feat(polars-plan): PR-2.7 cycle 1 — port cutover-lost pushdown shapes
lwwmanning May 18, 2026
50d5c8e
docs(polars-vortex,py-polars): PR-2.7 cycle 1 — e2e tests + README pu…
lwwmanning May 18, 2026
e88b3d5
plan: amend phase 2 complete + PR-2.7 awaiting 2-vote inner-loop review
lwwmanning May 18, 2026
e43eac8
plan: PR-2.7 gauntlet cycle 1 — 3 must-fix items
lwwmanning May 18, 2026
72e30bd
fix(polars-plan): PR-2.7 cycle-1 must-fix #1 — is_between pairwise-PT…
lwwmanning May 18, 2026
e75b6ed
plan: PR-2.7 must-fix item resolved — outstanding count 2 (is_between…
lwwmanning May 18, 2026
70d5289
fix(polars-plan): PR-2.7 cycle-1 must-fix #2 — Ternary THEN/ELSE pair…
lwwmanning May 18, 2026
2eaa2bb
plan: PR-2.7 must-fix item resolved — outstanding count 1 (Ternary gate)
lwwmanning May 18, 2026
7195cc9
fix(polars-plan): PR-2.7 cycle-1 must-fix #3 — StringExpr Utf8 input …
lwwmanning May 18, 2026
ed9be44
plan: PR-2.7 must-fix item resolved — outstanding count 0 (StringExpr…
lwwmanning May 18, 2026
8ad1b70
docs(polars-plan,polars-vortex): PR-2.7 cycle-1 should-fix sweep
lwwmanning May 18, 2026
8d8c997
fix(polars-plan): PR-2.7 cycle-2 should-fix #1 — resolve_inner_dtype …
lwwmanning May 19, 2026
4940a72
fix(polars-plan): PR-2.7 cycle-2 should-fix #2 — feature-gate strings…
lwwmanning May 19, 2026
9df0e2b
fix(polars-plan): PR-2.7 cycle-2 should-fix #3/#4/#5 — strengthen str…
lwwmanning May 19, 2026
b656bed
docs(polars-plan,polars-vortex): PR-2.7 cycle-2 should-fix #6 + nit #…
lwwmanning May 19, 2026
3181d11
plan: PR-2.7 cycle-2 nit #8 — add Float NaN semantic divergence Defer…
lwwmanning May 19, 2026
181068a
plan: PR-2.7 cycle-2 polish landed (8 commits: 6 should-fix + 2 nit);…
lwwmanning May 19, 2026
05d90cd
plan: PR-2.7 complete (confidence: high, deferred: 1)
lwwmanning May 19, 2026
061e3f7
feat(polars-plan,polars-stream,py-polars): PR-2.8 cycle 1 — virtual-c…
lwwmanning May 19, 2026
6f8c2fb
plan: PR-2.8 awaiting 2-vote inner-loop review
lwwmanning May 19, 2026
b2384e8
test(polars-plan,py-polars): PR-2.8 cycle 2 — tighten test rigor
lwwmanning May 19, 2026
84b31b2
fix(py-polars): PR-2.8 cycle 3 must-fix — include_file_paths test dis…
lwwmanning May 19, 2026
88c5c14
plan: PR-2.8 COMPLETE (cycle 3 accept); Phase 2 ready for cumulative …
lwwmanning May 19, 2026
cf325f6
fix: Phase 2 cycle 2 phase-end sweep — 3 must-fix + 4 should-fix
lwwmanning May 19, 2026
d896fbc
plan: Phase 2 COMPLETE — cycle 3 4-vote phase-end ACCEPTED
lwwmanning May 19, 2026
199e1da
style(polars-plan,py-polars): CI lint sweep — ruff + rustfmt after Ph…
lwwmanning May 19, 2026
01e8952
style(polars-vortex): dprint fmt README.md after Phase 2 cycle 3 inne…
lwwmanning May 19, 2026
1399146
plan: re-stack restructure — Phase 2 is pure convertor work atop Phas…
lwwmanning May 19, 2026
c909772
plan: refresh Phases and PRs + Implementation status for the 2026-05-…
lwwmanning May 19, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
596 changes: 565 additions & 31 deletions .big-plans/vortex-integration.md

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions crates/polars-plan/src/plans/aexpr/predicates/mod.rs
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
mod column_expr;
mod skip_batches;
#[cfg(feature = "vortex")]
pub mod vortex_convertor;

use std::borrow::Cow;

Expand Down
2,983 changes: 2,983 additions & 0 deletions crates/polars-plan/src/plans/aexpr/predicates/vortex_convertor.rs

Large diffs are not rendered by default.

26 changes: 23 additions & 3 deletions crates/polars-stream/src/nodes/io_sources/vortex/builder.rs
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ use polars_io::cloud::CloudOptions;
use polars_io::metrics::IOMetrics;
use polars_plan::dsl::ScanSource;
use polars_vortex::read::VortexSegmentCacheRef;
use polars_vortex::vortex::expr::Expression as VortexExpression;
use polars_vortex::{VortexScanOptions, vortex};

use super::VortexFileReader;
Expand All @@ -23,6 +24,17 @@ pub struct VortexReaderBuilder {
/// streaming source falls back to `options.segment_cache.resolve()` (e.g., user-supplied
/// schema path where no postscript read happened at IR-build).
pub segment_cache: Option<VortexSegmentCacheRef>,
/// AExpr-direct convertor result (PR-13.2, sole pushdown path as of PR-2.6): when
/// the predicate translated cleanly via
/// `polars_plan::plans::predicates::vortex_convertor::aexpr_to_vortex_expression`
/// (the `aexpr` module is `pub(crate)`; the externally-resolvable path goes through
/// the `pub use aexpr::*` re-export at `plans/mod.rs`),
/// the Vortex `Expression` is captured here at IR-build time (where we still have
/// `expr_arena` access). `VortexFileReader::begin_read` uses it directly. The
/// multi-scan layer reapplies the full predicate post-decode regardless (we
/// advertise `PARTIAL_FILTER`), so it is safe to push only a subset; shapes the
/// convertor returns `None` for fall through to no-pushdown + post-decode reapply.
pub aexpr_filter: Option<VortexExpression>,
pub io_metrics: std::sync::OnceLock<Arc<IOMetrics>>,
}

Expand All @@ -42,9 +54,12 @@ impl FileReaderBuilder for VortexReaderBuilder {
fn reader_capabilities(&self) -> ReaderCapabilities {
use ReaderCapabilities as RC;
// The multi-scan layer reapplies the full predicate post-decode, so PARTIAL_FILTER
// is always safe; FULL_FILTER would require the convertor to consume every shape
// it sees (out-of-scope while we lean on `SpecializedColumnPredicate`).
// EXTERNAL_FILTER_MASK would need Vortex `Selection` bitmap plumbing.
// is always safe; FULL_FILTER would require the AExpr-direct convertor to be a
// strict superset of every AExpr predicate Polars constructs (still out of scope:
// the convertor returns None for unhandled shapes — Sort/Gather/Filter/Agg/Ternary/
// AnonymousFunction/Over/Rolling/temporal extracts/etc. — and the multi-scan
// reapply handles them). EXTERNAL_FILTER_MASK would need Vortex `Selection` bitmap
// plumbing.
RC::ROW_INDEX
| RC::PRE_SLICE
| RC::NEGATIVE_PRE_SLICE
Expand Down Expand Up @@ -78,6 +93,11 @@ impl FileReaderBuilder for VortexReaderBuilder {
// Threaded resolved cache for the data read; `None` triggers fallback resolve()
// inside `VortexFileReader::initialize`. Same pattern as `footer` above.
segment_cache: self.segment_cache.clone(),
// AExpr-direct convertor result (sole pushdown path as of PR-2.6). Shared
// across all sources in a multi-source scan — the convertor result is
// purely a function of the (predicate, schema) pair, both of which are
// constant across the scan's sources.
aexpr_filter: self.aexpr_filter.clone(),
io_metrics: OptIOMetrics(self.io_metrics.get().cloned()),
init_data: None,
}) as _
Expand Down
25 changes: 19 additions & 6 deletions crates/polars-stream/src/nodes/io_sources/vortex/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,6 @@ use polars_utils::slice_enum::Slice;
use polars_vortex::read::array_bridge::{
ArrowUpstreamSchema, arrow_dtypes_from_schema, record_batch_to_dataframe,
};
use polars_vortex::read::predicate::polars_to_vortex_predicate;
use polars_vortex::read::read_at::local_file_read_at;
use polars_vortex::read::schema::vortex_dtype_to_schema;
use polars_vortex::session::session;
Expand Down Expand Up @@ -51,6 +50,14 @@ pub struct VortexFileReader {
/// read. When `None` (user-supplied-schema path; no IR-build postscript read happened),
/// `initialize()` falls back to `options.segment_cache.resolve()` for a fresh cache.
pub segment_cache: Option<polars_vortex::read::VortexSegmentCacheRef>,
/// AExpr-direct convertor result threaded from IR-build (`FileScanIR::Vortex` →
/// `VortexReaderBuilder::aexpr_filter` → here). When `Some`, `begin_read` uses this
/// Vortex `Expression` directly. PR-2.6 (Option B → A cutover) deleted the legacy
/// `polars_to_vortex_predicate` fallback — the AExpr-direct convertor is now the
/// sole filter-pushdown path. When `None` (unhandled AExpr shape, or
/// `push_predicate=false`), no filter pushes down; the multi-scan layer reapplies
/// the predicate post-decode (we advertise `PARTIAL_FILTER`).
pub aexpr_filter: Option<polars_vortex::vortex::expr::Expression>,
pub io_metrics: OptIOMetrics,

/// Set by `initialize()`.
Expand Down Expand Up @@ -248,12 +255,18 @@ impl FileReader for VortexFileReader {
}
});

// Translate the pushable bits of args.predicate into a Vortex `Expression`. We
// advertise `PARTIAL_FILTER` capability, so the multi-scan layer keeps the
// original predicate around to apply post-decode — pushing only what we can
// convert is safe (over-conservative pushdown would drop rows incorrectly).
// Use the AExpr-direct convertor result computed at IR-build time
// (`physical_plan::lower_ir`). We advertise `PARTIAL_FILTER` capability so the
// multi-scan layer keeps the original predicate around to apply post-decode —
// pushing only what we can convert is safe.
//
// PR-2.6 (Option B → A cutover) deleted the legacy `polars_to_vortex_predicate`
// (`SpecializedColumnPredicate`-derived) fallback path; the AExpr-direct
// convertor is now the sole filter-pushdown path. When `aexpr_filter` is `None`
// (unhandled AExpr shape, or `push_predicate=false`), no filter pushes down and
// the multi-scan reapply handles correctness.
let filter_expr = if self.options.push_predicate {
args.predicate.as_ref().and_then(polars_to_vortex_predicate)
self.aexpr_filter.clone()
} else {
None
};
Expand Down
83 changes: 72 additions & 11 deletions crates/polars-stream/src/physical_plan/lower_ir.rs
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,8 @@ use polars_plan::dsl::default_values::DefaultFieldValues;
use polars_plan::dsl::deletion::DeletionFilesList;
use polars_plan::dsl::{CallbackSinkType, ExtraColumnsPolicy, FileScanIR, SinkTypeIR};
use polars_plan::plans::expr_ir::{ExprIR, OutputName};
#[cfg(feature = "vortex")]
use polars_plan::plans::predicates::vortex_convertor;
use polars_plan::plans::{AExpr, FunctionIR, IR, IRAggExpr, LiteralValue, write_ir_non_recursive};
use polars_plan::prelude::*;
use polars_utils::arena::{Arena, Node};
Expand Down Expand Up @@ -767,17 +769,76 @@ pub fn lower_ir(
options,
metadata: first_metadata,
segment_cache,
} => Arc::new(
crate::nodes::io_sources::vortex::builder::VortexReaderBuilder {
options: Arc::new(options.clone()),
first_metadata: first_metadata.clone(),
// Threaded from IR-build (scans.rs::vortex_file_info caller); when
// None (schema-supplied path), the streaming source resolves a
// fresh cache from `options.segment_cache`.
segment_cache: segment_cache.clone(),
io_metrics: std::sync::OnceLock::new(),
},
) as _,
} => {
// PR-13.2 AExpr-direct pushdown (sole pushdown path as of PR-2.6):
// when a predicate exists and `push_predicate` is on, try the
// convertor. The result rides on the builder; `VortexFileReader::
// begin_read` uses it directly. We are still at IR-build time
// here, so we have `expr_arena` access — which we lose by the
// time `begin_read` runs (where only `Arc<dyn PhysicalIoExpr>`
// survives). When the convertor returns `None` (unhandled AExpr
// shape OR all minterms touched virtual columns), `aexpr_filter`
// stays `None` and no filter pushes down; the multi-scan layer
// reapplies the predicate post-decode via the `PARTIAL_FILTER`
// capability so correctness is preserved.
//
// Per-column file-vs-virtual split (PR-2.8, replaces PR-2.2
// cycle-1 must-fix M1's all-or-nothing guard): `file_info.schema`
// "Does not include logical columns like `include_file_path` and
// row index" but DOES include hive columns (plans/schema.rs:42-43).
// The convertor's `AExpr::Column` arm emits a bare `get_item(name,
// root())` with no schema-membership check, so a predicate
// referencing a hive column / row_index / include_file_paths would
// emit a Vortex reference to a column not in the file's data —
// Vortex would bail at `into_array_stream`.
//
// Build a `virtual_cols` set from hive partition column names
// + row_index name + include_file_paths name. Pass it to
// `vortex_convertor::aexpr_file_minterms_to_vortex_expression`,
// which walks top-level conjuncts via `MintermIter` and converts
// only the file-only minterms; virtual-column-touching minterms
// are left for the multi-scan reapply. Mirrors
// `polars-mem-engine/scan_predicate/functions::create_scan_predicate`'s
// `hive_predicate` extraction. Resolves Deferred work entry
// "Virtual-column-partitioned Vortex scans don't benefit from
// AExpr convertor pushdown".
let aexpr_filter = if options.push_predicate {
predicate.as_ref().and_then(|p| {
let mut virtual_cols: PlHashSet<PlSmallStr> = PlHashSet::default();
if let Some(hp) = hive_parts.as_ref() {
for name in hp.schema().iter_names() {
virtual_cols.insert(name.clone());
}
}
if let Some(ri) = unified_scan_args.row_index.as_ref() {
virtual_cols.insert(ri.name.clone());
}
if let Some(ifp) = unified_scan_args.include_file_paths.as_ref() {
virtual_cols.insert(ifp.clone());
}
vortex_convertor::aexpr_file_minterms_to_vortex_expression(
p.node(),
expr_arena,
Some(file_info.schema.as_ref()),
&virtual_cols,
)
})
} else {
None
};
Arc::new(
crate::nodes::io_sources::vortex::builder::VortexReaderBuilder {
options: Arc::new(options.clone()),
first_metadata: first_metadata.clone(),
// Threaded from IR-build (scans.rs::vortex_file_info caller); when
// None (schema-supplied path), the streaming source resolves a
// fresh cache from `options.segment_cache`.
segment_cache: segment_cache.clone(),
aexpr_filter,
io_metrics: std::sync::OnceLock::new(),
},
) as _
},

#[cfg(feature = "csv")]
FileScanIR::Csv { options } => {
Expand Down
Loading
Loading