feat: enable CometLocalTableScanExec by default#4393
Conversation
Failure counts
Clean: Root-cause buckets (ranked by blast radius)B1 —
|
| Order | Action | Eliminates | Risk |
|---|---|---|---|
| 1 | B1 fallback (NullType in schema) | ~600 + likely unblocks the 5 timed-out shards | Low |
| 2 | B2 fallback (TimeType in schema) | ~25 | Low |
| 3 | B3 investigation + fix (Iceberg silent corruption) | 220 + ~20 Spark SQL | Medium (data correctness) |
| 4 | B4 stale assertion update + sibling grep | 8 | Trivial |
| 5 | B6 subquery registration | ~8 | Low-medium |
| 6 | B5 skip list (or per-test config) | ~40 | Low |
| 7 | B7 individually | ~20 | Varies |
After (1)–(2), failure count should drop from ~1,200 to ~250 and most remaining failures will be the substantive ones.
Open questions
- B3: does Iceberg's columnar DSv2 branch-write path honor
setArrowFfiSafe(false), or does the contract only apply to native consumers? If only native,CometLocalTableScanExecneeds to copy batches before exposing them to non-Comet consumers (or be disallowed as a direct columnar input to non-Comet writers). - B5: skip list vs per-test disable — preference? A skip list is easier to maintain; per-test config keeps Comet exercised.
- Whether to keep the default flip in PR feat: enable CometLocalTableScanExec by default #4393 and land fixes incrementally, or to gate the flip behind B1–B3 landing first.
…nonical Arrow C Stream Interface (JVM Data.exportArrayStream <-> native ArrowArrayStreamReader), eliminating the per-batch FFI deep copy and the arrow_ffi_safe flag.
…le_localtablescan
Branch statusJVM and native input now use the canonical Arrow C Stream Interface: JVM-side input is unified behind three
Native side: Bucket status
Recent fixes since the last CI sweepDictionary-encoded
|
Which issue does this PR close?
Closes #4347.
Rationale for this change
Currently
CometLocalTableScanExecis disabled by default. A lot of Spark SQL tests (UDFs, expressions) don't write their input to sources that Comet reads natively (e.g., Parquet, Iceberg) so they are likely not being exercised through Comet.What changes are included in this PR?
Enable
localTableScantranslation to Comet by default.How are these changes tested?
Existing tests.