feat: support NullType in row-to-Arrow conversion and shuffle #4460

Open
mbutrovich wants to merge 5 commits into apache:main from mbutrovich:fix_4457

@mbutrovich
Contributor

Which issue does this PR close?

Closes #4457.

Rationale for this change

NullType columns currently break Comet at the row-to-Arrow boundary. Utils.toArrowType throws UnsupportedOperationException for NullType, which surfaces in two places:

  1. CometLocalTableScanExec, when a LocalTableScan contains a NullType column. This is the case reported in #4457 ("Queries with NullType aggregate fails when native LocalTableScanExec is enabled"), e.g. SELECT max(col) FROM VALUES (NULL), (NULL) AS t(col).
  2. CometShuffleExchangeExec when a Spark LocalTableScanExec with a NullType column feeds a Comet shuffle.

NullType is well-defined in Arrow (ArrowType.Null) and the Spark ArrowWriter already has a NullWriter case, so the right fix is to support it end-to-end rather than fall back. This PR is an alternative to #4458, which adds a LocalTableScanExec-only fallback and leaves the shuffle path broken.
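
To illustrate the shape of the fix, here is a minimal Rust sketch of a type mapping that returns a null type for NullType instead of raising an error. It is not Comet's code: the real Utils.toArrowType is Scala, and the SparkType and ArrowType enums, the Unsupported variant, and the to_arrow_type function below are hypothetical stand-ins that cover only a handful of types. Because the mapping recurses into element and field types, the single NullType arm also covers NullType nested under arrays, structs, and maps.

```rust
// Hypothetical, heavily simplified stand-ins for Spark's DataType and
// Arrow's ArrowType. Only a few types are modelled.
#[derive(Debug, Clone, PartialEq)]
enum SparkType {
    Null,
    Int,
    Str,
    Array(Box<SparkType>),
    Struct(Vec<(String, SparkType)>),
    Map(Box<SparkType>, Box<SparkType>),
    // Placeholder for any type the mapping still rejects (assumption).
    Unsupported,
}

#[derive(Debug, Clone, PartialEq)]
enum ArrowType {
    Null,
    Int32,
    Utf8,
    List(Box<ArrowType>),
    Struct(Vec<(String, ArrowType)>),
    Map(Box<ArrowType>, Box<ArrowType>),
}

// Maps a Spark type to an Arrow type, recursing into nested types.
// The Null arm is the piece this sketch is about: without it, NullType
// would fall through to the error arm at the bottom.
fn to_arrow_type(dt: &SparkType) -> Result<ArrowType, String> {
    match dt {
        SparkType::Null => Ok(ArrowType::Null),
        SparkType::Int => Ok(ArrowType::Int32),
        SparkType::Str => Ok(ArrowType::Utf8),
        SparkType::Array(elem) => Ok(ArrowType::List(Box::new(to_arrow_type(elem)?))),
        SparkType::Struct(fields) => {
            let mut out = Vec::with_capacity(fields.len());
            for (name, ty) in fields {
                out.push((name.clone(), to_arrow_type(ty)?));
            }
            Ok(ArrowType::Struct(out))
        }
        SparkType::Map(key, value) => Ok(ArrowType::Map(
            Box::new(to_arrow_type(key)?),
            Box::new(to_arrow_type(value)?),
        )),
        other => Err(format!("unsupported data type: {:?}", other)),
    }
}
```

In this model, to_arrow_type(&SparkType::Array(Box::new(SparkType::Null))) yields a list of nulls, while the Unsupported placeholder still produces an error.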

What changes are included in this PR?

  • Utils.toArrowType maps NullType to ArrowType.Null.
  • CometShuffleExchangeExec.supportedSerializableDataType accepts NullType for both native and columnar shuffle.
  • Native shuffle row reader (native/shuffle/src/spark_unsafe/row.rs) handles NullType.
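
For reviewers less familiar with the Arrow side: a Null-typed array in the Arrow columnar format has no data buffers and is described only by its length, so a row reader handling NullType only needs to count rows. The sketch below models that idea in plain Rust. It is an illustration under stated assumptions, not the code in row.rs: NullColumnBuilder, NullColumn, and read_null_column are hypothetical names, and rows are modelled as vectors of Option<i64> rather than Spark's binary row format.

```rust
// Hypothetical builder for an all-null column. It stores no values,
// only how many rows have been appended.
#[derive(Debug, Default)]
struct NullColumnBuilder {
    len: usize,
}

impl NullColumnBuilder {
    // Record one more null row.
    fn append_null(&mut self) {
        self.len += 1;
    }

    // Freeze the builder into an immutable column.
    fn finish(self) -> NullColumn {
        NullColumn { len: self.len }
    }
}

// Hypothetical all-null column: the length is its only state.
#[derive(Debug, PartialEq)]
struct NullColumn {
    len: usize,
}

impl NullColumn {
    fn len(&self) -> usize {
        self.len
    }

    // Every in-bounds slot is null by definition.
    fn is_null(&self, index: usize) -> bool {
        assert!(index < self.len, "index out of bounds");
        true
    }
}

// Builds the NullType column `col` from a batch of simplified rows.
// The cell itself is never read for its value: one null is appended per row.
fn read_null_column(rows: &[Vec<Option<i64>>], col: usize) -> NullColumn {
    let mut builder = NullColumnBuilder::default();
    for row in rows {
        // Sanity check for this model only: a NullType cell must be null.
        debug_assert!(row[col].is_none());
        builder.append_null();
    }
    builder.finish()
}
```

With three input rows whose second cell is None, read_null_column(&rows, 1) produces a column of length 3 in which every slot reports null.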

How are these changes tested?

New regression tests:

  • CometExecSuite "CometLocalTableScanExec handles NullType nested in struct/array/map" covers NullType nested under StructType, ArrayType, and MapType through CometLocalTableScanExec.
  • CometColumnarShuffleSuite "columnar shuffle with NullType passthrough column" covers JVM-input columnar shuffle with a NullType column. It replaces the older "Fallback to Spark for unsupported input besides ordering" test, which asserted the previous fallback behavior.
  • CometNativeShuffleSuite "native shuffle with NullType passthrough column" covers native shuffle with a Comet LocalTableScan source containing a NullType column. Gated on spark.comet.exec.localTableScan.enabled=true because native shuffle requires Comet input.

@mbutrovich mbutrovich requested a review from andygrove May 28, 2026 00:52

Development

Successfully merging this pull request may close these issues.

Queries with NullType aggregate fails when native LocalTableScanExec is enabled
