fix: source wasNull from validity buffer in VarChar/Binary accessors#187

Draft
mkaufmann wants to merge 2 commits into main from fix/null-check-for-get-iceberg

@mkaufmann
Member

Summary

When Iceberg is on the classpath, it sets -Darrow.enable_null_check_for_get=false on the JVM for performance. With that flag off, Arrow's VarCharVector / VarBinaryVector / FixedSizeBinaryVector .get(int) skip the validity check and, for null rows, return stale buffer bytes (typically an empty byte[]) instead of null. VarCharVectorAccessor and BinaryVectorAccessor both inferred wasNull from "is the returned byte[] null?", so a SQL NULL surfaced as an empty string and wasNull() returned false.

Source wasNull from vector.isNull(int) instead. The other accessors are unaffected:

  • Holder-based primitives (BaseInt, Boolean, Float, Double, Date, Time, TimeStamp, TimeStampTZ) read holder.isSet, populated by vector.get(int, holder) which always honors validity regardless of the flag.
  • DecimalVector.getObject and ListVector/LargeListVector.getObject unconditionally consult isSet, so their return-value-sniff is safe. Added inline comments documenting that contract so the asymmetry doesn't read as a bug.
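
To make the failure mode concrete, here is a minimal, dependency-free sketch of the before/after logic. It is not the code in this PR and does not use Arrow: `FakeVarCharVector`, `BuggyAccessor`, and `FixedAccessor` are hypothetical stand-ins, and the `nullCheckForGet` constructor argument only imitates the effect described above for `arrow.enable_null_check_for_get`.

```java
import java.nio.charset.StandardCharsets;

public class WasNullSketch {

    /** Hypothetical stand-in for a variable-width Arrow vector; NOT the Arrow API. */
    static final class FakeVarCharVector {
        private final byte[][] data;           // slot contents; null rows hold stale bytes
        private final boolean[] validity;      // true = value present, false = SQL NULL
        private final boolean nullCheckForGet; // imitates arrow.enable_null_check_for_get

        FakeVarCharVector(byte[][] data, boolean[] validity, boolean nullCheckForGet) {
            this.data = data;
            this.validity = validity;
            this.nullCheckForGet = nullCheckForGet;
        }

        /** Always answers from the validity buffer, regardless of the flag. */
        boolean isNull(int index) {
            return !validity[index];
        }

        /** With the flag off, the validity check is skipped and stale bytes leak out. */
        byte[] get(int index) {
            if (nullCheckForGet && isNull(index)) {
                return null;
            }
            return data[index];
        }
    }

    /** Old pattern: infers wasNull from the returned value. */
    static final class BuggyAccessor {
        private final FakeVarCharVector vector;
        private boolean wasNull;

        BuggyAccessor(FakeVarCharVector vector) {
            this.vector = vector;
        }

        String getString(int row) {
            byte[] bytes = vector.get(row);
            wasNull = (bytes == null); // wrong when get() skips the validity check
            return wasNull ? null : new String(bytes, StandardCharsets.UTF_8);
        }

        boolean wasNull() {
            return wasNull;
        }
    }

    /** New pattern: sources wasNull from the validity buffer before reading bytes. */
    static final class FixedAccessor {
        private final FakeVarCharVector vector;
        private boolean wasNull;

        FixedAccessor(FakeVarCharVector vector) {
            this.vector = vector;
        }

        String getString(int row) {
            wasNull = vector.isNull(row);
            return wasNull ? null : new String(vector.get(row), StandardCharsets.UTF_8);
        }

        boolean wasNull() {
            return wasNull;
        }
    }

    public static void main(String[] args) {
        byte[][] data = {"a".getBytes(StandardCharsets.UTF_8), new byte[0]};
        boolean[] validity = {true, false}; // row 1 is a SQL NULL
        FakeVarCharVector vector = new FakeVarCharVector(data, validity, false);

        BuggyAccessor buggy = new BuggyAccessor(vector);
        String buggyValue = buggy.getString(1);
        System.out.println("buggy: value=[" + buggyValue + "] wasNull=" + buggy.wasNull());

        FixedAccessor fixed = new FixedAccessor(vector);
        String fixedValue = fixed.getString(1);
        System.out.println("fixed: value=[" + fixedValue + "] wasNull=" + fixed.wasNull());
    }
}
```

With the simulated flag off, the buggy accessor returns an empty string for the null row and reports `wasNull() == false`, while the fixed accessor returns `null` and reports `true`. With the simulated flag on, the two agree, which is why the bug only shows up in the Iceberg configuration.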

Reported by Magnus Byne in #talk-data-cloud-jdbc.

Test plan

  • :jdbc-core:test now runs with -Darrow.enable_null_check_for_get=false so the existing *FromNulled* accessor tests cover the Iceberg scenario.
  • Verified the existing tests fail on the buggy source (without the fix, three assertions trip across VarCharVectorAccessorTest + BinaryVectorAccessorTest).
  • Full :jdbc-core:test passes (one unrelated flaky JDBCLimitsTest re-passed on rerun).
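
Because the coverage above depends on the test JVM actually being launched with the flag off, a small precondition check can guard against the setting silently being dropped. The sketch below is illustrative only and is not part of this PR: `NullCheckFlagSketch` and `nullCheckEnabled` are hypothetical names, and the rule that any value other than the literal `false` leaves the check on is an assumption about how the property is interpreted, not something stated here.

```java
public class NullCheckFlagSketch {

    // Property name as it appears in this PR description.
    static final String PROPERTY = "arrow.enable_null_check_for_get";

    /**
     * ASSUMPTION: the check counts as enabled unless the property is exactly "false".
     * An unset property (null) is therefore treated as enabled.
     */
    static boolean nullCheckEnabled(String propertyValue) {
        return !"false".equals(propertyValue);
    }

    public static void main(String[] args) {
        String value = System.getProperty(PROPERTY);
        String state = nullCheckEnabled(value) ? "on" : "off";
        System.out.println(PROPERTY + "=" + value + " -> null checks " + state);
    }
}
```

A test suite could call a helper like this once at startup and fail fast if the check is still on, so a green run cannot be mistaken for coverage of the Iceberg scenario.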

mkaufmann added 2 commits May 13, 2026 15:04
Iceberg sets -Darrow.enable_null_check_for_get=false on the JVM for performance.
With that flag off, Arrow's VarCharVector/VarBinaryVector/FixedSizeBinaryVector
.get(int) skip the validity check and return stale buffer bytes (typically an
empty byte[]) for null rows instead of null. Both accessors were inferring
wasNull from the returned value, so a SQL NULL came through as an empty string
and wasNull() returned false.

Consult vector::isNull directly. Run :jdbc-core:test with the flag off so the
existing nulled-vector tests cover the Iceberg scenario.
…turn

The Iceberg flag (arrow.enable_null_check_for_get=false) only affects the
typed primitive vector.get(int) accessors on VarCharVector/VarBinaryVector/
FixedSizeBinaryVector. DecimalVector.getObject and ListVector/
LargeListVector.getObject unconditionally consult isSet, so the
return-value-sniff pattern is safe here. Note that contract inline so the
asymmetry with the just-fixed VarChar/Binary accessors doesn't read as a bug.