fix: source wasNull from validity buffer in VarChar/Binary accessors#187
Draft
mkaufmann wants to merge 2 commits into
Draft
fix: source wasNull from validity buffer in VarChar/Binary accessors#187mkaufmann wants to merge 2 commits into
mkaufmann wants to merge 2 commits into
Conversation
Iceberg sets -Darrow.enable_null_check_for_get=false on the JVM for performance. With that flag off, Arrow's VarCharVector/VarBinaryVector/FixedSizeBinaryVector .get(int) skip the validity check and return stale buffer bytes (typically an empty byte[]) for null rows instead of null. Both accessors were inferring wasNull from the returned value, so a SQL NULL came through as an empty string and wasNull() returned false. Consult vector::isNull directly. Run :jdbc-core:test with the flag off so the existing nulled-vector tests cover the Iceberg scenario.
…turn The Iceberg flag (arrow.enable_null_check_for_get=false) only affects the typed primitive vector.get(int) accessors on VarCharVector/VarBinaryVector/ FixedSizeBinaryVector. DecimalVector.getObject and ListVector/ LargeListVector.getObject unconditionally consult isSet, so the return-value-sniff pattern is safe here. Note that contract inline so the asymmetry with the just-fixed VarChar/Binary accessors doesn't read as a bug.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
When Iceberg is on the classpath it sets
-Darrow.enable_null_check_for_get=falseon the JVM for performance. With that flag off, Arrow'sVarCharVector/VarBinaryVector/FixedSizeBinaryVector.get(int)skip the validity check and return stale buffer bytes (typically an emptybyte[]) for null rows instead ofnull.VarCharVectorAccessorandBinaryVectorAccessorwere both inferringwasNullfrom "is the returnedbyte[]null?", so a SQL NULL would surface as an empty string andwasNull()would returnfalse.Source
wasNullfromvector.isNull(int)instead. The other accessors are unaffected:BaseInt,Boolean,Float,Double,Date,Time,TimeStamp,TimeStampTZ) readholder.isSet, populated byvector.get(int, holder)which always honors validity regardless of the flag.DecimalVector.getObjectandListVector/LargeListVector.getObjectunconditionally consultisSet, so their return-value-sniff is safe — added inline comments documenting that contract so the asymmetry doesn't read as a bug.Reported by Magnus Byne in #talk-data-cloud-jdbc.
Test plan
:jdbc-core:testnow runs with-Darrow.enable_null_check_for_get=falseso the existing*FromNulled*accessor tests cover the Iceberg scenario.VarCharVectorAccessorTest+BinaryVectorAccessorTest).:jdbc-core:testpasses (one unrelated flakyJDBCLimitsTestre-passed on rerun).