HIVE-29524: Missing num_nulls statistic for partition columns #6410
Draft
tanishq-chugh wants to merge 1 commit into apache:master
zabetak (Member) reviewed on Apr 7, 2026 and left a comment:
Thanks for the PR, @tanishq-chugh! I left some refactoring suggestions. Apart from that, it seems that some .q.out files need to be updated.
Comment on lines +617 to +634:
private static long getNumNullsForPartCol(PartitionIterable partitions, String partColName, HiveConf conf) {
  long numNulls = 0;
  String defaultPartitionName = HiveConf.getVar(conf, HiveConf.ConfVars.DEFAULT_PARTITION_NAME);
  for (Partition partition : partitions) {
    String partVal = partition.getSpec().get(partColName);
    if (partVal != null && partVal.equals(defaultPartitionName)) {
      Map<String, String> parameters = partition.getParameters();
      if (parameters != null && parameters.get(StatsSetupConst.ROW_COUNT) != null) {
        long rowCount = Long.parseLong(parameters.get(StatsSetupConst.ROW_COUNT));
        if (rowCount > 0) {
          numNulls = safeAdd(numNulls, rowCount);
        }
      }
    }
  }
  return numNulls;
}
I am wondering if we could take advantage of the existing StatsUtils#getNumRows method to some extent. At the very least, we may be able to reuse some existing classes such as org.apache.hadoop.hive.ql.stats.BasicStats.
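To make the discussion concrete, here is a minimal, self-contained sketch of the logic in the method under review, with the row-count lookup split into its own helper so that it could later be swapped for an existing utility. It is not Hive code: `PartitionInfo`, `rowCount`, and `numNullsForPartCol` are hypothetical names, the `numRows` key mirrors the value of `StatsSetupConst.ROW_COUNT`, and the saturating addition is only an assumption about what `safeAdd` does. Unlike the snippet above, the sketch treats an unparsable row count as unknown instead of letting `Long.parseLong` throw.

```java
import java.util.List;
import java.util.Map;

class PartitionNullCount {
    // Assumed to match the value of StatsSetupConst.ROW_COUNT.
    static final String ROW_COUNT_KEY = "numRows";

    // Hypothetical stand-in for a Hive partition: its spec and its parameters.
    static final class PartitionInfo {
        final Map<String, String> spec;
        final Map<String, String> parameters;

        PartitionInfo(Map<String, String> spec, Map<String, String> parameters) {
            this.spec = spec;
            this.parameters = parameters;
        }
    }

    // Returns the recorded row count, or 0 when it is missing, negative, or not a number.
    static long rowCount(PartitionInfo partition) {
        if (partition.parameters == null) {
            return 0L;
        }
        String raw = partition.parameters.get(ROW_COUNT_KEY);
        if (raw == null) {
            return 0L;
        }
        try {
            return Math.max(0L, Long.parseLong(raw.trim()));
        } catch (NumberFormatException e) {
            return 0L;
        }
    }

    // Sums the row counts of all partitions whose value for partColName is the
    // default partition name, i.e. the partitions that hold the NULL values.
    static long numNullsForPartCol(List<PartitionInfo> partitions, String partColName,
                                   String defaultPartitionName) {
        long numNulls = 0L;
        for (PartitionInfo partition : partitions) {
            if (defaultPartitionName.equals(partition.spec.get(partColName))) {
                long rows = rowCount(partition);
                // Saturating add: clamp to Long.MAX_VALUE instead of overflowing.
                numNulls = (Long.MAX_VALUE - numNulls < rows) ? Long.MAX_VALUE : numNulls + rows;
            }
        }
        return numNulls;
    }
}
```

With the lookup isolated in `rowCount`, replacing it with a call into an existing stats class would touch a single method and leave the default-partition filter unchanged.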



What changes were proposed in this pull request?
The num_nulls statistic should be computed for partition columns.
Why are the changes needed?
Currently, the num_nulls statistic for partition columns is never populated and is always zero. This gives the user wrong information, and any estimate that relies on ColStatistics.getNumNulls is inaccurate as well.
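The effect on estimates can be illustrated with a deliberately simplified model. The sketch below is not Hive's estimator: the class and method names are hypothetical, and it only assumes that an `IS NULL` predicate is estimated as the column's null count and `IS NOT NULL` as the remainder. The figures (1000 rows, 200 of them in the default partition) are made up for illustration.

```java
class NullEstimateSketch {
    // Simplified model: rows matching "col IS NULL" equal the null count, capped to [0, numRows].
    static long estimateIsNull(long numRows, long numNulls) {
        return Math.min(numRows, Math.max(0L, numNulls));
    }

    // Simplified model: rows matching "col IS NOT NULL" are everything else.
    static long estimateIsNotNull(long numRows, long numNulls) {
        return numRows - estimateIsNull(numRows, numNulls);
    }

    public static void main(String[] args) {
        long numRows = 1000L;
        // Statistic left at its default of zero.
        System.out.println(estimateIsNull(numRows, 0L) + " / " + estimateIsNotNull(numRows, 0L));     // prints "0 / 1000"
        // Statistic populated with the 200 rows of the default partition.
        System.out.println(estimateIsNull(numRows, 200L) + " / " + estimateIsNotNull(numRows, 200L)); // prints "200 / 800"
    }
}
```

Under this model, a zero null count makes an `IS NULL` filter on the partition column look like it returns no rows, even though 200 rows would match.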
Does this PR introduce any user-facing change?
Yes. The num_nulls statistic, which was previously not populated and always defaulted to zero, will now be computed correctly and be visible to the user.
How was this patch tested?
Manual testing and qtests.