HIVE-29524: Missing num_nulls statistic for partition columns #6410

Draft
tanishq-chugh wants to merge 1 commit into apache:master from tanishq-chugh:HIVE-29524

Conversation

@tanishq-chugh (Contributor) commented Apr 6, 2026

What changes were proposed in this pull request?

The num_nulls statistic should be computed for partition columns.

Why are the changes needed?

Currently, the num_nulls statistic is not populated for partition columns and is always zero. This gives the user wrong information, and any estimate that relies on ColStatistics.getNumNulls is inaccurate as well.

Does this PR introduce any user-facing change?

Yes. The num_nulls statistic, which was previously not populated and always defaulted to zero, will now be computed correctly and be visible to the user.

How was this patch tested?

Manual testing and qtest.

sonarqubecloud bot commented Apr 6, 2026

@zabetak (Member) left a comment

Thanks for the PR @tanishq-chugh! I left some refactoring suggestions. Apart from that, it seems that some .q.out files need to be updated.

Comment on lines +617 to +634
  private static long getNumNullsForPartCol(PartitionIterable partitions, String partColName, HiveConf conf) {
    long numNulls = 0;
    String defaultPartitionName = HiveConf.getVar(conf, HiveConf.ConfVars.DEFAULT_PARTITION_NAME);
    for (Partition partition : partitions) {
      String partVal = partition.getSpec().get(partColName);
      if (partVal != null && partVal.equals(defaultPartitionName)) {
        Map<String, String> parameters = partition.getParameters();
        if (parameters != null && parameters.get(StatsSetupConst.ROW_COUNT) != null) {
          long rowCount = Long.parseLong(parameters.get(StatsSetupConst.ROW_COUNT));
          if (rowCount > 0) {
            numNulls = safeAdd(numNulls, rowCount);
          }
        }
      }
    }
    return numNulls;
  }

I am wondering if we could take advantage of the existing StatsUtils#getNumRows method to some extent. At the very least, we may be able to reuse some existing classes such as org.apache.hadoop.hive.ql.stats.BasicStats.
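
For readers following the thread, the counting rule in the snippet under review can be illustrated with a standalone sketch in plain Java. It is not the patch and not the refactoring suggested above: `PartitionInfo`, `numNullsForPartCol`, and `saturatingAdd` are hypothetical stand-ins for Hive's `Partition`, the method in the diff, and `safeAdd`. The `"numRows"` key and the `__HIVE_DEFAULT_PARTITION__` name are assumed to match `StatsSetupConst.ROW_COUNT` and the default of `hive.exec.default.partition.name`. The sketch also skips unparsable row counts instead of letting `Long.parseLong` throw, which is an illustrative choice rather than behavior taken from the PR.

```java
import java.util.List;
import java.util.Map;

final class PartColNumNullsSketch {
  // Assumed default of hive.exec.default.partition.name.
  static final String DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__";
  // Assumed to match StatsSetupConst.ROW_COUNT.
  static final String ROW_COUNT = "numRows";

  /** Hypothetical stand-in for a Hive partition: its spec and its parameters. */
  static final class PartitionInfo {
    final Map<String, String> spec;
    final Map<String, String> parameters;

    PartitionInfo(Map<String, String> spec, Map<String, String> parameters) {
      this.spec = spec;
      this.parameters = parameters;
    }
  }

  /**
   * Sums the row counts of all partitions whose value for partColName is the
   * default partition name, i.e. the partitions that hold the NULL values.
   */
  static long numNullsForPartCol(List<PartitionInfo> partitions, String partColName) {
    long numNulls = 0;
    for (PartitionInfo partition : partitions) {
      if (!DEFAULT_PARTITION_NAME.equals(partition.spec.get(partColName))) {
        continue; // a regular partition value contributes no NULLs
      }
      String raw = partition.parameters == null ? null : partition.parameters.get(ROW_COUNT);
      if (raw == null) {
        continue; // no basic stats recorded for this partition
      }
      long rowCount;
      try {
        rowCount = Long.parseLong(raw.trim());
      } catch (NumberFormatException e) {
        continue; // illustrative choice: ignore malformed stats
      }
      if (rowCount > 0) {
        numNulls = saturatingAdd(numNulls, rowCount);
      }
    }
    return numNulls;
  }

  /** Adds two longs and clamps to Long.MAX_VALUE on overflow. */
  static long saturatingAdd(long a, long b) {
    try {
      return Math.addExact(a, b);
    } catch (ArithmeticException e) {
      return Long.MAX_VALUE;
    }
  }
}
```

Under these assumptions, a table with one default partition of 10 rows and one regular partition of 100 rows would report 10 nulls for the partition column. Missing, negative, or malformed row counts add nothing to the total.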

@tanishq-chugh tanishq-chugh marked this pull request as draft April 7, 2026 14:52
3 participants