HIVE-29534: getColStatistics to use HMS NDV values for DATE/TIMESTAMP column types #6406
konstantinb wants to merge 3 commits into apache:master
zabetak
left a comment
I am fully supportive of this change. Even though it leads to many .q.out changes, not having NDV for DATE/TIMESTAMP columns is a genuine bug that requires a fix. In its current state, some .q.out files seem to be a bit messed up, but once you get a clean run with all .q.out files correctly updated, I will approve the PR.
I needed to rebase the branch and regenerate the JSON .out files. All tests pass now.
zabetak
left a comment
LGTM, I will merge this in 24 hours unless someone has objections.
After going through the .q.out changes, it becomes evident that the CBO optimization phase is heavily affected by NDV statistics, which explains the join order changes in the TPC-DS query plans. This change may cause noticeable changes in TPC-DS benchmarks.
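To illustrate why NDV values can shift join ordering, here is a minimal sketch using the textbook equi-join cardinality estimate, |R| * |S| / max(ndv(R.key), ndv(S.key)). It is not Hive's actual cost model: the class name `NdvJoinEstimate`, the table sizes, and the use of NDV = 1 as a stand-in for a missing statistic are all assumptions made for illustration.

```java
public final class NdvJoinEstimate {

    private NdvJoinEstimate() {}

    /**
     * Textbook equi-join cardinality estimate:
     * |R join S| ~= |R| * |S| / max(ndv(R.key), ndv(S.key)).
     * A simplification for illustration, not Hive's actual cost model.
     */
    public static long estimateJoinRows(long leftRows, long rightRows,
                                        long leftNdv, long rightNdv) {
        // Guard against a zero NDV so the division is always defined.
        long maxNdv = Math.max(1L, Math.max(leftNdv, rightNdv));
        // multiplyExact throws on overflow instead of silently wrapping.
        return Math.max(1L, Math.multiplyExact(leftRows, rightRows) / maxNdv);
    }

    public static void main(String[] args) {
        long factRows = 1_000_000L; // hypothetical fact table
        long dateDimRows = 3_650L;  // hypothetical date dimension, one row per day

        // NDV of the DATE join key is known on both sides.
        long withNdv = estimateJoinRows(factRows, dateDimRows, 3_650L, 3_650L);

        // NDV unavailable: NDV = 1 is an assumed placeholder, not Hive's fallback.
        long withoutNdv = estimateJoinRows(factRows, dateDimRows, 1L, 1L);

        System.out.println("estimate with NDV:    " + withNdv);
        System.out.println("estimate without NDV: " + withoutNdv);
    }
}
```

Under these assumed numbers the two estimates differ by a factor of 3,650, which is the kind of gap that can lead a cost-based optimizer to pick a different join order.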



What changes were proposed in this pull request?
HIVE-29534: getColStatistics to use HMS NDV values for DATE/TIMESTAMP column types
Why are the changes needed?
Using statistics for these column types, when available, could improve query planning and performance.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Reviewed the impacted .out files.