fix(dataframe): handle FixedSizeBinary in describe#21455
fix(dataframe): handle FixedSizeBinary in describe#21455officialasishkumar wants to merge 2 commits intoapache:mainfrom
Conversation
kumarUjjawal
left a comment
There was a problem hiding this comment.
LGTM, can we add a mixed-schema test (numeric + FixedSizeBinary, or two columns where one is filtered) so the regression test actually covers the filter rather than the empty-aggregate fallback. The other items are nice-to-haves.
Skip min/max describe summaries for unsupported binary-like types so describe falls back to nulls instead of attempting an invalid Utf8 cast. Add a regression test for FixedSizeBinary and rerun the dataframe describe integration tests reported in apache#20273.
…xedSizeBinary Add a test that combines numeric (Int32) and FixedSizeBinary columns to exercise the filter path in describe(), where min/max aggregations skip FixedSizeBinary but still compute results for numeric columns. This covers the filter rather than the empty-aggregate fallback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
acad2de to
e7da510
Compare
officialasishkumar
left a comment
There was a problem hiding this comment.
Thanks for the review @kumarUjjawal! I've added a describe_mixed_numeric_and_fixed_size_binary test that creates a table with both an Int32 column and a FixedSizeBinary(3) column. This ensures the min/max filter path is exercised (the numeric column gets min/max computed while the FixedSizeBinary column is filtered out), rather than falling through to the empty-aggregate fallback that occurs when all columns are unsupported.
The branch has also been rebased on latest main.
Which issue does this PR close?
Rationale for this change
DataFrame::describe()builds min/max aggregates for non-numeric columns and then casts the results toUtf8for display. That works for strings, but it fails for unsupported binary-like outputs such asFixedSizeBinary, which currently causes describe to error instead of falling back to null summary values.What changes are included in this PR?
FixedSizeBinaryas an unsupported min/max describe type, alongside the other binary-like types that cannot be rendered through the currentUtf8cast path.describe()on aFixedSizeBinarycolumn.Are these changes tested?
cargo test -p datafusion --test core_integration describe_fixed_size_binary -- --nocapturecargo test -p datafusion --test core_integration dataframe::describe:: -- --nocaptureAre there any user-facing changes?
describe()no longer errors onFixedSizeBinarycolumns; unsupported min/max summaries now fall back tonullas intended.