
fix(dataframe): handle FixedSizeBinary in describe#21455

Open
officialasishkumar wants to merge 2 commits into apache:main from officialasishkumar:fix/describe-fixed-size-binary

@officialasishkumar

Which issue does this PR close?

Rationale for this change

DataFrame::describe() builds min/max aggregates for non-numeric columns and then casts the results to Utf8 for display. That works for strings, but the cast fails for binary-like outputs such as FixedSizeBinary, so describe() currently returns an error instead of falling back to null summary values.
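A simplified model of the failure and the fix, under stated assumptions: `Value`, `cast_to_utf8`, `min_summary_old`, and `min_summary_new` are hypothetical names standing in for the Arrow cast and DataFusion's describe logic, not their real APIs. In the old path the cast error aborts the summary; in the new path the unsupported type is mapped to null (`None`) before any cast is attempted.

```rust
// Hypothetical, simplified stand-ins; not the real arrow-rs or DataFusion API.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Value {
    Utf8(String),
    Int32(i32),
    FixedSizeBinary(Vec<u8>),
}

/// Models the display cast to Utf8. Assumption: it has no conversion for
/// FixedSizeBinary, mirroring the error described in this PR.
fn cast_to_utf8(v: &Value) -> Result<String, String> {
    match v {
        Value::Utf8(s) => Ok(s.clone()),
        Value::Int32(i) => Ok(i.to_string()),
        Value::FixedSizeBinary(_) => {
            Err("unsupported cast: FixedSizeBinary -> Utf8".to_string())
        }
    }
}

/// Old behavior: a failing cast propagates and aborts the whole summary.
fn min_summary_old(v: &Value) -> Result<Option<String>, String> {
    cast_to_utf8(v).map(Some)
}

/// Fixed behavior: unsupported types are skipped and reported as null.
fn min_summary_new(v: &Value) -> Result<Option<String>, String> {
    match v {
        Value::FixedSizeBinary(_) => Ok(None),
        other => cast_to_utf8(other).map(Some),
    }
}

fn main() {
    let hash = Value::FixedSizeBinary(vec![0xde, 0xad, 0xbe]);
    println!("old: {:?}", min_summary_old(&hash));
    println!("new: {:?}", min_summary_new(&hash));
}
```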

What changes are included in this PR?

  • Treat FixedSizeBinary as an unsupported min/max describe type, alongside the other binary-like types that cannot be rendered through the current Utf8 cast path.
  • Reuse the same type predicate for both min and max summary construction.
  • Add a regression test covering describe() on a FixedSizeBinary column.
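A minimal sketch of one type predicate shared by the min and max passes. `DataType` here is a hypothetical, trimmed-down stand-in for Arrow's `DataType`, `supports_min_max` and `aggregate_columns` are illustrative names, and the exact set of excluded binary-like types is an assumption rather than the PR's actual list.

```rust
// Hypothetical, trimmed-down stand-in for Arrow's DataType.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    Binary,
    LargeBinary,
    BinaryView,
    FixedSizeBinary(i32),
}

/// Single predicate used by both the min and the max pass.
/// Assumption: the excluded set below is illustrative only.
fn supports_min_max(dt: &DataType) -> bool {
    !matches!(
        dt,
        DataType::Binary
            | DataType::LargeBinary
            | DataType::BinaryView
            | DataType::FixedSizeBinary(_)
    )
}

/// Builds the aggregate labels for one pass ("min" or "max"),
/// skipping columns the shared predicate rejects.
fn aggregate_columns(schema: &[(&str, DataType)], agg: &str) -> Vec<String> {
    schema
        .iter()
        .filter(|(_, dt)| supports_min_max(dt))
        .map(|(name, _)| format!("{agg}({name})"))
        .collect()
}

fn main() {
    let schema = [
        ("id", DataType::Int32),
        ("name", DataType::Utf8),
        ("hash", DataType::FixedSizeBinary(3)),
    ];
    // Both passes consult the same predicate, so they skip the same columns.
    println!("{:?}", aggregate_columns(&schema, "min"));
    println!("{:?}", aggregate_columns(&schema, "max"));
}
```

Keeping one predicate means the min and max passes cannot drift apart when another unsupported type is added later.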

Are these changes tested?

  • cargo test -p datafusion --test core_integration describe_fixed_size_binary -- --nocapture
  • cargo test -p datafusion --test core_integration dataframe::describe:: -- --nocapture

Are there any user-facing changes?

describe() no longer errors on FixedSizeBinary columns; unsupported min/max summaries now fall back to null as intended.

github-actions bot added the core (Core DataFusion crate) label on Apr 8, 2026
@kumarUjjawal (Contributor) left a comment

LGTM. Can we add a mixed-schema test (numeric + FixedSizeBinary, or two columns where one is filtered) so that the regression test actually covers the filter rather than the empty-aggregate fallback? The other items are nice-to-haves.

officialasishkumar and others added 2 commits April 8, 2026 16:32
Skip min/max describe summaries for unsupported binary-like types so describe falls back to nulls instead of attempting an invalid Utf8 cast. Add a regression test for FixedSizeBinary and rerun the dataframe describe integration tests reported in apache#20273.
…xedSizeBinary

Add a test that combines numeric (Int32) and FixedSizeBinary columns to
exercise the filter path in describe(), where min/max aggregations skip
FixedSizeBinary but still compute results for numeric columns. This
covers the filter rather than the empty-aggregate fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
officialasishkumar force-pushed the fix/describe-fixed-size-binary branch from acad2de to e7da510 on April 8, 2026 16:35
@officialasishkumar (Author) left a comment

Thanks for the review @kumarUjjawal! I've added a describe_mixed_numeric_and_fixed_size_binary test that creates a table with both an Int32 column and a FixedSizeBinary(3) column. This ensures the min/max filter path is exercised (the numeric column gets min/max computed while the FixedSizeBinary column is filtered out), rather than falling through to the empty-aggregate fallback that occurs when all columns are unsupported.

The branch has also been rebased on latest main.
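To illustrate why the mixed-schema test matters, this sketch separates the two paths. `classify` and `MinMaxPath` are hypothetical names and `DataType` is a simplified stand-in, not DataFusion code. With an Int32 column present, the filter drops only the FixedSizeBinary column; a FixedSizeBinary-only schema leaves no min/max aggregates at all and falls through to what the review calls the empty-aggregate fallback.

```rust
// Simplified, hypothetical model; not the real DataFusion describe() code.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    FixedSizeBinary(i32),
}

#[derive(Debug, PartialEq)]
enum MinMaxPath {
    /// Some columns were aggregated; the filtered-out ones become null.
    Filtered {
        aggregated: Vec<&'static str>,
        nulled: Vec<&'static str>,
    },
    /// No column survived the filter, so there is nothing to aggregate.
    EmptyAggregateFallback,
}

fn classify(schema: &[(&'static str, DataType)]) -> MinMaxPath {
    // Split columns into those that get min/max and those that are skipped.
    let (aggregated, nulled): (Vec<_>, Vec<_>) = schema
        .iter()
        .partition(|(_, dt)| !matches!(dt, DataType::FixedSizeBinary(_)));
    if aggregated.is_empty() {
        MinMaxPath::EmptyAggregateFallback
    } else {
        MinMaxPath::Filtered {
            aggregated: aggregated.iter().map(|(name, _)| *name).collect(),
            nulled: nulled.iter().map(|(name, _)| *name).collect(),
        }
    }
}

fn main() {
    let mixed = [("a", DataType::Int32), ("b", DataType::FixedSizeBinary(3))];
    let only_binary = [("b", DataType::FixedSizeBinary(3))];
    // Mixed schema exercises the filter; binary-only hits the fallback.
    println!("{:?}", classify(&mixed));
    println!("{:?}", classify(&only_binary));
}
```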


Labels

core Core DataFusion crate

Development

Successfully merging this pull request may close these issues.

bug: Describe fails when schema has FixedBinarySize columns

2 participants