
refactor: extract sort pushdown logic from FileScanConfig into separate module #21457

Open
zhuqi-lucas wants to merge 2 commits into apache:main from zhuqi-lucas:feat/refactor-sort-pushdown-module

@zhuqi-lucas
Contributor

Which issue does this PR close?

Closes #21433

Rationale for this change

As noted by @alamb in #21182 (comment), file_scan_config.rs has grown large after the sort pushdown optimization. This PR extracts the sort pushdown helpers into their own module to improve readability and maintainability.

What changes are included in this PR?

Move sort pushdown logic from file_scan_config.rs (3591 → 3066 lines) into a new sort_pushdown.rs module (576 lines):

  • rebuild_with_source, try_sort_file_groups_by_statistics
  • sort_files_within_groups_by_statistics, any_file_has_nulls_in_sort_columns
  • validate_orderings, is_ordering_valid_for_file_groups
  • get_projected_output_ordering, ordered_column_indices_from_projection
  • SortedFileGroups struct

try_pushdown_sort stays in the DataSource impl — it calls into the new module.
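
The split described above can be sketched roughly as follows. This is a minimal illustration and not the actual DataFusion code: `ScanConfig`, `FileStats`, and `sort_files_by_min` are hypothetical, simplified stand-ins, and the real helpers operate on DataFusion's file groups and statistics types. It only shows the shape of the refactor, where the helper moves into a `sort_pushdown` module and the entry point on the config type delegates to it.

```rust
// Hypothetical, simplified stand-ins. None of these types mirror the real
// DataFusion definitions; only the module layout is the point here.
mod sort_pushdown {
    /// Simplified per-file statistic for a single sort column (assumption).
    #[derive(Debug, Clone, PartialEq)]
    pub struct FileStats {
        pub name: String,
        pub min: i64,
    }

    /// Helper that previously would have been a private method on the
    /// config type; after extraction it is a crate-visible free function.
    pub(crate) fn sort_files_by_min(files: &mut [FileStats]) {
        files.sort_by_key(|f| f.min);
    }
}

/// Stand-in for the config type that keeps the public entry point.
pub struct ScanConfig {
    pub files: Vec<sort_pushdown::FileStats>,
}

impl ScanConfig {
    /// Stays on the config type and only delegates to the extracted module.
    /// Returns a new config and leaves `self` untouched.
    pub fn try_pushdown_sort(&self) -> ScanConfig {
        let mut files = self.files.clone();
        sort_pushdown::sort_files_by_min(&mut files);
        ScanConfig { files }
    }
}
```

Callers of `try_pushdown_sort` see no difference in this sketch, which is consistent with the PR's description of the change as a pure refactor.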

Are these changes tested?

This is a pure refactor; all existing tests pass (120 passed).

Are there any user-facing changes?

No.

…te module

Move statistics-based file sorting, non-overlapping validation, and NULL
handling logic into `datasource/src/sort_pushdown.rs` to reduce the size
of `file_scan_config.rs` (3591 → 3066 lines).

Moved to sort_pushdown module:
- rebuild_with_source, try_sort_file_groups_by_statistics
- sort_files_within_groups_by_statistics, any_file_has_nulls_in_sort_columns
- validate_orderings, is_ordering_valid_for_file_groups
- get_projected_output_ordering, ordered_column_indices_from_projection

Pure refactor — no behavior changes.

Closes apache#21433
Copilot AI review requested due to automatic review settings April 8, 2026 08:52
@github-actions github-actions bot added the datasource Changes to the datasource crate label Apr 8, 2026
Contributor

Copilot AI left a comment

Pull request overview

Refactors DataFusion’s file-based sort pushdown implementation by extracting statistics-based file sorting and ordering validation helpers out of FileScanConfig into a dedicated sort_pushdown module, reducing file_scan_config.rs size and improving maintainability.

Changes:

  • Introduces datafusion/datasource/src/sort_pushdown.rs containing sort pushdown helpers (file-group sorting, ordering validation, NULL/statistics checks).
  • Wires the new module into the crate (mod.rs) and updates FileScanConfig to call into crate::sort_pushdown::*.
  • Removes the extracted helper implementations from file_scan_config.rs while keeping try_pushdown_sort in the DataSource impl.
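
For readers unfamiliar with what the extracted helpers do, the following is a rough sketch of the general technique the overview names: sorting files by statistics, validating the resulting order, and checking for NULLs. It is not the code from `sort_pushdown.rs`. `FileRange` and `sort_non_overlapping` are hypothetical names, the statistics are reduced to integer min/max values for one ascending sort column, and the choice to reject any file containing NULLs in that column is an assumption made for simplicity.

```rust
/// Hypothetical per-file statistics for a single ascending sort column.
#[derive(Debug, Clone, PartialEq)]
pub struct FileRange {
    pub path: String,
    pub min: i64,
    pub max: i64,
    pub null_count: usize,
}

/// Returns the files ordered by `min` when that order is safe to rely on,
/// and `None` otherwise.
///
/// Assumptions in this sketch:
/// - any file with NULLs in the sort column disables the optimization;
/// - adjacent ranges may touch (`max == next.min`) but must not overlap.
pub fn sort_non_overlapping(files: &[FileRange]) -> Option<Vec<FileRange>> {
    // NULL placement is not captured by min/max, so bail out conservatively.
    if files.iter().any(|f| f.null_count > 0) {
        return None;
    }

    // Order the files by their minimum value.
    let mut sorted = files.to_vec();
    sorted.sort_by_key(|f| f.min);

    // Reading the files back to back is only sorted if each file ends
    // before (or exactly where) the next one begins.
    let non_overlapping = sorted.windows(2).all(|w| w[0].max <= w[1].min);

    if non_overlapping {
        Some(sorted)
    } else {
        None
    }
}
```

In this sketch, two files covering 0..=9 and 10..=19 are returned in that order regardless of input order, while files covering 0..=9 and 5..=12 yield `None` because their ranges overlap.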

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| datafusion/datasource/src/sort_pushdown.rs | New module containing extracted sort pushdown helper logic and documentation. |
| datafusion/datasource/src/mod.rs | Registers the new internal sort_pushdown module. |
| datafusion/datasource/src/file_scan_config.rs | Updates call sites to use crate::sort_pushdown and removes inlined helper code. |

@zhuqi-lucas zhuqi-lucas requested review from adriangb and alamb April 8, 2026 09:20
Labels

datasource Changes to the datasource crate

Development

Successfully merging this pull request may close these issues.

Refactor: extract sort pushdown logic from FileScanConfig into separate module

2 participants