refactor: extract sort pushdown logic from FileScanConfig into separate module#21457
Open
zhuqi-lucas wants to merge 2 commits intoapache:mainfrom
Open
refactor: extract sort pushdown logic from FileScanConfig into separate module#21457zhuqi-lucas wants to merge 2 commits intoapache:mainfrom
zhuqi-lucas wants to merge 2 commits intoapache:mainfrom
Conversation
…te module Move statistics-based file sorting, non-overlapping validation, and NULL handling logic into `datasource/src/sort_pushdown.rs` to reduce the size of `file_scan_config.rs` (3591 → 3066 lines). Moved to sort_pushdown module: - rebuild_with_source, try_sort_file_groups_by_statistics - sort_files_within_groups_by_statistics, any_file_has_nulls_in_sort_columns - validate_orderings, is_ordering_valid_for_file_groups - get_projected_output_ordering, ordered_column_indices_from_projection Pure refactor — no behavior changes. Closes apache#21433
Contributor
There was a problem hiding this comment.
Pull request overview
Refactors DataFusion’s file-based sort pushdown implementation by extracting statistics-based file sorting and ordering validation helpers out of FileScanConfig into a dedicated sort_pushdown module, reducing file_scan_config.rs size and improving maintainability.
Changes:
- Introduces
datafusion/datasource/src/sort_pushdown.rscontaining sort pushdown helpers (file-group sorting, ordering validation, NULL/statistics checks). - Wires the new module into the crate (
mod.rs) and updatesFileScanConfigto call intocrate::sort_pushdown::*. - Removes the extracted helper implementations from
file_scan_config.rswhile keepingtry_pushdown_sortin theDataSourceimpl.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
datafusion/datasource/src/sort_pushdown.rs |
New module containing extracted sort pushdown helper logic and documentation. |
datafusion/datasource/src/mod.rs |
Registers the new internal sort_pushdown module. |
datafusion/datasource/src/file_scan_config.rs |
Updates call sites to use crate::sort_pushdown and removes inlined helper code. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #21433
Rationale for this change
As noted by @alamb in #21182 (comment),
file_scan_config.rshas grown large after the sort pushdown optimization. This PR extracts the sort pushdown helpers into their own module to improve readability and maintainability.What changes are included in this PR?
Move sort pushdown logic from
file_scan_config.rs(3591 → 3066 lines) into a newsort_pushdown.rsmodule (576 lines):rebuild_with_source,try_sort_file_groups_by_statisticssort_files_within_groups_by_statistics,any_file_has_nulls_in_sort_columnsvalidate_orderings,is_ordering_valid_for_file_groupsget_projected_output_ordering,ordered_column_indices_from_projectionSortedFileGroupsstructtry_pushdown_sortstays in theDataSourceimpl — it calls into the new module.Are these changes tested?
Pure refactor, all existing tests pass (120 passed).
Are there any user-facing changes?
No.