feat: Add FileMetadata return to Writer::close() and introduce WriterConfig constants#17509
feat: Add FileMetadata return to Writer::close() and introduce WriterConfig constants#17509mohsaka wants to merge 2 commits into
Conversation
✅ Deploy Preview for meta-velox canceled.
|
Build Impact AnalysisSelective Build Targets (building these covers all 297 affected)Total affected: 297/575 targets
Affected targets (297)Directly changed (124)
Transitively affected (173)
Slow path • Graph generated from PR branch |
CI Failure Analysis
🟡 Join Fuzzer — FUZZER Failure View logsFuzzer failure: Instance 3 (seed=34921271) — Velox and Reference results don't match Instances 1, 2, 4 passed. Only instance 3 failed. 🟡 Expression Fuzzer with Presto SOT — FUZZER Failure View logsFuzzer failure: All 4 instances failed — Velox and reference DB results don't match
🟡 Window Fuzzer with Presto as source of truth — FUZZER Failure View logsFuzzer failure: Instances 2 and 3 failed — Velox and reference DB results don't match
Instances 1 and 4 passed. Correlation with PR changes: The PR (#17509) modifies DWIO writer infrastructure — specifically the Known issues:
Reproduce locally: # Join Fuzzer (instance 3)
./velox_join_fuzzer --seed 34921271 --duration_sec 300 --presto_url=http://127.0.0.1:8080
# Expression Fuzzer (instance 1)
./velox_expression_fuzzer_test --seed 13439517 --duration_sec 300 --presto_url=http://127.0.0.1:8080
# Window Fuzzer (instance 2)
./velox_window_fuzzer_test --seed 777177270 --duration_sec 300 --presto_url=http://127.0.0.1:8080Note: Reproducing requires a running Presto server (used as source of truth). Recommended fix: No fix needed for this PR — all three failures are pre-existing flaky fuzzer failures that also reproduce on |
mbasmanova
left a comment
There was a problem hiding this comment.
@mohsaka, Thank you for splitting this out!
This addresses what we asked for in #17388: no multiple inheritance, constants accessed as WriterConfig::kFoo, and the comment in WriterConfig.h explaining why a separate header is needed.
Duplicated constants. WriterConfig introduces new constants (e.g., kParquetSessionEnableDictionary) that duplicate existing ones in WriterOptions (e.g., kParquetEnableDictionary) with the same string values but different names. Production code (Writer.cpp) still uses the WriterOptions names. The constants should live in one place — move them out of WriterOptions into WriterConfig, and have WriterOptions reference WriterConfig. Otherwise we end up with two sets of constants that can drift.
Nits in WriterConfig.h:
/// Lightweight config constants for the Parquet writer.— drop "Lightweight"./// Usage: Reference constants as WriterConfig::kParquetSessionWriteTimestampUnit— drop, this is obvious.in a separate header (WriterConfig.h)— drop(WriterConfig.h), the reader is already in that file.
@mbasmanova As always, thanks for the very quick responses and reviews so we can get this in asap. I've added all of your comment suggestions. I've also changed the constants in WriterOptions to reference those in WriterConfig. I believe this is what you meant. But if I misunderstood please let me know. Thanks! |
mbasmanova
left a comment
There was a problem hiding this comment.
Thank you for the quick update! One more thing: please remove the constant aliases from WriterOptions entirely and use WriterConfig::kParquet* directly in Writer.cpp. There are ~10 unqualified references in processConfigs() — just qualify them. This eliminates the delegation layer and keeps the constants in one place.
mbasmanova
left a comment
There was a problem hiding this comment.
Thank you for the updates!
Co-authored-by: Ping Liu <ping.liu.ping@gmail.com>
Co-authored-by: Krishna Pai <kpai@meta.com>
|
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this in D105173381. |
@mbasmanova Wow that was fast! I pushed a rebase afterwards, but I don't think it makes a difference for import if its commit based. Thank you! |
|
Thanks @mbasmanova and @mohsaka |
|
@mbasmanova merged this pull request in 6800d5b. |
This PR makes two improvements to the Velox writer infrastructure:
1. Return FileMetadata from Writer::close()
Writer::close()to returnstd::unique_ptr<FileMetadata>instead of voidFileMetadatabase class and format-specific implementations (ParquetFileMetadata,TextFileMetadata)2. Add WriterConfig constants
WriterConfig.hheader with Parquet writer configuration constantsAll existing tests updated and passing. Prep-PR for #17388.