feat: Add FileMetadata return to Writer::close() and introduce WriterConfig constants #17509

Closed
mohsaka wants to merge 2 commits into facebookincubator:main from mohsaka:refactor

@mohsaka
Collaborator

@mohsaka mohsaka commented May 13, 2026

This PR makes two improvements to the Velox writer infrastructure:

1. Return FileMetadata from Writer::close()

  • Modified Writer::close() to return std::unique_ptr<FileMetadata> instead of void
  • Added FileMetadata base class and format-specific implementations (ParquetFileMetadata, TextFileMetadata)
  • Enables callers to access file-level statistics and metadata after writing
  • Returns nullptr for empty files
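A minimal C++ sketch of the close() contract described above. FileMetadata and ParquetFileMetadata are named after the PR, but their members, as well as the SketchParquetWriter class and the closeAndCountRows helper, are hypothetical. The sketch only illustrates returning a polymorphic std::unique_ptr<FileMetadata> and returning nullptr for an empty file; it is not the actual Velox Writer interface.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

namespace metadata_sketch {

// Hypothetical, simplified stand-in for the FileMetadata base class added in
// this PR; the real Velox class and its members may differ.
class FileMetadata {
 public:
  virtual ~FileMetadata() = default;

  // Assumed accessor: total rows written to the file.
  virtual int64_t numRows() const = 0;
};

// Hypothetical format-specific subclass, named after the PR's
// ParquetFileMetadata; the row-group count is illustrative only.
class ParquetFileMetadata : public FileMetadata {
 public:
  ParquetFileMetadata(int64_t numRows, int32_t numRowGroups)
      : numRows_(numRows), numRowGroups_(numRowGroups) {
    // Invariant of this sketch: metadata exists only for non-empty files.
    assert(numRows > 0);
  }

  int64_t numRows() const override {
    return numRows_;
  }

  int32_t numRowGroups() const {
    return numRowGroups_;
  }

 private:
  const int64_t numRows_;
  const int32_t numRowGroups_;
};

// Hypothetical writer illustrating the new close() contract: it returns
// metadata for a non-empty file and nullptr when nothing was written.
class SketchParquetWriter {
 public:
  // For illustration only, each write() call counts as one row group.
  void write(int64_t numRows) {
    rowsWritten_ += numRows;
    ++rowGroups_;
  }

  std::unique_ptr<FileMetadata> close() {
    if (rowsWritten_ == 0) {
      return nullptr; // Empty file: no metadata to report.
    }
    return std::make_unique<ParquetFileMetadata>(rowsWritten_, rowGroups_);
  }

 private:
  int64_t rowsWritten_{0};
  int32_t rowGroups_{0};
};

// Usage: callers must handle the nullptr (empty file) case before reading
// any statistics from the returned metadata.
inline int64_t closeAndCountRows(SketchParquetWriter& writer) {
  std::unique_ptr<FileMetadata> metadata = writer.close();
  return metadata ? metadata->numRows() : 0;
}

} // namespace metadata_sketch
```

Returning a base-class pointer keeps the writer interface format-agnostic. Callers that need format-specific fields can dynamic_cast to the concrete type, as with ParquetFileMetadata in this sketch.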

2. Add WriterConfig constants

  • Created new WriterConfig.h header with Parquet writer configuration constants
  • Allows external projects (e.g., Gluten) to access config constants without Arrow dependencies
  • Updated all test references to use new constants
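A hedged sketch of what a dependency-free constants header can look like. The two constant names appear in the review discussion on this PR; the struct layout, the use of std::string_view, the placeholder key strings, and the isParquetWriterKey helper are assumptions for illustration, not Velox's actual definitions.

```cpp
#include <string_view>

namespace config_sketch {

// Hypothetical stand-in for WriterConfig.h. It holds only constants and
// includes nothing from Arrow, so projects that do not link Arrow can use it.
// The string values are placeholders, not Velox's real config keys.
struct WriterConfig {
  static constexpr std::string_view kParquetSessionEnableDictionary =
      "placeholder.parquet.enable-dictionary";
  static constexpr std::string_view kParquetSessionWriteTimestampUnit =
      "placeholder.parquet.write-timestamp-unit";
};

// Hypothetical consumer-side check (for example, in an engine such as
// Gluten) that needs only the key strings, not the writer itself.
constexpr bool isParquetWriterKey(std::string_view key) {
  return key == WriterConfig::kParquetSessionEnableDictionary ||
      key == WriterConfig::kParquetSessionWriteTimestampUnit;
}

} // namespace config_sketch
```

Because the header contains only constexpr values, a consumer can even check keys at compile time without pulling in any writer or Arrow headers.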

All existing tests updated and passing. Prep-PR for #17388.

@netlify

netlify Bot commented May 13, 2026

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 97f44cd
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/6a05d73cb00f0b00081a5a65

@meta-cla meta-cla Bot added the CLA Signed label May 13, 2026
@github-actions

github-actions Bot commented May 13, 2026

Build Impact Analysis

Selective Build Targets (building these covers all 297 affected)

cmake --build _build/release --target aggregate_companion_functions_test physical_size_aggregator_test presto_sql_test spark_aggregation_fuzzer_test spark_expression_fuzzer_test velox_abfs_test velox_aggregates_GeometryAggregateTest velox_aggregates_reduce_agg_bm velox_aggregates_simple_aggregates_bm velox_aggregates_string_keys_bm velox_aggregates_test_group0 velox_aggregates_test_group1 velox_aggregates_test_group2 velox_aggregates_test_group3 velox_aggregates_test_group4 velox_aggregation_fuzzer_test velox_aggregation_runner_test velox_benchmark_array_writer_no_nulls velox_benchmark_array_writer_with_nulls velox_benchmark_map_writer_no_nulls velox_benchmark_map_writer_with_nulls velox_benchmark_nested_array_writer_no_nulls velox_benchmark_nested_array_writer_with_nulls velox_cache_fuzzer velox_common_compression_test velox_common_test velox_core_test velox_driver_test velox_duckdb_conversion_test velox_dwio_arrow_parquet_writer_test velox_dwio_cache_test velox_dwio_common_bitpack_decoder_benchmark velox_dwio_common_data_buffer_benchmark velox_dwio_common_int_decoder_benchmark velox_dwio_common_test velox_dwio_dwrf_buffered_output_stream_test velox_dwio_dwrf_byte_rle_encoder_test velox_dwio_dwrf_byte_rle_test velox_dwio_dwrf_checksum_test velox_dwio_dwrf_column_reader_test velox_dwio_dwrf_column_statistics_test velox_dwio_dwrf_compression_test velox_dwio_dwrf_config_test velox_dwio_dwrf_data_buffer_holder_test velox_dwio_dwrf_decompression_test velox_dwio_dwrf_decryption_test velox_dwio_dwrf_dictionary_encoder_test velox_dwio_dwrf_dictionary_encoding_utils_test velox_dwio_dwrf_encoding_selector_test velox_dwio_dwrf_encryption_test velox_dwio_dwrf_flush_policy_test velox_dwio_dwrf_index_builder_test velox_dwio_dwrf_int_direct_test velox_dwio_dwrf_int_encoder_test velox_dwio_dwrf_layout_planner_test velox_dwio_dwrf_ratio_checker_test velox_dwio_dwrf_reader_base_test velox_dwio_dwrf_reader_test velox_dwio_dwrf_rle_test velox_dwio_dwrf_rlev1_encoder_test 
velox_dwio_dwrf_stream_labels_test velox_dwio_dwrf_stripe_dictionary_cache_test velox_dwio_dwrf_stripe_reader_base_test velox_dwio_dwrf_stripe_stream_test velox_dwio_dwrf_utils_test velox_dwio_dwrf_writer_context_test velox_dwio_dwrf_writer_encoding_manager_test velox_dwio_dwrf_writer_sink_test velox_dwio_dwrf_writer_test velox_dwio_iceberg_reader_benchmark velox_dwio_orc_column_statistics_test velox_dwio_orc_reader_filter_test velox_dwio_orc_reader_test velox_dwio_parquet_common_test velox_dwio_parquet_page_reader_test velox_dwio_parquet_reader_benchmark velox_dwio_parquet_reader_test velox_dwio_parquet_rlebp_decoder_test velox_dwio_parquet_structure_decoder_benchmark velox_dwio_parquet_structure_decoder_test velox_dwio_parquet_table_scan_test velox_dwio_parquet_thrift_test velox_dwio_parquet_tpch_test velox_dwrf_column_writer_index_test velox_dwrf_column_writer_stats_test velox_dwrf_column_writer_test velox_dwrf_e2e_filter_test velox_dwrf_e2e_reader_test velox_dwrf_e2e_writer_test velox_dwrf_float_column_writer_benchmark velox_dwrf_int_encoder_benchmark velox_dwrf_statistics_builder_utils_test velox_dwrf_writer_extended_test velox_dwrf_writer_flush_test velox_example_operator_extensibility velox_example_scan_orc velox_exchange_benchmark velox_exchange_fuzzer velox_exec_SpatialJoinTest velox_exec_bm_duplicate_project velox_exec_infra_test velox_exec_test_group0 velox_exec_test_group1 velox_exec_test_group2 velox_exec_test_group3 velox_exec_test_group4 velox_exec_test_group5 velox_exec_test_group6 velox_exec_test_group7 velox_exec_util_test_group0 velox_expression_fuzzer_test velox_expression_fuzzer_unit_test velox_expression_runner_test velox_expression_runner_unit_test velox_expression_test velox_expression_verifier_unit_test velox_filemetadata_test velox_filter_project_benchmark velox_function_dynamic_link_test velox_function_registry_test velox_functions_aggregates_test velox_functions_benchmarks_compare velox_functions_benchmarks_row_writer_no_nulls 
velox_functions_benchmarks_simdjson_function_with_expr velox_functions_benchmarks_string_writer_no_nulls velox_functions_benchmarks_url velox_functions_iceberg_test velox_functions_lib_test velox_functions_prestosql_benchmarks_array_contains velox_functions_prestosql_benchmarks_array_min_max velox_functions_prestosql_benchmarks_array_position velox_functions_prestosql_benchmarks_array_sum velox_functions_prestosql_benchmarks_bitwise velox_functions_prestosql_benchmarks_cardinality velox_functions_prestosql_benchmarks_comparisons velox_functions_prestosql_benchmarks_concat velox_functions_prestosql_benchmarks_date_time velox_functions_prestosql_benchmarks_field_reference velox_functions_prestosql_benchmarks_generic velox_functions_prestosql_benchmarks_in velox_functions_prestosql_benchmarks_map_concat velox_functions_prestosql_benchmarks_map_except velox_functions_prestosql_benchmarks_map_input velox_functions_prestosql_benchmarks_map_intersect velox_functions_prestosql_benchmarks_map_subscript velox_functions_prestosql_benchmarks_map_zip_with velox_functions_prestosql_benchmarks_not velox_functions_prestosql_benchmarks_regexp_replace velox_functions_prestosql_benchmarks_row velox_functions_prestosql_benchmarks_string_ascii_utf_functions velox_functions_prestosql_benchmarks_uuid_cast velox_functions_prestosql_benchmarks_width_bucket velox_functions_prestosql_benchmarks_zip velox_functions_prestosql_benchmarks_zip_with velox_functions_spark_aggregates_test velox_functions_spark_test velox_functions_test velox_fuzzer_connector_test velox_gcs_file_test velox_gcs_insert_test velox_gcs_multiendpoints_test velox_gcsfile_example velox_hash_benchmark velox_hash_join_build_benchmark velox_hash_join_list_result_benchmark velox_hash_join_prepare_join_table_benchmark velox_hdfs_file_test velox_hdfs_insert_test velox_hive_connector_test velox_hive_iceberg_deletion_vector_test velox_hive_iceberg_deletion_vector_writer_test velox_hive_iceberg_dwrf_insert_test 
velox_hive_iceberg_equality_delete_test velox_hive_iceberg_insert_test velox_hive_iceberg_test velox_hive_paimon_connector velox_hive_paimon_data_file_meta_test velox_hive_paimon_deletion_file_test velox_hive_paimon_row_kind_test velox_hive_paimon_split_test velox_hive_partition_function_benchmark velox_hive_writer_options_adapter_test velox_in_10_min_demo velox_join_fuzzer velox_key_encoder_test velox_mark_distinct_fuzzer velox_mark_sorted_benchmark velox_memory_arbitration_fuzzer velox_memory_test velox_orderby_benchmark velox_parquet_e2e_filter_test velox_parquet_writer_sink_test velox_parquet_writer_test velox_presto_types_fuzzer_utils_test velox_query_replayer velox_re2_functions_benchmarks velox_read_benchmark velox_row_number_fuzzer velox_rpc_operator_test velox_s3config_test velox_s3file_test velox_s3finalize_test velox_s3insert_test velox_s3metrics_test velox_s3multiendpoints_test velox_s3read_test velox_s3registration_test velox_serializer_test_group0 velox_simple_aggregate_test velox_sort_benchmark velox_spark_query_runner_test velox_spark_windows_test velox_sparksql_benchmarks_from_json velox_sparksql_benchmarks_get_funcs velox_sparksql_benchmarks_in velox_spatial_join_benchmark velox_spatial_join_fuzzer velox_spiller_aggregate_benchmark velox_spiller_join_benchmark velox_streaming_aggregation_benchmark velox_table_evolution_fuzzer_test velox_text_reader_test velox_text_writer_test velox_tool_trace_test velox_topn_row_number_fuzzer velox_tpcds_benchmark velox_tpcds_connector_test velox_tpch_benchmark velox_tpch_connector_test velox_tpch_speed_test velox_trace_file_tool velox_wave_benchmark velox_wave_exec_test velox_window_fuzzer_test velox_window_prefixsort_benchmark velox_window_sub_partitioned_sort_benchmark velox_windows_agg_test velox_windows_rank_test velox_windows_value_test velox_writer_fuzzer_test

Total affected: 297/575 targets

Warning: 2 file(s) could not be mapped to any target. A full build may be needed.

  • velox/dwio/common/CMakeLists.txt
  • velox/dwio/parquet/writer/CMakeLists.txt
Affected targets (297)

Directly changed (124)

Target Changed Files
spark_aggregation_fuzzer_test FileMetadata.h, Writer.h
velox_aggregates_GeometryAggregateTest FileMetadata.h, Writer.h
velox_aggregates_reduce_agg_bm FileMetadata.h, Writer.h
velox_aggregates_simple_aggregates_bm FileMetadata.h, Writer.h
velox_aggregates_string_keys_bm FileMetadata.h, Writer.h
velox_aggregates_test_group0 FileMetadata.h, Writer.h
velox_aggregates_test_group1 FileMetadata.h, Writer.h
velox_aggregates_test_group2 FileMetadata.h, Writer.h
velox_aggregates_test_group3 FileMetadata.h, Writer.h
velox_aggregates_test_group4 FileMetadata.h, Writer.h
velox_aggregation_fuzzer FileMetadata.h, Writer.h
velox_aggregation_fuzzer_base FileMetadata.h, Writer.h
velox_aggregation_fuzzer_test FileMetadata.h, Writer.h
velox_aggregation_result_verifier FileMetadata.h, Writer.h
velox_core_test FileMetadata.h, Writer.h
velox_driver_test FileMetadata.h, Writer.h
velox_dwio_arrow_parquet_writer FileMetadata.h, Writer.cpp, Writer.h, WriterConfig.h
velox_dwio_common FileMetadata.h, SortingWriter.cpp, SortingWriter.h, Writer.h
velox_dwio_common_test FileMetadata.h, SortingWriter.h, SortingWriterTest.cpp, Writer.h, WriterTest.cpp
velox_dwio_dwrf_config_test FileMetadata.h, Writer.h
velox_dwio_dwrf_reader_test FileMetadata.h, Writer.h
velox_dwio_dwrf_writer FileMetadata.h, Writer.cpp, Writer.h
velox_dwio_dwrf_writer_encoding_manager_test FileMetadata.h, Writer.h
velox_dwio_iceberg_reader_benchmark FileMetadata.h, Writer.h
velox_dwio_iceberg_reader_benchmark_lib FileMetadata.h, Writer.h
velox_dwio_orc_reader_filter_test FileMetadata.h, Writer.h
velox_dwio_parquet_page_reader_test FileMetadata.h, Writer.h, WriterConfig.h
velox_dwio_parquet_reader_benchmark FileMetadata.h, Writer.h, WriterConfig.h
velox_dwio_parquet_reader_benchmark_lib FileMetadata.h, Writer.h, WriterConfig.h
velox_dwio_parquet_reader_test FileMetadata.h, Writer.h, WriterConfig.h
velox_dwio_parquet_table_scan_test FileMetadata.h, Writer.h, WriterConfig.h
velox_dwio_parquet_tpch_test FileMetadata.h, Writer.h
velox_dwio_parquet_writer FileMetadata.h, Writer.h, WriterConfig.h
velox_dwio_text_writer FileMetadata.h, TextWriter.cpp, TextWriter.h, Writer.h
velox_dwio_text_writer_register FileMetadata.h, TextWriter.h, Writer.h
velox_dwrf_column_writer_stats_test FileMetadata.h, Writer.h
velox_dwrf_column_writer_test FileMetadata.h, Writer.h
velox_dwrf_e2e_filter_test FileMetadata.h, Writer.h
velox_dwrf_e2e_reader_test FileMetadata.h, Writer.h
velox_dwrf_e2e_writer_test FileMetadata.h, Writer.h
velox_dwrf_float_column_writer_benchmark FileMetadata.h, Writer.h
velox_dwrf_test_utils FileMetadata.h, Writer.h
velox_dwrf_writer_extended_test FileMetadata.h, Writer.h
velox_dwrf_writer_flush_test FileMetadata.h, Writer.h
velox_example_operator_extensibility FileMetadata.h, Writer.h
velox_exchange_benchmark FileMetadata.h, Writer.h
velox_exchange_fuzzer FileMetadata.h, Writer.h
velox_exec_SpatialJoinTest FileMetadata.h, Writer.h
velox_exec_bm_duplicate_project FileMetadata.h, Writer.h
velox_exec_infra_test FileMetadata.h, Writer.h
velox_exec_test_group0 FileMetadata.h, Writer.h
velox_exec_test_group1 FileMetadata.h, Writer.h
velox_exec_test_group2 FileMetadata.h, Writer.h
velox_exec_test_group3 FileMetadata.h, Writer.h
velox_exec_test_group4 FileMetadata.h, Writer.h
velox_exec_test_group5 FileMetadata.h, Writer.h
velox_exec_test_group6 FileMetadata.h, Writer.h
velox_exec_test_group7 FileMetadata.h, Writer.h
velox_exec_test_lib FileMetadata.h, Writer.h
velox_exec_util_test_group0 FileMetadata.h, Writer.h
velox_expression_test FileMetadata.h, Writer.h
velox_filter_project_benchmark FileMetadata.h, Writer.h
velox_functions_aggregates_test_lib FileMetadata.h, Writer.h
velox_functions_spark_aggregates_test FileMetadata.h, Writer.h
velox_functions_window_test_lib FileMetadata.h, Writer.h
velox_fuzzer_connector_test FileMetadata.h, Writer.h
velox_fuzzer_util FileMetadata.h, Writer.h
velox_gcs_insert_test FileMetadata.h, Writer.h
velox_gcs_multiendpoints_test FileMetadata.h, Writer.h
velox_hash_join_build_benchmark FileMetadata.h, Writer.h
velox_hdfs_insert_test FileMetadata.h, Writer.h
velox_hive_connector FileMetadata.h, SortingWriter.h, Writer.h
velox_hive_connector_test FileMetadata.h, Writer.h, WriterConfig.h
velox_hive_iceberg_dwrf_insert_test FileMetadata.h, Writer.h
velox_hive_iceberg_equality_delete_test FileMetadata.h, Writer.h
velox_hive_iceberg_insert_test FileMetadata.h, Writer.h
velox_hive_iceberg_splitreader FileMetadata.h, Writer.h
velox_hive_iceberg_test FileMetadata.h, Writer.h
velox_in_10_min_demo FileMetadata.h, Writer.h
velox_join_fuzzer FileMetadata.h, Writer.h
velox_key_encoder_test FileMetadata.h, Writer.h
velox_mark_distinct_fuzzer_lib FileMetadata.h, Writer.h
velox_mark_sorted_benchmark FileMetadata.h, Writer.h
velox_memory_arbitration_fuzzer FileMetadata.h, Writer.h
velox_memory_test FileMetadata.h, Writer.h
velox_orderby_benchmark FileMetadata.h, Writer.h
velox_parquet_e2e_filter_test FileMetadata.h, Writer.h, WriterConfig.h
velox_parquet_writer_sink_test FileMetadata.h, Writer.h, WriterConfig.h
velox_parquet_writer_test FileMetadata.h, ParquetWriterTest.cpp, Writer.h, WriterConfig.h
velox_query_benchmark FileMetadata.h, Writer.h
velox_query_trace_replayer_base FileMetadata.h, Writer.h
velox_row_number_fuzzer_lib FileMetadata.h, Writer.h
velox_rpc_operator_test FileMetadata.h, Writer.h
velox_s3file_test FileMetadata.h, Writer.h
velox_s3insert_test FileMetadata.h, Writer.h
velox_s3metrics_test FileMetadata.h, Writer.h
velox_s3multiendpoints_test FileMetadata.h, Writer.h
velox_s3read_test FileMetadata.h, Writer.h
velox_s3registration_test FileMetadata.h, Writer.h
velox_simple_aggregate_test FileMetadata.h, Writer.h
velox_spark_query_runner FileMetadata.h, Writer.h, WriterConfig.h
velox_spark_query_runner_test FileMetadata.h, Writer.h
velox_spark_windows_test FileMetadata.h, Writer.h
velox_spatial_join_benchmark FileMetadata.h, Writer.h
velox_spatial_join_fuzzer FileMetadata.h, Writer.h
velox_streaming_aggregation_benchmark FileMetadata.h, Writer.h
velox_table_evolution_fuzzer_test FileMetadata.h, Writer.h
velox_text_writer_test FileMetadata.h, TextWriter.h, Writer.h
velox_tool_trace_test FileMetadata.h, Writer.h
velox_topn_row_number_fuzzer_lib FileMetadata.h, Writer.h
velox_tpcds_benchmark FileMetadata.h, Writer.h
velox_tpcds_benchmark_lib FileMetadata.h, Writer.h
velox_tpcds_connector_test FileMetadata.h, Writer.h
velox_tpch_benchmark FileMetadata.h, Writer.h
velox_tpch_benchmark_lib FileMetadata.h, Writer.h
velox_tpch_connector_test FileMetadata.h, Writer.h
velox_tpch_speed_test FileMetadata.h, Writer.h
velox_wave_benchmark FileMetadata.h, Writer.h, WriterConfig.h
velox_wave_exec_test FileMetadata.h, Writer.h
velox_window_fuzzer FileMetadata.h, Writer.h
velox_window_fuzzer_test FileMetadata.h, Writer.h
velox_window_prefixsort_benchmark FileMetadata.h, Writer.h
velox_window_sub_partitioned_sort_benchmark FileMetadata.h, Writer.h
velox_writer_fuzzer FileMetadata.h, Writer.h

Transitively affected (173)

  • aggregate_companion_functions_test
  • physical_size_aggregator_test
  • presto_sql_test
  • spark_expression_fuzzer_test
  • velox_abfs_test
  • velox_aggregation_runner_test
  • velox_benchmark_array_writer_no_nulls
  • velox_benchmark_array_writer_with_nulls
  • velox_benchmark_map_writer_no_nulls
  • velox_benchmark_map_writer_with_nulls
  • velox_benchmark_nested_array_writer_no_nulls
  • velox_benchmark_nested_array_writer_with_nulls
  • velox_cache_fuzzer
  • velox_cache_fuzzer_lib
  • velox_common_compression_test
  • velox_common_test
  • velox_duckdb_conversion_test
  • velox_dwio_arrow_parquet_writer_lib
  • velox_dwio_arrow_parquet_writer_test
  • velox_dwio_arrow_parquet_writer_test_lib
  • velox_dwio_arrow_parquet_writer_util_lib
  • velox_dwio_cache_test
  • velox_dwio_common_bitpack_decoder_benchmark
  • velox_dwio_common_compression
  • velox_dwio_common_data_buffer_benchmark
  • velox_dwio_common_int_decoder_benchmark
  • velox_dwio_common_test_utils
  • velox_dwio_dwrf_buffered_output_stream_test
  • velox_dwio_dwrf_byte_rle_encoder_test
  • velox_dwio_dwrf_byte_rle_test
  • velox_dwio_dwrf_checksum_test
  • velox_dwio_dwrf_column_reader_test
  • velox_dwio_dwrf_column_statistics_test
  • velox_dwio_dwrf_common
  • velox_dwio_dwrf_compression_test
  • velox_dwio_dwrf_data_buffer_holder_test
  • velox_dwio_dwrf_decompression_test
  • velox_dwio_dwrf_decryption_test
  • velox_dwio_dwrf_dictionary_encoder_test
  • velox_dwio_dwrf_dictionary_encoding_utils_test
  • velox_dwio_dwrf_encoding_selector_test
  • velox_dwio_dwrf_encryption_test
  • velox_dwio_dwrf_flush_policy_test
  • velox_dwio_dwrf_index_builder_test
  • velox_dwio_dwrf_int_direct_test
  • velox_dwio_dwrf_int_encoder_test
  • velox_dwio_dwrf_layout_planner_test
  • velox_dwio_dwrf_ratio_checker_test
  • velox_dwio_dwrf_reader
  • velox_dwio_dwrf_reader_base_test
  • velox_dwio_dwrf_rle_test
  • velox_dwio_dwrf_rlev1_encoder_test
  • velox_dwio_dwrf_stream_labels_test
  • velox_dwio_dwrf_stripe_dictionary_cache_test
  • velox_dwio_dwrf_stripe_reader_base_test
  • velox_dwio_dwrf_stripe_stream_test
  • velox_dwio_dwrf_utils
  • velox_dwio_dwrf_utils_test
  • velox_dwio_dwrf_writer_context_test
  • velox_dwio_dwrf_writer_sink_test
  • velox_dwio_dwrf_writer_test
  • velox_dwio_faulty_file_sink
  • velox_dwio_native_parquet_reader
  • velox_dwio_orc_column_statistics_test
  • velox_dwio_orc_reader
  • velox_dwio_orc_reader_test
  • velox_dwio_parquet_common
  • velox_dwio_parquet_common_test
  • velox_dwio_parquet_reader
  • velox_dwio_parquet_rlebp_decoder_test
  • velox_dwio_parquet_structure_decoder_benchmark
  • velox_dwio_parquet_structure_decoder_test
  • velox_dwio_parquet_thrift_test
  • velox_dwio_text_reader
  • velox_dwio_text_reader_register
  • velox_dwrf_column_writer_index_test
  • velox_dwrf_int_encoder_benchmark
  • velox_dwrf_statistics_builder_utils_test
  • velox_example_scan_orc
  • velox_expression_fuzzer
  • velox_expression_fuzzer_test
  • velox_expression_fuzzer_unit_test
  • velox_expression_runner
  • velox_expression_runner_test
  • velox_expression_runner_unit_test
  • velox_expression_test_utility
  • velox_expression_verifier
  • velox_expression_verifier_unit_test
  • velox_filemetadata_test
  • velox_function_dynamic_link_test
  • velox_function_registry_test
  • velox_functions_aggregates_test
  • velox_functions_benchmarks_compare
  • velox_functions_benchmarks_row_writer_no_nulls
  • velox_functions_benchmarks_simdjson_function_with_expr
  • velox_functions_benchmarks_string_writer_no_nulls
  • velox_functions_benchmarks_url
  • velox_functions_iceberg_test
  • velox_functions_lib_test
  • velox_functions_prestosql_benchmarks_array_contains
  • velox_functions_prestosql_benchmarks_array_min_max
  • velox_functions_prestosql_benchmarks_array_position
  • velox_functions_prestosql_benchmarks_array_sum
  • velox_functions_prestosql_benchmarks_bitwise
  • velox_functions_prestosql_benchmarks_cardinality
  • velox_functions_prestosql_benchmarks_comparisons
  • velox_functions_prestosql_benchmarks_concat
  • velox_functions_prestosql_benchmarks_date_time
  • velox_functions_prestosql_benchmarks_field_reference
  • velox_functions_prestosql_benchmarks_generic
  • velox_functions_prestosql_benchmarks_in
  • velox_functions_prestosql_benchmarks_map_concat
  • velox_functions_prestosql_benchmarks_map_except
  • velox_functions_prestosql_benchmarks_map_input
  • velox_functions_prestosql_benchmarks_map_intersect
  • velox_functions_prestosql_benchmarks_map_subscript
  • velox_functions_prestosql_benchmarks_map_zip_with
  • velox_functions_prestosql_benchmarks_not
  • velox_functions_prestosql_benchmarks_regexp_replace
  • velox_functions_prestosql_benchmarks_row
  • velox_functions_prestosql_benchmarks_string_ascii_utf_functions
  • velox_functions_prestosql_benchmarks_uuid_cast
  • velox_functions_prestosql_benchmarks_width_bucket
  • velox_functions_prestosql_benchmarks_zip
  • velox_functions_prestosql_benchmarks_zip_with
  • velox_functions_spark_test
  • velox_functions_test
  • velox_functions_test_lib
  • velox_gcs
  • velox_gcs_file_test
  • velox_gcsfile_example
  • velox_hash_benchmark
  • velox_hash_join_list_result_benchmark
  • velox_hash_join_prepare_join_table_benchmark
  • velox_hdfs
  • velox_hdfs_file_test
  • velox_hive_iceberg_deletion_vector_test
  • velox_hive_iceberg_deletion_vector_writer_test
  • velox_hive_paimon_connector
  • velox_hive_paimon_data_file_meta_test
  • velox_hive_paimon_deletion_file_test
  • velox_hive_paimon_row_kind_test
  • velox_hive_paimon_split
  • velox_hive_paimon_split_test
  • velox_hive_partition_function_benchmark
  • velox_hive_writer_options_adapter_test
  • velox_mark_distinct_fuzzer
  • velox_presto_types_fuzzer_utils_test
  • velox_query_replayer
  • velox_re2_functions_benchmarks
  • velox_read_benchmark
  • velox_row_number_fuzzer
  • velox_s3config_test
  • velox_s3finalize_test
  • velox_s3fs
  • velox_serializer_test_group0
  • velox_sort_benchmark
  • velox_sparksql_benchmarks_from_json
  • velox_sparksql_benchmarks_get_funcs
  • velox_sparksql_benchmarks_in
  • velox_spill_fuzzer_base_lib
  • velox_spiller_aggregate_benchmark
  • velox_spiller_aggregate_benchmark_base
  • velox_spiller_join_benchmark
  • velox_spiller_join_benchmark_base
  • velox_text_reader_test
  • velox_topn_row_number_fuzzer
  • velox_trace_file_tool
  • velox_trace_file_tool_base
  • velox_windows_agg_test
  • velox_windows_rank_test
  • velox_windows_value_test
  • velox_writer_fuzzer_test

Slow path • Graph generated from PR branch

@github-actions

github-actions Bot commented May 13, 2026

CI Failure Analysis

Auto-generated by the CI Failure Analysis workflow. This comment is updated in place each time CI fails on a new commit, so it always reflects the latest run — re-pushing or re-running CI will refresh the analysis below. Last updated 2026-05-14 14:34:13 UTC from workflow run 25864695207.

🟡 Join Fuzzer — FUZZER Failure View logs

Fuzzer failure: Instance 3 (seed=34921271) — Velox and Reference results don't match

File: velox/exec/fuzzer/JoinFuzzer.cpp:740
Function: verify

QueryAssertions.cpp:1119: Failure
Expected 1000, got 1000
1 extra rows, 1 missing rows

1 of extra rows:
  1580880306 | 0.9967597723007202 | null | "1918-08-12T00:51:39.105000000" |
  6062672166957365092 | [...] | false

1 of missing rows:
  1580880306 | 0.9967597723007202 | null | "1918-08-12T00:51:39.105000000" |
  6062672181702965092 | [...] | false

Note: The 5th column differs: 6062672166957365092 vs 6062672181702965092

Instances 1, 2, 4 passed. Only instance 3 failed.


🟡 Expression Fuzzer with Presto SOT — FUZZER Failure View logs

Fuzzer failure: All 4 instances failed — Velox and reference DB results don't match

Instance | Seed | Error
1 | 13439517 | 3 extra/missing rows (ExpressionVerifier.cpp:475)
2 | 31414603 | 1 extra/missing row (ExpressionVerifier.cpp:475)
3 | 422933013 | 1 extra/missing row (ExpressionVerifier.cpp:475)
4 | 644650463 | 1 extra/missing row (ExpressionVerifier.cpp:475)
File: velox/expression/tests/ExpressionVerifier.cpp:475
Function: verify
Error: Velox and reference DB results don't match

Instance 1 (seed=13439517):
  Expected 100, got 100 — 3 extra rows, 3 missing rows

Instance 3 (seed=422933013):
  Expected 63, got 63 — 1 extra rows, 1 missing rows

All instances show matching row counts but differing values in individual rows,
consistent with a numeric precision/data mismatch rather than a logic error.

🟡 Window Fuzzer with Presto as source of truth — FUZZER Failure View logs

Fuzzer failure: Instances 2 and 3 failed — Velox and reference DB results don't match

Instance | Seed | Error
2 | 777177270 | 4 extra/missing rows, var_samp window function (WindowFuzzer.cpp:802)
3 | 256386633 | 12 extra/missing rows, covar_pop window function (WindowFuzzer.cpp:802)

Instances 1 and 4 passed.

File: velox/exec/fuzzer/WindowFuzzer.cpp:802
Function: verifyWindow
Error: Velox and reference DB results don't match

Instance 2 (seed=777177270):
  Expected 500, got 500 — 4 extra rows, 4 missing rows
  Plan: var_samp(c0) RANGE between UNBOUNDED PRECEDING and off1 FOLLOWING

Instance 3 (seed=256386633):
  Expected 500, got 500 — 12 extra rows, 12 missing rows
  Plan: covar_pop(c0, c1) ROWS between 3 FOLLOWING and k1 FOLLOWING

Correlation with PR changes:

The PR (#17509) modifies DWIO writer infrastructure — specifically the Writer::close() return type (changing from void to std::unique_ptr<FileMetadata>), adds a FileMetadata base class, creates a WriterConfig header for Parquet config constants, and updates the Parquet/DWRF/Text writers accordingly. None of these changes touch the execution engine, join logic, expression evaluation, or window functions. The fuzzer failures are in completely unrelated code paths (HashJoinBridge, ExpressionVerifier, WindowFuzzer).

Known issues:

  • The exact same three fuzzer jobs (Join Fuzzer, Expression Fuzzer with Presto SOT, Window Fuzzer with Presto SOT) also fail on the main branch — the most recent main branch run (25841349965) shows identical failures for all three jobs plus the Aggregation Fuzzer.
  • These are pre-existing/flaky failures, not caused by this PR.
  • Related open issues:
    • #9376 — Join Fuzzer: left join returns incorrect data with spilling
    • #13242 — Join fuzzer failure
    • #4350 — Window Variance Fuzzer Failure
    • #16917 — Flaky Window Fuzzer: verification rate drops below 50%

Reproduce locally:

# Join Fuzzer (instance 3)
./velox_join_fuzzer --seed 34921271 --duration_sec 300 --presto_url=http://127.0.0.1:8080

# Expression Fuzzer (instance 1)
./velox_expression_fuzzer_test --seed 13439517 --duration_sec 300 --presto_url=http://127.0.0.1:8080

# Window Fuzzer (instance 2)
./velox_window_fuzzer_test --seed 777177270 --duration_sec 300 --presto_url=http://127.0.0.1:8080

Note: Reproducing requires a running Presto server (used as source of truth).

Recommended fix: No fix needed for this PR — all three failures are pre-existing flaky fuzzer failures that also reproduce on main. The PR changes (DWIO writer metadata) are unrelated to the failing code paths.

@mohsaka mohsaka marked this pull request as ready for review May 14, 2026 02:39
@mohsaka mohsaka requested a review from majetideepak as a code owner May 14, 2026 02:39
@mohsaka mohsaka requested a review from aditi-pandit May 14, 2026 02:39
@mbasmanova mbasmanova requested a review from kgpai May 14, 2026 05:24
Contributor

@mbasmanova mbasmanova left a comment

@mohsaka, thank you for splitting this out!

This addresses what we asked for in #17388: no multiple inheritance, constants accessed as WriterConfig::kFoo, and the comment in WriterConfig.h explaining why a separate header is needed.

Duplicated constants. WriterConfig introduces new constants (e.g., kParquetSessionEnableDictionary) that duplicate existing ones in WriterOptions (e.g., kParquetEnableDictionary) with the same string values but different names. Production code (Writer.cpp) still uses the WriterOptions names. The constants should live in one place — move them out of WriterOptions into WriterConfig, and have WriterOptions reference WriterConfig. Otherwise we end up with two sets of constants that can drift.
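One way to read this suggestion, sketched under assumptions: WriterConfig owns each key string, and WriterOptions, if it keeps its old name at all, only aliases it, so a static_assert can confirm the two never diverge. The constant names come from the comment above; the struct shapes and the placeholder string value are hypothetical. This is the intermediate delegation step; the follow-up review on this PR asks to drop the aliases altogether.

```cpp
#include <string_view>

namespace alias_sketch {

// Single source of truth for the key (placeholder value, not Velox's).
struct WriterConfig {
  static constexpr std::string_view kParquetSessionEnableDictionary =
      "placeholder.parquet.enable-dictionary";
};

// WriterOptions keeps its old name only as a reference to WriterConfig,
// so the two names cannot drift apart.
struct WriterOptions {
  static constexpr std::string_view kParquetEnableDictionary =
      WriterConfig::kParquetSessionEnableDictionary;
};

// Compile-time check that both names resolve to the same key string.
static_assert(
    WriterOptions::kParquetEnableDictionary ==
    WriterConfig::kParquetSessionEnableDictionary);

} // namespace alias_sketch
```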

Nits in WriterConfig.h:

  • /// Lightweight config constants for the Parquet writer. — drop "Lightweight".
  • /// Usage: Reference constants as WriterConfig::kParquetSessionWriteTimestampUnit — drop, this is obvious.
  • in a separate header (WriterConfig.h) — drop (WriterConfig.h), the reader is already in that file.

@mohsaka
Collaborator Author

mohsaka commented May 14, 2026

@mbasmanova As always, thanks for the very quick responses and reviews so we can get this in asap. I've added all of your comment suggestions.

I've also changed the constants in WriterOptions to reference those in WriterConfig. I believe this is what you meant, but if I misunderstood, please let me know.

Thanks!

@mohsaka mohsaka requested a review from mbasmanova May 14, 2026 06:03
Contributor

@mbasmanova mbasmanova left a comment

Thank you for the quick update! One more thing: please remove the constant aliases from WriterOptions entirely and use WriterConfig::kParquet* directly in Writer.cpp. There are ~10 unqualified references in processConfigs() — just qualify them. This eliminates the delegation layer and keeps the constants in one place.
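A minimal sketch of the end state requested here, assuming a string-keyed session config: WriterOptions declares no key constants of its own, and processConfigs() reads every key through a qualified WriterConfig::kParquet* name. The struct shapes, the SessionConfig alias, the lookup helper, and the placeholder key strings are hypothetical; the real processConfigs() in Writer.cpp handles roughly ten keys and may use different types.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <string_view>

namespace process_sketch {

// Hypothetical single home for the keys; values are placeholders.
struct WriterConfig {
  static constexpr std::string_view kParquetSessionEnableDictionary =
      "placeholder.parquet.enable-dictionary";
  static constexpr std::string_view kParquetSessionWriteTimestampUnit =
      "placeholder.parquet.write-timestamp-unit";
};

// Hypothetical options struct: it carries values, not key constants.
struct WriterOptions {
  std::optional<bool> enableDictionary;
  std::optional<std::string> timestampUnit;
};

using SessionConfig = std::map<std::string, std::string>;

// Returns the value stored under `key`, or std::nullopt if absent.
inline std::optional<std::string> lookup(
    const SessionConfig& config,
    std::string_view key) {
  auto it = config.find(std::string(key));
  if (it == config.end()) {
    return std::nullopt;
  }
  return it->second;
}

// Hypothetical processConfigs(): every key is qualified as
// WriterConfig::kParquet*, with no aliases on WriterOptions.
inline void processConfigs(const SessionConfig& config, WriterOptions& options) {
  if (auto value =
          lookup(config, WriterConfig::kParquetSessionEnableDictionary)) {
    options.enableDictionary = (*value == "true");
  }
  if (auto value =
          lookup(config, WriterConfig::kParquetSessionWriteTimestampUnit)) {
    options.timestampUnit = *value;
  }
}

// Usage: only the keys present in the session config are applied.
inline void demoProcessConfigs() {
  SessionConfig config;
  config[std::string(WriterConfig::kParquetSessionEnableDictionary)] = "false";
  WriterOptions options;
  processConfigs(config, options);
  assert(options.enableDictionary.has_value() && !*options.enableDictionary);
  assert(!options.timestampUnit.has_value());
}

} // namespace process_sketch
```

With the keys qualified at each use site, a reader of processConfigs() can see exactly which header defines them, and there is no second set of names to keep in sync.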

Contributor

@mbasmanova mbasmanova left a comment

Thank you for the updates!

mohsaka and others added 2 commits May 14, 2026 07:07
Co-authored-by: Ping Liu <ping.liu.ping@gmail.com>
Co-authored-by: Krishna Pai <kpai@meta.com>
@mbasmanova mbasmanova added the ready-to-merge label May 14, 2026
@meta-codesync

meta-codesync Bot commented May 14, 2026

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this in D105173381.

@mohsaka
Collaborator Author

mohsaka commented May 14, 2026

@mbasmanova Wow, that was fast! I pushed a rebase afterwards, but I don't think it makes a difference for the import if it's commit-based. Thank you!

@aditi-pandit
Collaborator

Thanks @mbasmanova and @mohsaka

@meta-codesync

meta-codesync Bot commented May 14, 2026

@mbasmanova merged this pull request in 6800d5b.

Labels

CLA Signed, Merged, ready-to-merge

3 participants