Add support ANSI cast float/double to timestamp #17219

Open
Harthi7 wants to merge 5 commits into facebookincubator:main from Harthi7:cast-float-double-to-timestamp

@Harthi7 Harthi7 commented Apr 16, 2026

Implemented ANSI support for cast float/double to timestamp.

Changes made:

  • Enabled ANSI routing for REAL/DOUBLE -> TIMESTAMP in SparkCastExpr.
  • Updated castDoubleToTimestamp so ANSI CAST throws on non-finite and overflow inputs, while non-ANSI behavior remains unchanged.
  • Added and split tests for ANSI ON and ANSI OFF behavior, including TRY_CAST coverage.
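The ANSI versus non-ANSI split described in the list above can be sketched roughly as follows. This is a minimal standalone illustration, not the code from this PR: `secondsToMicros` and `kMicrosPerSecond` are hypothetical names, the result is plain microseconds instead of a Velox `Timestamp`, and the non-ANSI branch (NULL for non-finite input, saturation on overflow) is an assumption about the existing behavior.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for Timestamp::kMicrosecondsInSecond.
constexpr double kMicrosPerSecond = 1000000.0;
// int64_t covers [-2^63, 2^63); 2^63 is exactly representable as a double.
constexpr double kTwoPow63 = 9223372036854775808.0;

// Converts seconds to microseconds since the epoch.
// ansiEnabled == true: throws on NaN/Infinity and on overflow.
// ansiEnabled == false (assumed behavior): returns nullopt (NULL) for
// non-finite input and saturates to the int64_t range on overflow.
std::optional<int64_t> secondsToMicros(double seconds, bool ansiEnabled) {
  if (!std::isfinite(seconds)) {
    if (ansiEnabled) {
      throw std::invalid_argument("Non-finite value cannot be cast to TIMESTAMP.");
    }
    return std::nullopt;
  }

  // May become +/-Infinity for huge inputs; the range check below catches that.
  const double micros = seconds * kMicrosPerSecond;
  if (micros >= kTwoPow63 || micros < -kTwoPow63) {
    if (ansiEnabled) {
      throw std::overflow_error("The value cannot be cast to TIMESTAMP due to an overflow.");
    }
    return micros > 0 ? std::numeric_limits<int64_t>::max()
                      : std::numeric_limits<int64_t>::min();
  }

  // In range, so the truncating conversion is well defined.
  return static_cast<int64_t>(micros);
}
```

The sketch multiplies in `double` for brevity, whereas the reviewed code widens to `long double` before multiplying.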

Validation:

  • Targeted float/double to timestamp tests passed under ANSI ON and ANSI OFF.
  • Full SparkCastExpr suite passed locally: 46/46 tests.

netlify Bot commented Apr 16, 2026

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 9b97bc3
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/69e22df3ebc3d70008bf4fd8

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 16, 2026
@github-actions

Build Impact Analysis

Selective Build Targets (building these covers all 21 affected)

cmake --build _build/release --target spark_expression_fuzzer_test velox_expression_runner_test velox_expression_runner_unit_test velox_format_datetime_benchmark velox_functions_spark_aggregates_test velox_functions_spark_test velox_spark_function_registry_test velox_spark_query_runner_test velox_sparksql_benchmarks_cast velox_sparksql_benchmarks_compare velox_sparksql_benchmarks_from_json velox_sparksql_benchmarks_get_funcs velox_sparksql_benchmarks_hash velox_sparksql_benchmarks_in velox_sparksql_benchmarks_simd_compare velox_sparksql_benchmarks_split velox_sparksql_coverage

Total affected: 21/565 targets

Affected targets (21)

Directly changed (2)

Target | Changed Files
velox_functions_spark_specialforms | SparkCastExpr.cpp, SparkCastHooks.cpp
velox_functions_spark_test | SparkCastExprTest.cpp

Transitively affected (19)

  • spark_expression_fuzzer_test
  • velox_expression_runner
  • velox_expression_runner_test
  • velox_expression_runner_unit_test
  • velox_format_datetime_benchmark
  • velox_functions_spark
  • velox_functions_spark_aggregates_test
  • velox_functions_spark_impl
  • velox_spark_function_registry_test
  • velox_spark_query_runner_test
  • velox_sparksql_benchmarks_cast
  • velox_sparksql_benchmarks_compare
  • velox_sparksql_benchmarks_from_json
  • velox_sparksql_benchmarks_get_funcs
  • velox_sparksql_benchmarks_hash
  • velox_sparksql_benchmarks_in
  • velox_sparksql_benchmarks_simd_compare
  • velox_sparksql_benchmarks_split
  • velox_sparksql_coverage

Fast path • Graph from main@3a86cb261b0b0dfdfef527e679b93a7f1d87999d

Contributor

@philo-he philo-he left a comment

Looks good overall. Some comments. Please check if they make sense. Thanks.

const long double micros =
static_cast<long double>(value) * Timestamp::kMicrosecondsInSecond;

if (micros > static_cast<long double>(std::numeric_limits<int64_t>::max()) ||
Contributor

Suggest creating a static constexpr variable to hold static_cast<long double>(std::numeric_limits<int64_t>::max())

Author

Thanks. The first two suggestions make sense. I’ll factor out the int64 bounds into local constexpr variables to simplify the overflow check.
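A rough sketch of what factoring the bounds into local constexpr variables could look like. The helper name `overflowsInt64` and the constant names are hypothetical and are not taken from the PR.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true when `micros` falls outside the int64_t bounds.
// The bounds are converted once and named, so the check reads as a plain
// range comparison.
inline bool overflowsInt64(long double micros) {
  static constexpr long double kMaxMicros =
      static_cast<long double>(std::numeric_limits<int64_t>::max());
  static constexpr long double kMinMicros =
      static_cast<long double>(std::numeric_limits<int64_t>::min());
  return micros > kMaxMicros || micros < kMinMicros;
}
```

One caveat: on platforms where long double has the same precision as double, the int64_t maximum is not exactly representable and rounds up to 2^63, so a strict `>` comparison would accept a value of exactly 2^63. Comparing with `>=` against 2^63 avoids that case.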

static_cast<long double>(value) * Timestamp::kMicrosecondsInSecond;

if (micros > static_cast<long double>(std::numeric_limits<int64_t>::max()) ||
micros < static_cast<long double>(std::numeric_limits<int64_t>::min())) {
Contributor

ditto

if (micros > static_cast<long double>(std::numeric_limits<int64_t>::max()) ||
micros < static_cast<long double>(std::numeric_limits<int64_t>::min())) {
return folly::makeUnexpected(Status::UserError(
"Cannot cast floating-point value to TIMESTAMP because the result overflows."));
Contributor

Could you confirm Spark's error message? It would be better to make them consistent.

Author

Confirmed. Spark’s ANSI float/double -> timestamp path uses doubleToTimestampAnsi(...), which throws invalidInputInCastToDatetimeError(...) for NaN/Infinity, while finite overflow goes through exact numeric conversion. So Spark’s behavior maps to the generic CAST_INVALID_INPUT and CAST_OVERFLOW error styles rather than a timestamp-specific custom message.

In this Velox hook we only have the converted double value, not the full source expression/type context needed to reproduce Spark’s fully rendered message exactly. So, I’m keeping the logic as is and updating the wording to be closer to Spark’s generic error style.

@Harthi7 Harthi7 requested a review from philo-he April 17, 2026 12:56
Collaborator

@rui-mo rui-mo left a comment

Thanks!

"The value cannot be cast to TIMESTAMP due to an overflow."));
}

return Timestamp::fromMicrosNoError(static_cast<int64_t>(micros));
Collaborator

Since castNumberToTimestamp is also used by castIntToTimestamp, could we extend it with a field (e.g., allowOverflow) so that a single implementation supports both floating-point and integral types on ANSI ON & OFF?
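One possible shape for such a unified helper, as a standalone sketch only. `numberToMicros` and `kMicrosPerSecond` are hypothetical names, the real castNumberToTimestamp signature is not shown in this thread, and saturating when `allowOverflow` is true is an assumption about the non-ANSI behavior.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <optional>
#include <type_traits>

// Hypothetical stand-in for Timestamp::kMicrosecondsInSecond.
constexpr int64_t kMicrosPerSecond = 1000000;

// Converts seconds (signed integral or floating-point) to microseconds.
// Returns nullopt for non-finite input, or when the result overflows and
// allowOverflow is false; the caller decides whether that becomes an error
// or a NULL. With allowOverflow set, overflow saturates (an assumption).
template <typename T>
std::optional<int64_t> numberToMicros(T seconds, bool allowOverflow) {
  constexpr int64_t kMax = std::numeric_limits<int64_t>::max();
  constexpr int64_t kMin = std::numeric_limits<int64_t>::min();

  if constexpr (std::is_floating_point_v<T>) {
    if (!std::isfinite(seconds)) {
      return std::nullopt;
    }
    const long double micros =
        static_cast<long double>(seconds) * kMicrosPerSecond;
    // int64_t covers [-2^63, 2^63); 2^63 is exact in binary floating point.
    constexpr long double kTwoPow63 = 9223372036854775808.0L;
    if (micros >= -kTwoPow63 && micros < kTwoPow63) {
      return static_cast<int64_t>(micros);
    }
  } else {
    const int64_t s = static_cast<int64_t>(seconds);
    // Check the bounds before multiplying to avoid signed overflow.
    if (s <= kMax / kMicrosPerSecond && s >= kMin / kMicrosPerSecond) {
      return s * kMicrosPerSecond;
    }
  }

  // Only overflowing inputs reach this point.
  if (!allowOverflow) {
    return std::nullopt;
  }
  return seconds < 0 ? kMin : kMax;
}
```

With a single flag, the ANSI ON path would pass `allowOverflow = false` and convert the empty result into a user error, while the ANSI OFF path would pass `true`.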
