Prevent integer schema inference from over-widening past max(i64) #2890

@williamhbaker

Context

Currently, schema inference for integer fields rounds inferred bounds to powers of 10 to minimize schema churn. For example, an observed value of 3 sets the inferred maximum to 10, and an observed value of 12 sets the maximum to 100. This coarse granularity is by design, as updating inferred schemas too frequently is undesirable.
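As a rough illustration of the rounding described above, here is a minimal Python sketch. The function name `round_up_to_power_of_10` is hypothetical and not taken from the codebase, and the handling of exact powers of 10 (kept as-is) and of negative values (rejected) is an assumption, since the description only gives the examples 3 and 12.

```python
def round_up_to_power_of_10(observed: int) -> int:
    """Return the smallest power of 10 that is >= observed.

    Hypothetical helper for illustration only. Assumes a non-negative
    observed value and that an exact power of 10 is left unchanged.
    """
    if observed < 0:
        raise ValueError("this sketch only covers non-negative maxima")
    bound = 1
    while bound < observed:
        bound *= 10
    return bound


print(round_up_to_power_of_10(3))   # 10
print(round_up_to_power_of_10(12))  # 100
```

Under these assumptions, many nearby observations map to the same inferred maximum, which is what keeps the inferred schema from changing on every new value.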

The Problem

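The tension with this approach is that materializations rely on these inferred values to determine what kind of column to create. If the power-of-10 rounding pushes an inferred maximum beyond the limits of an i64 (64-bit signed integer), the system will allocate a needlessly wide column type. Concretely, max(i64) is 9,223,372,036,854,775,807 (about 9.22 × 10^18), so any observed value above 10^18 rounds up to 10^19, which no longer fits in an i64 even though the observed value itself may.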

At these specific type boundaries, exactness is critical. We only want an inferred maximum to exceed max(i64) if the actual data strictly requires it. Other useful cutoffs to consider are the 128-bit and 256-bit points, though these are less commonly handled separately for materializations.

Proposed Solution

Maintain the power-of-10 granularity for the vast majority of cases to prevent schema churn, but implement specific bounds testing for critical numeric thresholds (e.g., i64 limits). This will prevent the inference engine from artificially over-widening column types during materialization.
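One way the proposal could look, sketched in Python under stated assumptions: the name `widen_maximum` and the `THRESHOLDS` tuple are hypothetical, the 128-bit and 256-bit cutoffs are assumed to be signed limits, and only non-negative maxima are handled. The idea is to keep power-of-10 rounding but stop at a type limit whenever the observed value still fits within it.

```python
# Assumed signed limits; the issue names i64 explicitly and mentions
# 128-bit and 256-bit points without saying whether they are signed.
I64_MAX = 2**63 - 1
I128_MAX = 2**127 - 1
I256_MAX = 2**255 - 1
THRESHOLDS = (I64_MAX, I128_MAX, I256_MAX)


def widen_maximum(observed: int) -> int:
    """Round a non-negative observed maximum up to a power of 10,
    but never past a type limit that the observed value still fits in.

    Hypothetical sketch, not the project's actual implementation.
    """
    if observed < 0:
        raise ValueError("this sketch only covers non-negative maxima")
    bound = 1
    while bound < observed:
        bound *= 10
    # If rounding crossed a limit that the data itself does not exceed,
    # clamp to that limit instead of over-widening.
    for limit in THRESHOLDS:
        if observed <= limit < bound:
            return limit
    return bound


print(widen_maximum(12) == 100)                  # True
print(widen_maximum(5 * 10**18) == I64_MAX)      # True: clamped, still fits an i64
print(widen_maximum(I64_MAX + 1) == 10**19)      # True: data really exceeds i64
```

With this shape, the inferred maximum exceeds max(i64) only when an observed value does, while values far from any threshold keep the coarse power-of-10 behavior.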
