
Fix: prevent division-by-zero in not_null_proportion and align empty-table behavior #1083

Open
shivasingh4945-tech wants to merge 5 commits into dbt-labs:main from shivasingh4945-tech:fix-not-null-proportion

What

This PR fixes a division-by-zero issue in the not_null_proportion generic test.

Problem

The current implementation computes:

SUM(CASE WHEN column_name IS NULL THEN 0 ELSE 1 END) / COUNT(*)

COUNT(*) evaluates to 0 whenever the aggregate runs over zero rows, so the denominator can be 0 and the division can fail. This can occur in real-world scenarios such as:

  • filtered datasets
  • joins producing empty partitions
  • grouped queries with no matching rows

Depending on query shape, this can either return NULL or raise a runtime error, leading to inconsistent and non-portable behavior across warehouses.
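A quick way to see what the current expression receives over an empty table is to run it with Python's built-in sqlite3 module as a stand-in warehouse. This is an illustrative sketch: the table `events` and column `value` are hypothetical, and SQLite returns NULL for division by zero rather than raising, so it shows the inputs to the division rather than the error itself. Over zero rows, SUM(...) is NULL while COUNT(*) and COUNT(value) are 0.

```python
import sqlite3

# In-memory SQLite database standing in for a warehouse (illustration only).
conn = sqlite3.connect(":memory:")
# Hypothetical model with a nullable column and no rows.
conn.execute("CREATE TABLE events (value INTEGER)")

row = conn.execute(
    """
    SELECT
        SUM(CASE WHEN value IS NULL THEN 0 ELSE 1 END) AS numerator,
        COUNT(*) AS denominator,
        COUNT(value) AS non_null_count,
        SUM(CASE WHEN value IS NULL THEN 0 ELSE 1 END) / COUNT(*) AS not_null_proportion
    FROM events
    """
).fetchone()

# An ungrouped aggregate over zero rows still returns exactly one row.
print(row)  # (None, 0, 0, None)
```

The division here sees NULL / 0; whether an engine returns NULL or raises for such inputs is exactly the warehouse-dependent behavior described above.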

Solution

Replaced the aggregation logic with:

(COUNT(column_name) * 1.0) / NULLIF(COUNT(*), 0)

This ensures:

  • division-by-zero is prevented using NULLIF
  • consistent behavior across warehouses
  • empty datasets result in NULL, which aligns with dbt test semantics

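NULLIF(COUNT(*), 0) returns NULL when the row count is 0, and dividing by NULL yields NULL instead of raising an error. This is a common SQL idiom for safe division in aggregate queries.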
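The NULLIF idiom and the fixed expression can be checked in isolation with a small sketch, again using SQLite through Python's sqlite3 as an assumed stand-in and hypothetical table names (`events`, `empty_events`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULLIF(x, 0) returns NULL when x = 0, otherwise x unchanged.
nullif_results = conn.execute("SELECT NULLIF(0, 0), NULLIF(4, 0)").fetchone()

# Hypothetical tables: one with two non-NULL and two NULL values, one empty.
conn.execute("CREATE TABLE events (value INTEGER)")
conn.executemany("INSERT INTO events VALUES (?)", [(1,), (None,), (3,), (None,)])
conn.execute("CREATE TABLE empty_events (value INTEGER)")

# The expression proposed in this PR. Multiplying by 1.0 forces non-integer
# division; with two integers, SQLite (like PostgreSQL) would truncate 2 / 4 to 0.
FIXED_SQL = """
    SELECT (COUNT(value) * 1.0) / NULLIF(COUNT(*), 0) AS not_null_proportion
    FROM {table}
"""
mixed = conn.execute(FIXED_SQL.format(table="events")).fetchone()[0]
empty = conn.execute(FIXED_SQL.format(table="empty_events")).fetchone()[0]

print(nullif_results)  # (None, 4)
print(mixed, empty)    # 0.5 None
```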

Why no COALESCE?

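Empty datasets now return NULL rather than 0. In dbt, a test fails only when its query returns rows. The test keeps rows where not_null_proportion falls outside the at_least / at_most bounds, and a comparison with NULL evaluates to NULL (unknown) rather than TRUE, so an empty dataset produces no failing rows and the test passes, consistent with existing dbt-utils conventions. Wrapping the result in COALESCE(..., 0) would instead make every empty model fail whenever at_least > 0.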
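The difference between returning NULL and coalescing to 0 can be sketched with a simplified approximation of the test's filtering step (not the macro's actual SQL). SQLite via Python's sqlite3, the table `empty_events`, and the thresholds 0.9 / 1.0 are all assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empty_events (value INTEGER)")

AT_LEAST, AT_MOST = 0.9, 1.0  # hypothetical test thresholds

# Simplified shape of the test: compute the proportion, then return only
# rows outside [at_least, at_most]. Any returned row means the test fails.
TEMPLATE = """
    WITH validation AS (
        SELECT {expr} AS not_null_proportion
        FROM empty_events
    )
    SELECT not_null_proportion
    FROM validation
    WHERE not_null_proportion < ? OR not_null_proportion > ?
"""

nullif_expr = "(COUNT(value) * 1.0) / NULLIF(COUNT(*), 0)"
coalesce_expr = f"COALESCE({nullif_expr}, 0)"

failing_with_null = conn.execute(
    TEMPLATE.format(expr=nullif_expr), (AT_LEAST, AT_MOST)
).fetchall()
failing_with_coalesce = conn.execute(
    TEMPLATE.format(expr=coalesce_expr), (AT_LEAST, AT_MOST)
).fetchall()

print(failing_with_null)      # [] -> NULL comparisons filter the row out; test passes
print(failing_with_coalesce)  # [(0,)] -> 0 < 0.9 is TRUE; test would fail
```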

Reproducible Example

-- ❌ Current behavior (can error depending on query shape)
-- PostgreSQL syntax; COUNT(*) FILTER (WHERE false) forces a zero denominator
SELECT
  SUM(CASE WHEN value IS NULL THEN 0 ELSE 1 END)
    / COUNT(*) FILTER (WHERE false) AS not_null_proportion
FROM (VALUES (1), (2)) t(value);

-- Result:
-- ERROR: division by zero

-- ✅ Fixed behavior (safe)
SELECT
  (COUNT(value) * 1.0)
    / NULLIF(COUNT(*) FILTER (WHERE false), 0) AS not_null_proportion
FROM (VALUES (1), (2)) t(value);

-- Result:
-- NULL (no error)
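The same two queries behave differently on another engine, which illustrates the portability concern. In this sketch SQLite (through Python's sqlite3) is an assumed second engine, and a scalar subquery with an always-false WHERE stands in for the PostgreSQL FILTER clause to force a zero denominator:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,)])

# Stand-in for COUNT(*) FILTER (WHERE false): an uncorrelated subquery
# that always evaluates to 0.
ZERO_COUNT = "(SELECT COUNT(*) FROM t WHERE 0)"

current = conn.execute(
    f"SELECT SUM(CASE WHEN value IS NULL THEN 0 ELSE 1 END) / {ZERO_COUNT} FROM t"
).fetchone()[0]
fixed = conn.execute(
    f"SELECT (COUNT(value) * 1.0) / NULLIF({ZERO_COUNT}, 0) FROM t"
).fetchone()[0]

# SQLite evaluates 2 / 0 as NULL instead of raising, unlike the PostgreSQL
# result above, so both queries return NULL here. The NULLIF form does not
# depend on how the engine treats division by zero.
print(current, fixed)  # None None
```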

Additional Improvements

  • Replaced SUM(CASE...) with COUNT(column_name) for clarity and performance; COUNT(column_name) counts only non-NULL values, so the numerator is unchanged for non-empty data
  • Added validation for required column_name
  • Added validation for at_least <= at_most
  • Ensured group-by variables are always defined
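The argument checks and group-by handling above live in the macro's Jinja. The following is a hypothetical Python analogue (the function `render_not_null_proportion_sql`, its parameters, and the SQL layout are illustrative, not the PR's code) that renders an approximation of the test query and runs it on SQLite:

```python
import sqlite3


def render_not_null_proportion_sql(model, column_name, at_least, at_most=1.0,
                                   group_by_columns=()):
    """Hypothetical analogue of the macro's checks; plain string templating, no escaping."""
    # Validation: column_name is required.
    if not column_name:
        raise ValueError("column_name is required")
    # Validation: the bounds must form a valid range.
    if at_least > at_most:
        raise ValueError(f"at_least ({at_least}) must be <= at_most ({at_most})")

    # Group-by fragments are always defined: empty strings when no columns are given.
    select_gb_cols = ""
    groupby_gb_cols = ""
    if group_by_columns:
        select_gb_cols = ", ".join(group_by_columns) + ", "
        groupby_gb_cols = "GROUP BY " + ", ".join(group_by_columns)

    return f"""
        WITH validation AS (
            SELECT {select_gb_cols}
                (COUNT({column_name}) * 1.0) / NULLIF(COUNT(*), 0) AS not_null_proportion
            FROM {model}
            {groupby_gb_cols}
        )
        SELECT *
        FROM validation
        WHERE not_null_proportion < {at_least} OR not_null_proportion > {at_most}
    """


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (grp TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 3), ("b", None)],
)

# Group "a" is 100% non-NULL; group "b" is 50%, below 0.75, so only "b" is returned.
failing = conn.execute(
    render_not_null_proportion_sql("events", "value", 0.75, group_by_columns=["grp"])
).fetchall()
print(failing)  # [('b', 0.5)]
```

Calling it with at_least=0.9 and at_most=0.5, or with an empty column_name, raises ValueError before any SQL is produced.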

Behavior

| Case | Previous behavior | New behavior |
|---|---|---|
| Empty dataset | NULL or runtime error | NULL (test passes) |
| Division by zero | Runtime error | Safe (no error) |
| Mixed data | Correct | Correct |

Testing

Validated with:

  • empty datasets
  • all NULL values
  • mixed NULL / non-NULL values
  • grouped queries
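A hypothetical harness for the scenarios listed above might look like the following sketch. It uses SQLite through Python's sqlite3 and an illustrative helper `proportion`; it is not the test suite used in this PR:

```python
import sqlite3

# Expression proposed in this PR.
FIXED_EXPR = "(COUNT(value) * 1.0) / NULLIF(COUNT(*), 0)"


def proportion(rows, grouped=False):
    """Load (grp, value) rows into a fresh in-memory table and compute the proportion(s)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (grp TEXT, value INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    if grouped:
        sql = f"SELECT grp, {FIXED_EXPR} FROM events GROUP BY grp ORDER BY grp"
        return conn.execute(sql).fetchall()
    return conn.execute(f"SELECT {FIXED_EXPR} FROM events").fetchone()[0]


results = {
    "empty": proportion([]),
    "all_null": proportion([("a", None), ("a", None)]),
    "mixed": proportion([("a", 1), ("a", None), ("a", 2), ("a", None)]),
    "grouped": proportion([("a", 1), ("b", None), ("b", 4)], grouped=True),
}
print(results)
# {'empty': None, 'all_null': 0.0, 'mixed': 0.5, 'grouped': [('a', 1.0), ('b', 0.5)]}
```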

Notes

This change focuses on correctness and stability. Results for non-empty datasets are unchanged, so no breaking changes are expected for valid inputs.

shivasingh4945-tech requested a review from a team as a code owner on April 26, 2026 09:42