feat(revenue-analytics): Resolve posthog_person_distinct_id from child Stripe objects by arthurdedeus · Pull Request #53595 · PostHog/posthog

arthurdedeus · 2026-04-07T17:24:13Z

Problem

The initial solution to ask our users to update their Stripe customers at every charge/subscription didn't sound great. Let's simplify this for them by coalescing the metadata field from customers and their child objects (subscriptions, charges and invoices) to build the metadata column in customer_revenue_view.

This way the set up will be to simply pass in metadata at Stripe customer, subscription/charge/invoice creation, without the need to update the customer (see the docs PR if this is still unclear).

Part of #52270

PS: I'm not super concerned about the performance impact this can have when managed-viewsets flag is off (i.e. we do all computation at query time), as the goal is to move everyone to the new architecture soon. If we start seeing complaints, we can start migrating people over.

Changes

Enrich the customer revenue view's metadata field with posthog_person_distinct_id resolved from child Stripe objects (subscription, charge, invoice) when the customer doesn't have it directly
Pick the freshest child by created timestamp; customer's own value always takes priority
Inject posthog_person_distinct_id_source (e.g. subscription::sub_123) for debuggability

How did you test this code?

New snapshot tests for all child schema combinations, integration test with Nullable columns via RevenueAnalyticsTopCustomersQueryRunner, and full revenue analytics + persons join suite passes.

Publish to changelog?

No

Docs update

Already written here

LLM context

Co-authored with Claude Code (Opus 4.6).

arthurdedeus · 2026-04-07T17:24:29Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

github-actions · 2026-04-07T17:40:07Z

🎭 Playwright report · View test results →

⚠️ 1 flaky test:

Materialize view pane (chromium)

These issues are not necessarily caused by your changes.
Annoyed by this comment? Help fix flakies and failures and it'll disappear!

tests-posthog · 2026-04-07T17:47:18Z

Query snapshots: Backend query snapshots updated

Changes: 1 snapshots (1 modified, 0 added, 0 deleted)

What this means:

Query snapshots have been automatically updated to match current output
These changes reflect modifications to database queries or schema

Next steps:

Review the query changes to ensure they're intentional
If unexpected, investigate what caused the query to change

Review snapshot changes →

tests-posthog · 2026-04-07T19:49:01Z

Query snapshots: Backend query snapshots updated

Changes: 1 snapshots (1 modified, 0 added, 0 deleted)

What this means:

Query snapshots have been automatically updated to match current output
These changes reflect modifications to database queries or schema

Next steps:

Review the query changes to ensure they're intentional
If unexpected, investigate what caused the query to change

Review snapshot changes →

greptile-apps · 2026-04-08T12:19:19Z

Vulnerabilities

No security concerns identified. The f-string SQL interpolation in _build_child_union_leg uses only the internal Stripe table name obtained from the system schema (not user input), and a nosemgrep annotation is correctly in place.

Comments Outside Diff (1)

products/revenue_analytics/backend/views/sources/test/stripe/test_stripe_customer_metadata_resolution.py, line 220-224 (link)

Integration test assertions are too weak to validate the feature

The test sets up three distinct customer scenarios (no metadata, direct metadata, no children) and several child objects specifically to verify metadata resolution. However, the only assertion is assertIsNotNone(response), which confirms the query runs without error but says nothing about whether the resolution logic is correct.

Given the complexity of the new resolution logic, this test should verify actual outcomes. For example:

def test_top_customers_query_works_with_nullable_metadata_columns(self):
    response = self._run_top_customers_query()
    self.assertIsNotNone(response)
    results = {row[0]: row for row in response.results}  # keyed by customer id

    # cus_no_meta should get alice_distinct resolved from the subscription
    alice_row = results["cus_no_meta"]
    self.assertIn("alice_distinct", str(alice_row))

    # cus_with_meta should keep bob_distinct (customer-level wins)
    bob_row = results["cus_with_meta"]
    self.assertIn("bob_distinct", str(bob_row))

Without asserting the actual resolved values, a regression in the _ENRICHED_METADATA_EXPR logic (e.g., the if branches being swapped) would go undetected.

Prompt To Fix With AI

This is a comment left during a code review.
Path: products/revenue_analytics/backend/views/sources/test/stripe/test_stripe_customer_metadata_resolution.py
Line: 220-224

Comment:
**Integration test assertions are too weak to validate the feature**

The test sets up three distinct customer scenarios (no metadata, direct metadata, no children) and several child objects specifically to verify metadata resolution. However, the only assertion is `assertIsNotNone(response)`, which confirms the query runs without error but says nothing about whether the resolution logic is correct.

Given the complexity of the new resolution logic, this test should verify actual outcomes. For example:

```python
def test_top_customers_query_works_with_nullable_metadata_columns(self):
    response = self._run_top_customers_query()
    self.assertIsNotNone(response)
    results = {row[0]: row for row in response.results}  # keyed by customer id

    # cus_no_meta should get alice_distinct resolved from the subscription
    alice_row = results["cus_no_meta"]
    self.assertIn("alice_distinct", str(alice_row))

    # cus_with_meta should keep bob_distinct (customer-level wins)
    bob_row = results["cus_with_meta"]
    self.assertIn("bob_distinct", str(bob_row))
```

Without asserting the actual resolved values, a regression in the `_ENRICHED_METADATA_EXPR` logic (e.g., the `if` branches being swapped) would go undetected.

How can I resolve this? If you propose a fix, please make it concise.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: products/revenue_analytics/backend/views/sources/test/stripe/test_stripe_customer_metadata_resolution.py
Line: 220-224

Comment:
**Integration test assertions are too weak to validate the feature**

The test sets up three distinct customer scenarios (no metadata, direct metadata, no children) and several child objects specifically to verify metadata resolution. However, the only assertion is `assertIsNotNone(response)`, which confirms the query runs without error but says nothing about whether the resolution logic is correct.

Given the complexity of the new resolution logic, this test should verify actual outcomes. For example:

```python
def test_top_customers_query_works_with_nullable_metadata_columns(self):
    response = self._run_top_customers_query()
    self.assertIsNotNone(response)
    results = {row[0]: row for row in response.results}  # keyed by customer id

    # cus_no_meta should get alice_distinct resolved from the subscription
    alice_row = results["cus_no_meta"]
    self.assertIn("alice_distinct", str(alice_row))

    # cus_with_meta should keep bob_distinct (customer-level wins)
    bob_row = results["cus_with_meta"]
    self.assertIn("bob_distinct", str(bob_row))
```

Without asserting the actual resolved values, a regression in the `_ENRICHED_METADATA_EXPR` logic (e.g., the `if` branches being swapped) would go undetected.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: products/revenue_analytics/backend/views/sources/stripe/customer.py
Line: 280-291

Comment:
**Two independent `argMax` calls may return values from different rows on `created_at` ties**

`argMax(distinct_id, created_at)` and `argMax(source_ref, created_at)` are evaluated independently. ClickHouse makes no guarantee that, when two rows share the same maximum `created_at`, both calls will pick values from the *same* row. In that (admittedly rare) edge case, `resolved_distinct_id` could come from a subscription and `resolved_source` might point to a charge — giving a misleading debug annotation.

A simple way to make them consistent is to use a single `argMax` on a tuple:

```python
ast.Alias(
    alias="resolved_pair",
    expr=ast.Call(
        name="argMax",
        args=[
            ast.Tuple(exprs=[ast.Field(chain=["distinct_id"]), ast.Field(chain=["source_ref"])]),
            ast.Field(chain=["created_at"]),
        ],
    ),
),
```

then unpack `resolved_pair.1` / `resolved_pair.2` in the outer select. Alternatively, since `source_ref` is already derived from the same row as `distinct_id` (same `id`), a composite string like `concat(distinct_id, '|', source_ref)` and a single `argMax` would also work.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "fix(revenue-analytics): Fix semgrep and ..." | Re-trigger Greptile}

greptile-apps · 2026-04-08T12:19:26Z

products/revenue_analytics/backend/views/sources/stripe/customer.py

+            ast.Field(chain=["customer_id"]),
+            ast.Alias(
+                alias="resolved_distinct_id",
+                expr=ast.Call(name="argMax", args=[ast.Field(chain=["distinct_id"]), ast.Field(chain=["created_at"])]),
+            ),
+            ast.Alias(
+                alias="resolved_source",
+                expr=ast.Call(name="argMax", args=[ast.Field(chain=["source_ref"]), ast.Field(chain=["created_at"])]),
+            ),
+        ],
+        select_from=ast.JoinExpr(table=union_query, alias="child_meta"),
+        group_by=[ast.Field(chain=["customer_id"])],


Two independent argMax calls may return values from different rows on created_at ties

argMax(distinct_id, created_at) and argMax(source_ref, created_at) are evaluated independently. ClickHouse makes no guarantee that, when two rows share the same maximum created_at, both calls will pick values from the same row. In that (admittedly rare) edge case, resolved_distinct_id could come from a subscription and resolved_source might point to a charge — giving a misleading debug annotation.

A simple way to make them consistent is to use a single argMax on a tuple:

ast.Alias( alias="resolved_pair", expr=ast.Call( name="argMax", args=[ ast.Tuple(exprs=[ast.Field(chain=["distinct_id"]), ast.Field(chain=["source_ref"])]), ast.Field(chain=["created_at"]), ], ), ),

then unpack resolved_pair.1 / resolved_pair.2 in the outer select. Alternatively, since source_ref is already derived from the same row as distinct_id (same id), a composite string like concat(distinct_id, '|', source_ref) and a single argMax would also work.

Prompt To Fix With AI

This is a comment left during a code review. Path: products/revenue_analytics/backend/views/sources/stripe/customer.py Line: 280-291 Comment: **Two independent `argMax` calls may return values from different rows on `created_at` ties** `argMax(distinct_id, created_at)` and `argMax(source_ref, created_at)` are evaluated independently. ClickHouse makes no guarantee that, when two rows share the same maximum `created_at`, both calls will pick values from the *same* row. In that (admittedly rare) edge case, `resolved_distinct_id` could come from a subscription and `resolved_source` might point to a charge — giving a misleading debug annotation. A simple way to make them consistent is to use a single `argMax` on a tuple: ```python ast.Alias( alias="resolved_pair", expr=ast.Call( name="argMax", args=[ ast.Tuple(exprs=[ast.Field(chain=["distinct_id"]), ast.Field(chain=["source_ref"])]), ast.Field(chain=["created_at"]), ], ), ), ``` then unpack `resolved_pair.1` / `resolved_pair.2` in the outer select. Alternatively, since `source_ref` is already derived from the same row as `distinct_id` (same `id`), a composite string like `concat(distinct_id, '|', source_ref)` and a single `argMax` would also work. How can I resolve this? If you propose a fix, please make it concise.

…d Stripe objects

Generated-By: PostHog Code Task-Id: 5dc63d0f-aad3-46eb-9fdc-a84cd49dd8a1

tests-posthog · 2026-04-08T21:44:41Z

⏭️ Skipped snapshot commit because branch advanced to e74ed68 while workflow was testing 16f548f.

The new commit will trigger its own snapshot update workflow.

If you expected this workflow to succeed: This can happen due to concurrent commits. To get a fresh workflow run, either:

Merge master into your branch, or
Push an empty commit: git commit --allow-empty -m 'trigger CI' && git push

tests-posthog · 2026-04-08T22:05:04Z

Query snapshots: Backend query snapshots updated

Changes: 2 snapshots (2 modified, 0 added, 0 deleted)

What this means:

Query snapshots have been automatically updated to match current output
These changes reflect modifications to database queries or schema

Next steps:

Review the query changes to ensure they're intentional
If unexpected, investigate what caused the query to change

Review snapshot changes →

…d Stripe objects (#53595) Co-authored-by: tests-posthog[bot] <250237707+tests-posthog[bot]@users.noreply.github.com>

arthurdedeus mentioned this pull request Apr 7, 2026

fix(revenue-analytics): Persons materialized view joins #52929

Merged

This was referenced Apr 7, 2026

fix(revenue-analytics): Groups materialized view joins #53119

Merged

fix(revenue-analytics): Use correct field name in persons join #53594

Merged

feat(stripe-mock): Add subscription_metadata and charge_metadata config #53596

Merged

arthurdedeus self-assigned this Apr 7, 2026

arthurdedeus force-pushed the fix/revenue-analytics-person-join branch from fc3a1c3 to ee07485 Compare April 7, 2026 17:58

arthurdedeus force-pushed the feat/coalesce-distinct-id-from-child-objects branch from fbf0fad to a71e13f Compare April 7, 2026 18:01

arthurdedeus force-pushed the fix/revenue-analytics-person-join branch from ee07485 to c457be8 Compare April 7, 2026 18:07

arthurdedeus force-pushed the feat/coalesce-distinct-id-from-child-objects branch from a71e13f to 7b2b74d Compare April 7, 2026 18:08

arthurdedeus marked this pull request as ready for review April 8, 2026 12:15

arthurdedeus requested a review from a team as a code owner April 8, 2026 12:15

greptile-apps bot reviewed Apr 8, 2026

View reviewed changes

arthurdedeus requested a review from rafaeelaudibert April 8, 2026 13:17

arthurdedeus mentioned this pull request Apr 8, 2026

docs(revenue-analytics): Connect to customers PostHog/posthog.com#16167

Merged

5 tasks

rafaeelaudibert approved these changes Apr 8, 2026

View reviewed changes

georgemunyoro approved these changes Apr 8, 2026

View reviewed changes

Base automatically changed from fix/revenue-analytics-person-join to master April 8, 2026 20:31

arthurdedeus and others added 6 commits April 8, 2026 18:17

feat(revenue-analytics): Resolve posthog_person_distinct_id from chil…

f6664a9

…d Stripe objects

refactor: Join the child queries using UNION ALL

401e84c

test(backend): update query snapshots

14e60aa

fix(revenue-analytics): Fix semgrep and mypy errors in customer view

16f7443

Generated-By: PostHog Code Task-Id: 5dc63d0f-aad3-46eb-9fdc-a84cd49dd8a1

chore: Move persons join test

206876f

test: Refactor RevenueAnalyticsTestBase

2bd5cc2

arthurdedeus force-pushed the feat/coalesce-distinct-id-from-child-objects branch from d147dda to 16f548f Compare April 8, 2026 21:20

test: Add TestCustomerStripeMetadataResolution

e74ed68

arthurdedeus force-pushed the feat/coalesce-distinct-id-from-child-objects branch 2 times, most recently from 23ef8aa to e74ed68 Compare April 8, 2026 21:30

test(backend): update query snapshots

c6c14fe

arthurdedeus merged commit 3630c3a into master Apr 8, 2026
198 checks passed

arthurdedeus deleted the feat/coalesce-distinct-id-from-child-objects branch April 8, 2026 22:29

rnegron pushed a commit that referenced this pull request Apr 9, 2026

feat(revenue-analytics): Resolve posthog_person_distinct_id from chil…

88d312f

…d Stripe objects (#53595) Co-authored-by: tests-posthog[bot] <250237707+tests-posthog[bot]@users.noreply.github.com>

arthurdedeus mentioned this pull request Apr 10, 2026

3.5 Connecting Stripe customers to PostHog persons/groups is confusing #52270

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(revenue-analytics): Resolve posthog_person_distinct_id from child Stripe objects#53595

feat(revenue-analytics): Resolve posthog_person_distinct_id from child Stripe objects#53595
arthurdedeus merged 8 commits intomasterfrom
feat/coalesce-distinct-id-from-child-objects

arthurdedeus commented Apr 7, 2026 •

edited

Loading

Uh oh!

arthurdedeus commented Apr 7, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Apr 7, 2026 •

edited

Loading

Uh oh!

tests-posthog bot commented Apr 7, 2026

Uh oh!

tests-posthog bot commented Apr 7, 2026

Uh oh!

greptile-apps bot commented Apr 8, 2026 •

edited

Loading

Vulnerabilities

Comments Outside Diff (1)

Uh oh!

greptile-apps bot Apr 8, 2026

Uh oh!

tests-posthog bot commented Apr 8, 2026

Uh oh!

tests-posthog bot commented Apr 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

arthurdedeus commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Changes

How did you test this code?

Publish to changelog?

Docs update

LLM context

Uh oh!

arthurdedeus commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tests-posthog bot commented Apr 7, 2026

Query snapshots: Backend query snapshots updated

Uh oh!

tests-posthog bot commented Apr 7, 2026

Query snapshots: Backend query snapshots updated

Uh oh!

greptile-apps bot commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Vulnerabilities

Comments Outside Diff (1)

Uh oh!

greptile-apps bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

tests-posthog bot commented Apr 8, 2026

Uh oh!

tests-posthog bot commented Apr 8, 2026

Query snapshots: Backend query snapshots updated

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

arthurdedeus commented Apr 7, 2026 •

edited

Loading

arthurdedeus commented Apr 7, 2026 •

edited

Loading

github-actions bot commented Apr 7, 2026 •

edited

Loading

greptile-apps bot commented Apr 8, 2026 •

edited

Loading