feat(revenue-analytics): Resolve posthog_person_distinct_id from child Stripe objects #53595
Query snapshots: Backend query snapshots updated. Changes: 1 snapshot (1 modified, 0 added, 0 deleted).
            ast.Field(chain=["customer_id"]),
            ast.Alias(
                alias="resolved_distinct_id",
                expr=ast.Call(name="argMax", args=[ast.Field(chain=["distinct_id"]), ast.Field(chain=["created_at"])]),
            ),
            ast.Alias(
                alias="resolved_source",
                expr=ast.Call(name="argMax", args=[ast.Field(chain=["source_ref"]), ast.Field(chain=["created_at"])]),
            ),
        ],
        select_from=ast.JoinExpr(table=union_query, alias="child_meta"),
        group_by=[ast.Field(chain=["customer_id"])],
**Two independent `argMax` calls may return values from different rows on `created_at` ties**
`argMax(distinct_id, created_at)` and `argMax(source_ref, created_at)` are evaluated independently. ClickHouse makes no guarantee that, when two rows share the same maximum `created_at`, both calls will pick values from the same row. In that (admittedly rare) edge case, `resolved_distinct_id` could come from a subscription while `resolved_source` points to a charge, giving a misleading debug annotation.
A simple way to make them consistent is to use a single `argMax` on a tuple:
    ast.Alias(
        alias="resolved_pair",
        expr=ast.Call(
            name="argMax",
            args=[
                ast.Tuple(exprs=[ast.Field(chain=["distinct_id"]), ast.Field(chain=["source_ref"])]),
                ast.Field(chain=["created_at"]),
            ],
        ),
    ),

Then unpack `resolved_pair.1` / `resolved_pair.2` in the outer select. Alternatively, since `source_ref` is already derived from the same row as `distinct_id` (same `id`), a composite string like `concat(distinct_id, '|', source_ref)` and a single `argMax` would also work.
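The tie hazard raised in this comment can be modelled in plain Python. This is only a sketch: `arg_max` below is a hypothetical stand-in, not ClickHouse's implementation, and its `prefer_last` switch exists purely to imitate an engine that makes no promise about which tied row wins. The row values (`user_a`, `ch_456`, and so on) are made up for illustration.

```python
from typing import Any, Callable, NamedTuple, Sequence


class ChildRow(NamedTuple):
    distinct_id: str
    source_ref: str
    created_at: int


def arg_max(
    rows: Sequence[ChildRow],
    value: Callable[[ChildRow], Any],
    key: Callable[[ChildRow], Any],
    prefer_last: bool = False,
) -> Any:
    """Return value(row) for a row holding the maximum key(row).

    prefer_last decides which row wins a tie; it models unspecified
    tie-breaking and is an assumption of this sketch. rows must be non-empty.
    """
    best = rows[0]
    for row in rows[1:]:
        if key(row) > key(best) or (prefer_last and key(row) == key(best)):
            best = row
    return value(best)


# Two child objects sharing the same maximum created_at (hypothetical data).
rows = [
    ChildRow("user_a", "subscription::sub_123", 100),
    ChildRow("user_b", "charge::ch_456", 100),
]

# Two independent aggregates: each call may settle the tie differently.
independent = (
    arg_max(rows, lambda r: r.distinct_id, lambda r: r.created_at, prefer_last=False),
    arg_max(rows, lambda r: r.source_ref, lambda r: r.created_at, prefer_last=True),
)

# One aggregate over a tuple: both fields always come from the same row.
paired = arg_max(
    rows, lambda r: (r.distinct_id, r.source_ref), lambda r: r.created_at, prefer_last=True
)

print(independent)  # the id of one row combined with the source of the other
print(paired)       # a pair taken from a single row
```

In this model the independent calls yield `user_a` alongside `charge::ch_456`, a combination that exists in no row, while the tuple form can only ever return a pair that some row actually holds.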
Generated-By: PostHog Code
Task-Id: 5dc63d0f-aad3-46eb-9fdc-a84cd49dd8a1
⏭️ Skipped snapshot commit because the branch advanced. The new commit will trigger its own snapshot update workflow.
Query snapshots: Backend query snapshots updated. Changes: 2 snapshots (2 modified, 0 added, 0 deleted).
…d Stripe objects (#53595)
Co-authored-by: tests-posthog[bot] <250237707+tests-posthog[bot]@users.noreply.github.com>

Problem
The initial solution, asking our users to update their Stripe customers at every charge/subscription, didn't sound great. Let's simplify this for them by coalescing the `metadata` field from customers and their child objects (subscriptions, charges and invoices) to build the `metadata` column in `customer_revenue_view`.
This way the setup is simply to pass in metadata at Stripe customer, subscription, charge or invoice creation, without the need to update the customer (see the docs PR if this is still unclear).
Part of #52270
PS: I'm not super concerned about the performance impact this can have when the `managed-viewsets` flag is off (i.e. we do all computation at query time), as the goal is to move everyone to the new architecture soon. If we start seeing complaints, we can start migrating people over.
Changes
- `metadata` field with `posthog_person_distinct_id` resolved from child Stripe objects (subscription, charge, invoice) when the customer doesn't have it directly
- … `created` timestamp; customer's own value always takes priority
- `posthog_person_distinct_id_source` (e.g. `subscription::sub_123`) for debuggability
How did you test this code?
New snapshot tests for all child schema combinations, an integration test with Nullable columns via `RevenueAnalyticsTopCustomersQueryRunner`, and the full revenue analytics + persons join suite passes.
Publish to changelog?
No
Docs update
Already written here
LLM context
Co-authored with Claude Code (Opus 4.6).
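For readers who want the resolution rule from the Changes list in executable form, here is a minimal pure-Python sketch. It is not the HogQL view from this PR. `ChildObject` and `resolve_customer_metadata` are hypothetical names. Picking the child with the latest `created` value is an assumption inferred from the `argMax(..., created_at)` calls in the review thread. Leaving the `_source` key unset when the customer supplies its own value is also an assumption.

```python
from typing import NamedTuple

KEY = "posthog_person_distinct_id"


class ChildObject(NamedTuple):
    kind: str        # "subscription", "charge" or "invoice"
    object_id: str   # e.g. "sub_123"
    created: int     # creation timestamp
    metadata: dict


def resolve_customer_metadata(customer_metadata: dict, children: list[ChildObject]) -> dict:
    """Fill in KEY from child objects only when the customer lacks it."""
    resolved = dict(customer_metadata)
    if resolved.get(KEY):
        # The customer's own value always takes priority.
        return resolved
    candidates = [c for c in children if c.metadata.get(KEY)]
    if not candidates:
        return resolved
    # Assumed rule: the most recently created child wins.
    latest = max(candidates, key=lambda c: c.created)
    resolved[KEY] = latest.metadata[KEY]
    # Debug annotation in the "kind::id" shape shown in the description.
    resolved[KEY + "_source"] = f"{latest.kind}::{latest.object_id}"
    return resolved


children = [
    ChildObject("subscription", "sub_123", 10, {KEY: "u1"}),
    ChildObject("charge", "ch_456", 20, {KEY: "u2"}),
]
from_children = resolve_customer_metadata({}, children)
from_customer = resolve_customer_metadata({KEY: "own"}, children)
print(from_children)
print(from_customer)
```

Because a single `max` selects one child and both fields are read from it, the value and its source annotation cannot drift apart, which is the same property the tuple-based `argMax` suggestion aims for.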