feat: prep for 6.0.0, bunch of breaking changes#273
Conversation
There was a problem hiding this comment.
PR Summary
Major overhaul of the PostHog Python SDK for version 6.0.0, focusing on simplifying the API and improving maintainability through scope-based state management.
- Removal of
identify,page, andscreenfunctions in favor of more semantically accurate alternatives (setandcapture) - Changed all event-capturing functions to return event UUIDs instead of event contents
- Made
distinct_idoptional and moved to kwargs in event-capturing functions - Removed unmaintained exception-capture integrations in favor of general-purpose Django middleware
- Enforced breaking changes through kwargs-only parameters to ensure intentional upgrades
20 files reviewed, 6 comments
Edit PR Review Bot Settings | Greptile
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
bf80320 to
88b0a9a
Compare
|
Tagged recent reviewers on this code, as well as ET. |
There was a problem hiding this comment.
Might be worth adding a migration guide somewhere telling people the main things needed to migrate from v4 / v5 to v6
| This release contains a number of major breaking changes: | ||
| - feat: make distinct_id an optional parameter in posthog.capture and related functions | ||
| - feat: make capture and related functions return `Optional[str]`, which is the UUID of the sent event, if it was sent | ||
| - fix: remove `identify` (prefer `posthog.set()`), and `page` and `screen` (prefer `posthog.capture()`) |
There was a problem hiding this comment.
What's the thinking behind removing these methods?
- I get
identifyis pretty pointless because it doesn't actually persist across captures in the same way it does for the JS Web SDK. Can't you set the user id with contexts now? I'm guess that's the better way than setting the distcint_id on every capture - Worth keeping the
pagemethod so that people don't have to familiarize themselves with the$pageviewproperty? Screen I'm less concerned with because it doesn't really make sense on the web and is only used in the mobile SDKs afaik
There was a problem hiding this comment.
Yeah, identify is removed because it's semantically confusing - we should push people towards identifying at the context level always.
Re: page, my impression is pageview is generally captured by the frontend?
There was a problem hiding this comment.
Several people prefer to use a backend-only instrumentation, this being a framework integration for a non-SPA website (e.g: only django/flask views - not using our utilities) or doing it all themselves on their backend.
I think that using context + capture is enough for them; it is also a more focused API, so I personally prefer it.
| "ip": headers.get("X-Forwarded-For"), | ||
| "user_agent": headers.get("User-Agent"), | ||
| "traceparent": traceparent, | ||
| "$request_path": self.request.path, |
There was a problem hiding this comment.
Same with ip, user agent and request path. Should we be adding them to the Django integration or are they there already?
There was a problem hiding this comment.
request path is set as $current_url. Ip, user_agent and traceparent aren't set by the middleware currently. Re: should we be adding them, I guess probably? I can do so in a follow up PR.
lricoy
left a comment
There was a problem hiding this comment.
LGTM and I like the direction 😄
I tried the bits I am most aware of locally and they are working nicely.
Just left some comments for context.
hpouillot
left a comment
There was a problem hiding this comment.
I added few comments but it looks good to me. There is also a comment saying proxy methods have not been used in 5-6 months, and I think you capture signature will not be used with the recommended way of using posthog. Might be a good opportunity to clean them ?
| before_send=my_before_send, | ||
| sync_mode=True, | ||
| ) | ||
| msg_uuid = client.capture("user1", "test_event", {"original": "value"}) |
There was a problem hiding this comment.
shouldn't we change this capture call ?
There was a problem hiding this comment.
Change it to what? The client object itself already supported optional distinct IDs (it already being a keyword arg), so only the proxy functions needed to change.
There was a problem hiding this comment.
Sorry I got confused, the proxy method and the client method don't have the same signature. Proxy capture method accepts event as first argument while client method takes the distinct_id ?
There was a problem hiding this comment.
Switched to using Unpack[TypedDict] based kwargs everywhere it seemed to make sense
|
|
||
| self.capture_exception_fn(sys.exc_info(), extra_props) | ||
|
|
||
| signals.got_request_exception.connect(_got_request_exception) |
There was a problem hiding this comment.
Are we sure the exceptions this previously caught are handled by the try / catch in the new_context scope?
There was a problem hiding this comment.
The right way to intercept exceptions on requests is using the middleware, not django signals. If someone has the middleware in their middleware stack, and hasn't disabled exception capture in it, it'll catch their exceptions.
There was a problem hiding this comment.
Our regular autocapture still works the same way, to be clear
daibhin
left a comment
There was a problem hiding this comment.
Looks reasonable. I assume you've tested it and are happy. Just a changelog / section in the readme about the upgrade steps would be great
| """ | ||
| Add a tag to the current context. | ||
| Add a tag to the current context. All tags are added as properties to any event, including exceptions, captured | ||
| within the context. |
This change:
distinct_idan optional argument to most event-capturing functionsidentify,pageandscreen-identifyissetwith a misleading name,pageandscreenare justcaptureGenerally, the goal of this PR is to move the python SDK towards a place where we can use it as the "template" for all server-side SDKs - focusing on semantic scope based state management, simplified interfaces, and reducing how much we expose the internals of our libraries, reducing the cost of future refactoring and improvements to our users. It's the culmination of the previous PR's I've made re: nested contexts, context-based identity and session management, etc.