# feat(flags): add a flag_fallback_cache that tracks feature flag evaluation results and uses them as fallback values whenever the /flags API isn't available (#275)
```diff
@@ -50,6 +50,8 @@
     to_values,
 )
 from posthog.utils import (
+    FlagCache,
+    RedisFlagCache,
     SizeLimitedDict,
     clean,
     guess_timezone,
```
```diff
@@ -95,7 +97,33 @@ def add_context_tags(properties):


 class Client(object):
-    """Create a new PostHog client."""
+    """Create a new PostHog client.
+
+    Examples:
+        Basic usage:
+        >>> client = Client("your-api-key")
+
+        With feature flag fallback cache (memory):
+        >>> client = Client("your-api-key", enable_flag_fallback_cache=True)
+
+        With Redis fallback cache for high-scale applications:
+        >>> client = Client(
+        ...     "your-api-key",
+        ...     enable_flag_fallback_cache=True,
+        ...     flag_fallback_cache_backend="redis",
+        ...     flag_fallback_cache_redis_url="redis://localhost:6379/0"
+        ... )
+
+        With existing Redis client:
+        >>> import redis
+        >>> redis_client = redis.Redis(host='localhost', port=6379, db=0)
+        >>> client = Client(
+        ...     "your-api-key",
+        ...     enable_flag_fallback_cache=True,
+        ...     flag_fallback_cache_backend="redis",
+        ...     flag_fallback_cache_redis_client=redis_client
+        ... )
+    """
```
Comment on lines +100 to +123:

**Contributor (author):** Is this overkill to include in the docstring? It makes it seem like this is the only thing worth passing to the Client, but I didn't know where else to put the docs.

**Contributor:** I think it's fine, for now at least. Documenting the behaviour of these args on posthog.com seems like the generally accepted approach.
```diff
     log = logging.getLogger("posthog")
```
```diff
@@ -126,6 +154,12 @@ def __init__(
     project_root=None,
     privacy_mode=False,
     before_send=None,
+    enable_flag_fallback_cache=False,
```

**Contributor:** This param seems overkill - if the URL isn't …

```diff
+    flag_fallback_cache_size=10000,
```

**Contributor:** I can't tell what the unit is here, and I don't think a user will be able to easily either. See note below; it seems backend-specific, so I'd put it in a URL param.

**Contributor (author):** Good callout (the unit is …

```diff
+    flag_fallback_cache_ttl=300,
+    flag_fallback_cache_backend="memory",
```

**Contributor:** As above - you'll need to accept URLs for this anyway, and URLs are fundamentally a serialisation format to describe an access protocol and a location. I'd put these as query parameters into the URL, where they're relevant. This has the advantage of being universal across languages, so we can expose it in a consistent manner (avoiding interface skew a bit), and it lets you take vastly different arguments depending on the backend without an endlessly growing list of parameters or wrapper/enum types exposed.

```diff
+    flag_fallback_cache_redis_url=None,
+    flag_fallback_cache_redis_client=None,
 ):
     self.queue = queue.Queue(max_queue_size)
```
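The reviewer's query-parameter idea could be sketched as a small parsing helper. This is purely illustrative: `parse_cache_url` and the option names it returns are assumptions, not part of the PR or the posthog-python API.

```python
from urllib.parse import parse_qs, urlparse


def parse_cache_url(url):
    """Hypothetical sketch of URL-based cache config: the scheme selects
    the backend, and backend-specific options travel as query parameters
    instead of separate constructor arguments."""
    parsed = urlparse(url)
    # parse_qs yields lists; keep the first value of each parameter
    options = {k: v[0] for k, v in parse_qs(parsed.query).items()}
    return {
        "backend": parsed.scheme,                     # e.g. "redis" or "memory"
        "location": f"{parsed.netloc}{parsed.path}",  # host:port/db for redis
        "ttl": int(options.pop("ttl", 300)),          # shared option, with a default
        "options": options,                           # anything backend-specific
    }
```

One URL string then carries the backend choice, its location, and its tuning knobs, which is what makes the scheme portable across SDK languages.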
```diff
@@ -151,6 +185,15 @@ def __init__(
     )
     self.poller = None
     self.distinct_ids_feature_flags_reported = SizeLimitedDict(MAX_DICT_SIZE, set)
+    self.flag_cache = self._initialize_flag_cache(
+        enable_flag_fallback_cache,
+        flag_fallback_cache_backend,
+        flag_fallback_cache_size,
+        flag_fallback_cache_ttl,
+        flag_fallback_cache_redis_url,
+        flag_fallback_cache_redis_client,
+    )
+    self.flag_definition_version = 0
     self.disabled = disabled
     self.disable_geoip = disable_geoip
     self.historical_migration = historical_migration
```
```diff
@@ -707,6 +750,9 @@ def shutdown(self):

 def _load_feature_flags(self):
     try:
+        # Store old flags to detect changes
+        old_flags_by_key: dict[str, dict] = self.feature_flags_by_key or {}
+
         response = get(
             self.personal_api_key,
             f"/api/feature_flag/local_evaluation/?token={self.api_key}&send_cohorts",
```
```diff
@@ -718,6 +764,14 @@ def _load_feature_flags(self):
         self.group_type_mapping = response["group_type_mapping"] or {}
         self.cohorts = response["cohorts"] or {}

+        # Check if flag definitions changed and update version
+        if self.flag_cache and old_flags_by_key != (
+            self.feature_flags_by_key or {}
+        ):
+            old_version = self.flag_definition_version
+            self.flag_definition_version += 1
+            self.flag_cache.invalidate_version(old_version)
+
     except APIError as e:
         if e.status == 401:
             self.log.error(
```
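The version bump in the hunk above avoids purging cached results one by one: each result is stored with the flag-definition version it was computed under, so a definition change simply makes older entries invisible. A minimal sketch of that pattern follows; the class and method names are illustrative, not the PR's actual `FlagCache`.

```python
class VersionedFlagCache:
    """Illustrative sketch: cache entries tagged with a definition version."""

    def __init__(self):
        self._entries = {}  # (distinct_id, flag_key) -> (result, version)
        self.version = 0

    def set(self, distinct_id, flag_key, result):
        # Tag every write with the version current at evaluation time
        self._entries[(distinct_id, flag_key)] = (result, self.version)

    def get(self, distinct_id, flag_key):
        hit = self._entries.get((distinct_id, flag_key))
        # Only entries computed under the current definitions are valid
        return hit[0] if hit is not None and hit[1] == self.version else None

    def invalidate_version(self, old_version):
        # Definitions changed: bump the version and drop superseded entries
        self.version += 1
        self._entries = {
            k: v for k, v in self._entries.items() if v[1] != old_version
        }
```

The eager sweep in `invalidate_version` is optional; the version check in `get` alone is enough for correctness, the sweep just reclaims memory sooner.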
```diff
@@ -739,6 +793,10 @@ def _load_feature_flags(self):
         self.group_type_mapping = {}
         self.cohorts = {}

+        # Clear flag cache when quota limited
+        if self.flag_cache:
+            self.flag_cache.clear()
+
         if self.debug:
             raise APIError(
                 status=402,
```
```diff
@@ -889,6 +947,12 @@ def _get_feature_flag_result(
         flag_result = FeatureFlagResult.from_value_and_payload(
             key, lookup_match_value, payload
         )
+
+        # Cache successful local evaluation
+        if self.flag_cache and flag_result:
+            self.flag_cache.set_cached_flag(
+                distinct_id, key, flag_result, self.flag_definition_version
+            )
     elif not only_evaluate_locally:
         try:
             flag_details, request_id = self._get_feature_flag_details_from_decide(
```
```diff
@@ -902,12 +966,30 @@ def _get_feature_flag_result(
             flag_result = FeatureFlagResult.from_flag_details(
                 flag_details, override_match_value
             )
+
+            # Cache successful remote evaluation
+            if self.flag_cache and flag_result:
+                self.flag_cache.set_cached_flag(
+                    distinct_id, key, flag_result, self.flag_definition_version
+                )
+
             self.log.debug(
                 f"Successfully computed flag remotely: #{key} -> #{flag_result}"
             )
         except Exception as e:
             self.log.exception(f"[FEATURE FLAGS] Unable to get flag remotely: {e}")

+            # Fallback to cached value if remote evaluation fails
+            if self.flag_cache:
+                stale_result = self.flag_cache.get_stale_cached_flag(
+                    distinct_id, key
+                )
+                if stale_result:
+                    self.log.info(
+                        f"[FEATURE FLAGS] Using stale cached value for flag {key}"
+                    )
+                    flag_result = stale_result
+
     if send_feature_flag_events:
         self._capture_feature_flag_called(
             distinct_id,
```
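The fresh/stale split above is the heart of the fallback behaviour: every successful evaluation is recorded, normal reads respect a TTL, and the stale read ignores the TTL so an outage can still be answered with the last known value. A rough in-memory sketch under those assumptions — this is not the PR's actual `FlagCache` implementation, just the idea:

```python
import time


class FallbackFlagCache:
    """Illustrative sketch of a TTL cache with a stale-read escape hatch."""

    def __init__(self, ttl=300.0):
        self.ttl = ttl
        self._store = {}  # (distinct_id, flag_key) -> (result, stored_at)

    def set_cached_flag(self, distinct_id, flag_key, result):
        # Record every successful evaluation with its timestamp
        self._store[(distinct_id, flag_key)] = (result, time.monotonic())

    def get_cached_flag(self, distinct_id, flag_key):
        hit = self._store.get((distinct_id, flag_key))
        if hit is not None and time.monotonic() - hit[1] <= self.ttl:
            return hit[0]
        return None  # missing or expired

    def get_stale_cached_flag(self, distinct_id, flag_key):
        # During an outage any cached value beats no value, so skip the TTL check
        hit = self._store.get((distinct_id, flag_key))
        return hit[0] if hit is not None else None
```

Keeping expired entries around (rather than evicting them on read) is what makes the stale path possible; eviction then becomes a capacity concern rather than a correctness one.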
```diff
@@ -1278,6 +1360,75 @@ def _get_all_flags_and_payloads_locally(
             "featureFlagPayloads": payloads,
         }, fallback_to_decide

+    def _initialize_flag_cache(
+        self,
+        enable_flag_fallback_cache,
+        backend,
+        cache_size,
+        cache_ttl,
+        redis_url,
+        redis_client,
+    ):
+        """Initialize feature flag cache for graceful degradation during service outages.
+
+        When enabled, the cache stores flag evaluation results and serves them as fallback
+        when the PostHog API is unavailable. This ensures your application continues to
+        receive flag values even during outages.
+
+        Example Redis usage:
+            client = Client(
+                "your-api-key",
+                enable_flag_fallback_cache=True,
+                flag_fallback_cache_backend="redis",
+                flag_fallback_cache_redis_url="redis://localhost:6379/0"
+            )
+
+            # Normal evaluation - cache is populated
+            flag_value = client.get_feature_flag("my-flag", "user123")
+
+            # During API outage - returns cached value instead of None
+            flag_value = client.get_feature_flag("my-flag", "user123")  # Uses cache
+        """
+        if not enable_flag_fallback_cache:
+            return None
+
+        if backend == "redis":
+            try:
+                # Try to import redis
+                import redis
+
+                # Use provided client or create from URL
+                if redis_client:
+                    client = redis_client
+                elif redis_url:
+                    client = redis.from_url(redis_url)
+                else:
+                    raise ValueError(
+                        "Redis backend requires either flag_fallback_cache_redis_url or flag_fallback_cache_redis_client"
+                    )
+
+                # Test connection
+                client.ping()
+
+                return RedisFlagCache(client, default_ttl=cache_ttl)
+
+            except ImportError:
+                self.log.warning(
+                    "[FEATURE FLAGS] Redis not available, falling back to memory cache"
+                )
+                return FlagCache(cache_size, cache_ttl)
+            except Exception as e:
+                self.log.warning(
+                    f"[FEATURE FLAGS] Redis connection failed: {e}, falling back to memory cache"
+                )
+                return FlagCache(cache_size, cache_ttl)
+
+        elif backend == "memory":
+            return FlagCache(cache_size, cache_ttl)
+
+        else:
+            raise ValueError(f"Unknown flag cache backend: {backend}")
+
     def feature_flag_definitions(self):
         return self.feature_flags
```
**Contributor:** I'd propose a slightly different API, similar to how we do configs on posthog-js. This avoids us flooding the main args list and offloads all of the config-handling to a separate class. That class can also handle instantiating the cache, for example. I believe the code would look much more modular that way.

**Contributor:** Why put the service name in the parameters at all? It's right there in the URL scheme. I would just pass `fallback_cache_url='redis://...'`, and then list the schemes we support as we extend support to further backend systems.

**Contributor:** Because someone might pass a custom client we don't recognize (a custom Redis wrapper), and we need to know how to use that. I'm in favor of specifying the service. Or we make it optional and fail in case we can't use it (or just don't use it and log a warning).

**Contributor:** I think we want to specifically prevent people passing clients here, though - for reasons outlined here - letting them pass a custom client is the same as defining an interface, except now we'd need to define one for every cache service rather than an abstraction. I also can't find places we do this custom wrapper stuff in posthog-js?

**Contributor:** Agree with @oliverb123 for this first implementation. The less surface area we commit to, the better. Down the road, when our caching approach is mature, we could always add an interface for custom cache providers.
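The config-object proposal in this thread could look roughly like the following. All names here are hypothetical sketches of the idea, not the PR's actual API or anything that exists in posthog-python:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlagFallbackCacheConfig:
    """Illustrative sketch: bundle the cache options into one object
    instead of growing Client.__init__'s keyword-argument list."""

    enabled: bool = False
    backend: str = "memory"  # "memory" or "redis"
    size: int = 10000        # max number of cached entries
    ttl: int = 300           # seconds before an entry counts as stale
    redis_url: Optional[str] = None


# The client would then take a single argument, e.g. (hypothetical):
# client = Client(
#     "your-api-key",
#     flag_fallback_cache=FlagFallbackCacheConfig(enabled=True),
# )
```

A single config object keeps the constructor signature stable as backends gain options, which is the modularity the reviewer is pointing at.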