[Fix] Isolate RNG seed generator changes and clarify documentation#22709
[Fix] Isolate RNG seed generator changes and clarify documentation#22709ChiragSW wants to merge 5 commits intokeras-team:masterfrom
Conversation
Isolate random seed generator and rng utility updates into a separate branch to clarify scope and rationale.
Document why default initializer seeds use a deterministic counter after set_random_seed() to address reviewer questions about context and intent.
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to ensure that default initializer seeds are deterministic and replayable after calling keras.utils.set_random_seed(). It achieves this by maintaining a global counter for initializer seeds within rng_utils.py. The review feedback suggests simplifying the logic in make_default_seed by delegating more responsibility to the utility function and refactoring consume_default_initializer_seed to remove redundant checks and handle unseeded cases more consistently.
| if rng_utils.get_random_seed() is not None: | ||
| # Keep implicit initializer seeds deterministic after | ||
| # `keras.utils.set_random_seed()` calls. | ||
| return rng_utils.consume_default_initializer_seed() | ||
| return int(np.random.randint(1, int(1e9))) |
There was a problem hiding this comment.
The logic here can be simplified by delegating the decision to rng_utils.consume_default_initializer_seed(). This avoids duplicating the fallback logic and makes the code more maintainable while preserving the intended behavior for both seeded and unseeded runs.
# Keep implicit initializer seeds deterministic after
# `keras.utils.set_random_seed()` calls.
return rng_utils.consume_default_initializer_seed()| seed = global_state.get_global_attribute(GLOBAL_DEFAULT_INITIALIZER_SEED) | ||
| if seed is None: | ||
| seed = get_random_seed() | ||
| if seed is None: | ||
| seed = int(np.random.randint(1, int(1e9))) | ||
| global_state.set_global_attribute( | ||
| GLOBAL_DEFAULT_INITIALIZER_SEED, int(seed) + 1 | ||
| ) | ||
| return int(seed) |
There was a problem hiding this comment.
The current implementation contains redundant checks and dead code (lines 96-97) given the current caller's guard in seed_generator.py. By unifying the logic here, we can support both seeded and unseeded cases consistently. Returning early for the unseeded case (when seed is None) preserves the 'draw every time' behavior for unseeded runs, as the global counter won't be initialized, while still providing a robust internal utility.
| seed = global_state.get_global_attribute(GLOBAL_DEFAULT_INITIALIZER_SEED) | |
| if seed is None: | |
| seed = get_random_seed() | |
| if seed is None: | |
| seed = int(np.random.randint(1, int(1e9))) | |
| global_state.set_global_attribute( | |
| GLOBAL_DEFAULT_INITIALIZER_SEED, int(seed) + 1 | |
| ) | |
| return int(seed) | |
| seed = global_state.get_global_attribute(GLOBAL_DEFAULT_INITIALIZER_SEED) | |
| if seed is None: | |
| seed = get_random_seed() | |
| if seed is not None: | |
| global_state.set_global_attribute( | |
| GLOBAL_DEFAULT_INITIALIZER_SEED, seed + 1 | |
| ) | |
| return seed | |
| return int(np.random.randint(1, 1_000_000_001)) |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #22709 +/- ##
==========================================
+ Coverage 75.25% 83.03% +7.78%
==========================================
Files 596 596
Lines 68741 69911 +1170
Branches 10749 10900 +151
==========================================
+ Hits 51729 58052 +6323
+ Misses 14473 9007 -5466
- Partials 2539 2852 +313
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to ensure that default initializer seeds are deterministic and replayable following a call to keras.utils.set_random_seed(). It adds a global state tracker for these seeds and a consumption function that increments the seed value to maintain uniqueness across calls. Feedback was provided regarding the fallback logic in consume_default_initializer_seed, specifically that using np.random.randint when no Keras seed is set could introduce unwanted side effects on the global NumPy RNG state, potentially impacting users who manage their own NumPy seeds.
| GLOBAL_DEFAULT_INITIALIZER_SEED, seed + 1 | ||
| ) | ||
| return seed | ||
| return int(np.random.randint(1, 1_000_000_001)) |
There was a problem hiding this comment.
The switch from python_random.randint to np.random.randint for the fallback case introduces a side effect on the global NumPy RNG state when no Keras seed is set. If a user seeds NumPy manually but not Keras, layer initialization will now consume values from their NumPy RNG, potentially breaking their own reproducibility. It is safer to use a dedicated np.random.Generator or stick to python_random to avoid interfering with the user's numerical RNG state.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to ensure that default initializer seeds are deterministic and replayable after calling keras.utils.set_random_seed(). It achieves this by implementing consume_default_initializer_seed in rng_utils.py, which tracks a separate global state for these seeds, preventing them from being affected by other random draws such as those from NumPy. Feedback suggests using 2**31 - 1 as the upper bound for random seeds to maximize entropy and ensure 32-bit compatibility. Additionally, it is recommended to initialize the global seed state even in unseeded workflows to maintain consistency across subsequent calls.
| GLOBAL_DEFAULT_INITIALIZER_SEED, seed + 1 | ||
| ) | ||
| return seed | ||
| return random.randint(1, int(1e9)) |
There was a problem hiding this comment.
While int(1e9) is functional, using a plain integer literal like 1_000_000_000 is more idiomatic in Python and avoids the unnecessary float conversion. Additionally, since Keras often deals with 32-bit integer seeds in backends, using 2**31 - 1 (the maximum value for a signed 32-bit integer) as the upper bound would maximize the entropy of the generated seeds while remaining safe for all backends.
| return random.randint(1, int(1e9)) | |
| return random.randint(1, 2**31 - 1) |
| seed = global_state.get_global_attribute(GLOBAL_DEFAULT_INITIALIZER_SEED) | ||
| if seed is None: | ||
| seed = get_random_seed() | ||
| if seed is not None: | ||
| global_state.set_global_attribute( | ||
| GLOBAL_DEFAULT_INITIALIZER_SEED, seed + 1 | ||
| ) | ||
| return seed | ||
| return random.randint(1, int(1e9)) |
There was a problem hiding this comment.
The current implementation of consume_default_initializer_seed does not update the GLOBAL_DEFAULT_INITIALIZER_SEED attribute when it falls back to random.randint(1, int(1e9)). This means that in an unseeded environment, every call to this function will perform a new draw from the global Python random state.
To improve isolation and consistency even in unseeded workflows, consider initializing the global attribute with the first draw. This ensures that subsequent initializer seeds are deterministic relative to the first one, even if the global Python random state is modified elsewhere later.
| seed = global_state.get_global_attribute(GLOBAL_DEFAULT_INITIALIZER_SEED) | |
| if seed is None: | |
| seed = get_random_seed() | |
| if seed is not None: | |
| global_state.set_global_attribute( | |
| GLOBAL_DEFAULT_INITIALIZER_SEED, seed + 1 | |
| ) | |
| return seed | |
| return random.randint(1, int(1e9)) | |
| seed = global_state.get_global_attribute(GLOBAL_DEFAULT_INITIALIZER_SEED) | |
| if seed is None: | |
| seed = get_random_seed() | |
| if seed is None: | |
| # Initialize the counter if no global seed is set. | |
| seed = random.randint(1, 2**31 - 1) | |
| global_state.set_global_attribute( | |
| GLOBAL_DEFAULT_INITIALIZER_SEED, seed + 1 | |
| ) | |
| return seed |
|
@hertschuh please review |
hertschuh
left a comment
There was a problem hiding this comment.
What is the context of this? What is the issue this is addressing?
Context: this PR was split out from #22595 after review feedback asked that the mixed PR be separated into smaller PRs. This is the “RNG seed generator changes” slice. It is not part of the original #22472 symbolic validation bug. The purpose is a separate reproducibility cleanup: after
|
Fixes part 3 of #22595
Description
Contributor Agreement
Please check all boxes below before submitting your PR for review: