
Fix state_dict handling for max_iters in Engine #3729

Open
TahaZahid05 wants to merge 8 commits into pytorch:master from TahaZahid05:fix-taha/max-iters

@TahaZahid05
Collaborator

Fixes #1521

Description:
This PR builds on #3439 to implement max_iters handling when the Engine's state is serialized and deserialized during runs.

Key Changes:

  • Engine.state_dict() now exports exactly one of max_iters or max_epochs, depending on which one the Engine run was configured with.
  • Restructured _state_dict_one_of_opt_keys to accept groups of mutually exclusive keys, so that Engine.load_state_dict() accepts either iteration/max_iters or epoch/max_epochs and correctly resumes the engine from either.
  • Added validation checks to Engine that reject loading a state positioned at an iteration the configured run could never reach, so training resumes safely.
  • Added tests, for both local Engine runs and Checkpointer, that verify resuming across the mutually exclusive parameters.
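
The save/load behaviour described above can be sketched in plain Python, without PyTorch. This is a hypothetical illustration, not Ignite's code: `export_state`, `validate_state`, `resume_iteration`, `REQUIRED_KEYS` and `ONE_OF_KEY_GROUPS` are invented names, the grouping of `iteration`/`epoch` and `max_epochs`/`max_iters` into separate mutually exclusive groups is an assumption, and the bound check is only one possible reading of the validation the PR adds.

```python
from typing import Any, Dict, Mapping, Optional, Tuple

# Hypothetical key layout (an assumption, not Ignite's actual attributes):
# every key in REQUIRED_KEYS must be present, and each inner tuple of
# ONE_OF_KEY_GROUPS is a group of mutually exclusive keys, of which exactly
# one must be present.
REQUIRED_KEYS: Tuple[str, ...] = ("epoch_length",)
ONE_OF_KEY_GROUPS: Tuple[Tuple[str, ...], ...] = (
    ("iteration", "epoch"),
    ("max_epochs", "max_iters"),
)


def export_state(
    iteration: int,
    epoch_length: int,
    max_epochs: Optional[int] = None,
    max_iters: Optional[int] = None,
) -> Dict[str, int]:
    """Export exactly one stopping condition, matching how the run was configured."""
    if (max_epochs is None) == (max_iters is None):
        raise ValueError("Exactly one of max_epochs or max_iters must be set")
    state = {"iteration": iteration, "epoch_length": epoch_length}
    if max_iters is not None:
        state["max_iters"] = max_iters
    else:
        state["max_epochs"] = max_epochs
    return state


def validate_state(state: Mapping[str, Any]) -> None:
    """Raise ValueError unless required keys and exactly one key per group are present."""
    missing = [key for key in REQUIRED_KEYS if key not in state]
    if missing:
        raise ValueError(f"Missing required keys: {missing}")
    for group in ONE_OF_KEY_GROUPS:
        present = [key for key in group if key in state]
        if len(present) != 1:
            raise ValueError(f"Expected exactly one of {group}, got {present}")


def resume_iteration(state: Mapping[str, Any]) -> int:
    """Validate a loaded state and return the iteration to resume from."""
    validate_state(state)
    epoch_length = state["epoch_length"]
    # Normalise the saved position to an iteration count, whichever key was saved.
    if "iteration" in state:
        iteration = state["iteration"]
    else:
        iteration = state["epoch"] * epoch_length
    # Normalise the stopping condition to an iteration count as well.
    if "max_iters" in state:
        limit = state["max_iters"]
    else:
        limit = state["max_epochs"] * epoch_length
    # Reject a position the configured run could never reach.
    if iteration > limit:
        raise ValueError(f"Cannot resume at iteration {iteration}: the run stops at iteration {limit}")
    return iteration


if __name__ == "__main__":
    state = export_state(iteration=250, epoch_length=100, max_iters=300)
    print(state)  # {'iteration': 250, 'epoch_length': 100, 'max_iters': 300}
    print(resume_iteration(state))  # 250
    try:
        resume_iteration({"epoch": 4, "epoch_length": 100, "max_epochs": 3})
    except ValueError as err:
        print(err)  # Cannot resume at iteration 400: the run stops at iteration 300
```

Converting both the saved position and the stopping condition to iteration counts keeps the bound check in one place, regardless of which key from each group a checkpoint carries.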

Check list:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

github-actions bot added the module: engine and module: base labels on Apr 10, 2026
@TahaZahid05
Collaborator Author

@vfdev-5 done!

More descriptive way to show tuple of tuples

Co-authored-by: vfdev <vfdev.5@gmail.com>
@vfdev-5
Collaborator

vfdev-5 commented Apr 12, 2026 via email

TahaZahid05 requested a review from vfdev-5 on April 12, 2026 14:58

Development

Successfully merging this pull request may close these issues.

Possible issues with max_iters when loading/saving engine's state

3 participants