
[Trainer/bug] Ensure model is not inference mode (CORE-72)#13400

Open
KohakuBlueleaf wants to merge 1 commit into Comfy-Org:master from KohakuBlueleaf:Fix-trainer-inference-tensor

Conversation

@KohakuBlueleaf (Contributor)

After recent updates in upstream ComfyUI, the LoRA trainer node with bypass_mode=True and offloading=False fails with an error like:

RuntimeError: Inference tensors do not track version counter.

This PR fixes the problem by fully rebuilding the parameters/buffers from the model in the training node.

NOTE: a separate problem where smart memory/dynamic memory causes an invalid access/access violation error is not related to this PR.
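To illustrate the mechanism behind the fix: tensors created under `torch.inference_mode()` raise exactly this `RuntimeError` when their version counter is touched, and the way out is to clone them outside inference mode. The sketch below is a hypothetical helper, not the PR's exact code; it only assumes PyTorch's documented inference-tensor behavior:

```python
# Hedged sketch (hypothetical helper, not the PR's exact code): replace any
# inference-mode parameters/buffers with regular clones so autograd can
# track them again.
import torch
import torch.nn as nn


def rebuild_inference_params(model: nn.Module) -> None:
    with torch.no_grad():  # clone outside autograd to get plain leaf tensors
        for module in model.modules():
            for name, param in list(module._parameters.items()):
                if param is None:
                    continue
                try:
                    param._version  # raises RuntimeError on inference tensors
                except RuntimeError:
                    module._parameters[name] = nn.Parameter(
                        param.clone(), requires_grad=param.requires_grad
                    )
            for name, buf in list(module._buffers.items()):
                if buf is None:
                    continue
                try:
                    buf._version
                except RuntimeError:
                    module._buffers[name] = buf.clone()


# Parameters created under inference_mode are inference tensors
with torch.inference_mode():
    m = nn.Linear(4, 2)

rebuild_inference_params(m)
out = m(torch.randn(1, 4))  # now usable in autograd
out.sum().backward()
```

After the rebuild, the parameters are ordinary leaf tensors again and `backward()` succeeds where it previously raised.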

@coderabbitai

coderabbitai Bot commented Apr 14, 2026

📝 Walkthrough

Changes to TrainLoraNode.execute handle inference-mode tensor issues. After disabling inference mode, the code now iterates through all submodules and rebuilds parameters and buffers, replacing entries in module._parameters and module._buffers whenever access to their ._version attribute fails (the signature of an inference-mode tensor). Additionally, the training-mode setup now calls .train() on the model after disabling gradients, so the model actually enters training mode. These changes affect approximately 23 lines of code.
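The second change matters because train/eval mode and gradient tracking are independent switches in PyTorch; freezing weights does not put dropout or batch-norm layers into training behavior. A minimal stand-alone illustration (names are illustrative, not the node's actual code):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))

# Freeze the base weights (e.g. only LoRA adapters should receive gradients)
model.requires_grad_(False)

# requires_grad_ does not touch train/eval mode: Dropout still needs an
# explicit .train() call to behave as it does during training
model.train()

print(model.training)                 # True: dropout is active
print(model[0].weight.requires_grad)  # False: base weights stay frozen
```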

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

- Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

- Description check: ✅ Passed. The description clearly explains the problem (inference tensors versioning error) and the solution (rebuilding parameters/buffers), matching the changeset.
- Title check: ✅ Passed. The title accurately describes the main fix: ensuring the model exits inference mode in the LoRA trainer, which directly addresses the core issue described in the PR objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@alexisrolland alexisrolland changed the title [Trainer/bug] Ensure model is not inference mode [Trainer/bug] Ensure model is not inference mode (CORE-72) Apr 18, 2026
```python
# which makes all parameters inference-mode tensors;
# to make training work correctly
# we re-build the parameters in training mode
for module in mp.model.modules():
```
Contributor


The model is theoretically sharable amongst training and non-training elements in a workflow, so this change is global across all consumers of the shared single model.

Multiple ModelPatchers share the same model, so mp.model should be treated as effectively immutable.

The good news is that we recently made it easy to do a deep clone for a few other features.

Do you just need your own full copy of the model? Something like this might do it:

```diff
(venv) rattus@rattus-box2:~/ComfyUI$ git diff
diff --git a/comfy/model_patcher.py b/comfy/model_patcher.py
index c9ad8727..858a7a47 100644
--- a/comfy/model_patcher.py
+++ b/comfy/model_patcher.py
@@ -324,10 +324,11 @@ class ModelPatcher:
     def get_clone_model_override(self):
         return self.model, (self.backup, self.backup_buffers, self.object_patches_backup, self.pinned)
 
-    def clone(self, disable_dynamic=False, model_override=None):
+    def clone(self, disable_dynamic=False, model_override=None, force_deepcopy=False):
         class_ = self.__class__
-        if self.is_dynamic() and disable_dynamic:
-            class_ = ModelPatcher
+        if self.is_dynamic() and disable_dynamic or force_deepcopy:
+            if self.is_dynamic() and disable_dynamic:
+                class_ = ModelPatcher
             if model_override is None:
                 if self.cached_patcher_init is None:
                     raise RuntimeError("Cannot create non-dynamic delegate: cached_patcher_init is not initialized.")
```

```python
positive = _process_conditioning(positive)

# Setup model and dtype
mp = model.clone()
```
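The reviewer's point generalizes: a trainer that mutates parameters should operate on its own deep copy rather than on a model shared by other consumers. A stand-alone sketch of that pattern in plain PyTorch, where `copy.deepcopy` stands in for the proposed `force_deepcopy` clone (names are illustrative, not ComfyUI's ModelPatcher API):

```python
import copy

import torch
import torch.nn as nn

shared = nn.Linear(4, 2)         # model shared across workflow consumers
private = copy.deepcopy(shared)  # trainer takes its own full copy

# Mutating the private copy leaves the shared model untouched
with torch.no_grad():
    private.weight.zero_()

print(private.weight.abs().sum().item())  # 0.0
print(shared.weight.abs().sum().item() > 0)  # True: original unaffected
```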
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the comment above for how this clone() might work
