[Trainer/bug] Ensure model is not inference mode (CORE-72) #13400
KohakuBlueleaf wants to merge 1 commit into Comfy-Org:master from
Conversation
```python
# which makes all parameters inference-mode tensors;
# to make training work correctly,
# we re-build the parameters in training mode
for module in mp.model.modules():
```
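The hunk cuts off at the start of the rebuild loop. A minimal sketch of what the loop body could look like, assuming the standard PyTorch APIs (`Tensor.is_inference()`, and the documented behavior that `clone()` on an inference tensor outside inference mode yields a normal tensor); this is an illustration, not the PR's exact code:

```python
import torch

# Hypothetical expansion of the rebuild loop (illustration only, not the
# PR's exact code): replace inference-mode parameters/buffers with freshly
# cloned normal tensors so autograd can track them during training.
for module in mp.model.modules():
    for name, param in list(module.named_parameters(recurse=False)):
        if param.is_inference():
            module._parameters[name] = torch.nn.Parameter(
                param.clone(), requires_grad=param.requires_grad
            )
    for name, buf in list(module.named_buffers(recurse=False)):
        if buf.is_inference():
            module._buffers[name] = buf.clone()
```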
The model is theoretically sharable amongst training and non-training elements in a workflow, so this change is global across all consumers of the shared single model.
Multiple ModelPatchers share the same model, so that shared mp.model should be treated as reasonably immutable.
The good news is that we recently made the deep clone easy to do for a few other features.
Do you just need your own full copy of the model? Something like this might do it:
(venv) rattus@rattus-box2:~/ComfyUI$ git diff

```diff
diff --git a/comfy/model_patcher.py b/comfy/model_patcher.py
index c9ad8727..858a7a47 100644
--- a/comfy/model_patcher.py
+++ b/comfy/model_patcher.py
@@ -324,10 +324,11 @@ class ModelPatcher:
     def get_clone_model_override(self):
         return self.model, (self.backup, self.backup_buffers, self.object_patches_backup, self.pinned)
 
-    def clone(self, disable_dynamic=False, model_override=None):
+    def clone(self, disable_dynamic=False, model_override=None, force_deepcopy=False):
         class_ = self.__class__
-        if self.is_dynamic() and disable_dynamic:
-            class_ = ModelPatcher
+        if (self.is_dynamic() and disable_dynamic) or force_deepcopy:
+            if self.is_dynamic() and disable_dynamic:
+                class_ = ModelPatcher
         if model_override is None:
             if self.cached_patcher_init is None:
                 raise RuntimeError("Cannot create non-dynamic delegate: cached_patcher_init is not initialized.")
```
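If something like that lands, the trainer node could take a private deep copy before rebuilding anything. A one-line sketch, assuming the `force_deepcopy` parameter proposed above (hypothetical until merged):

```python
# Take a private deep copy so parameter rebuilding cannot leak into
# other consumers sharing the original model (assumes the proposed
# force_deepcopy flag exists on ModelPatcher.clone).
mp = model.clone(force_deepcopy=True)
```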
```python
positive = _process_conditioning(positive)

# Setup model and dtype
mp = model.clone()
```
see comment below for how this clone() might work
After recent updates in upstream ComfyUI, the LoRA trainer node with bypass_mode=True and offloading=False fails with an error like:
This PR fixes the problem by fully rebuilding the parameters/buffers of the model in the training node.
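For context, a minimal repro of this class of failure, assuming it stems from weights being materialized under ComfyUI's torch.inference_mode() execution context (an assumption about the failure mode, not the trainer's actual code):

```python
import torch

# Tensors created under torch.inference_mode() are "inference tensors";
# using one in an autograd graph afterwards raises a RuntimeError.
with torch.inference_mode():
    w = torch.randn(4, 4)  # stands in for a weight built under inference mode

x = torch.randn(2, 4, requires_grad=True)
try:
    (x @ w).sum().backward()
except RuntimeError as e:
    print(e)  # e.g. "Inference tensors cannot be saved for backward. ..."
```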
NOTE: a separate problem, where smart memory/dynamic memory causes invalid access/access violation errors, is not related to this PR.