
Ensure CUDA availability before setting torch device #13445

Closed
nomadicGopher wants to merge 2 commits into Comfy-Org:master from nomadicGopher:harden_cpu_support

Conversation

@nomadicGopher

While working around an issue caused by an outdated CUDA library, I forced CPU usage. This change helped.

Changes

  • fallback to CPU if CUDA is not available

fallback to cpu if nvidia is not available
@coderabbitai

coderabbitai Bot commented Apr 17, 2026

📝 Walkthrough


The get_torch_device() function in comfy/model_management.py was modified to include an explicit CUDA availability check. Previously, when no device overrides were active, the function would unconditionally return a CUDA device. The updated logic now verifies CUDA availability before returning a CUDA device; if CUDA is unavailable, it falls back to returning a CPU device instead. This change prevents potential failures when attempting to use CUDA on systems where it is not available.
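The fallback described in the walkthrough can be sketched as a simplified model. Note this is a hypothetical, stripped-down illustration: the real `get_torch_device()` also handles MPS, XPU, DirectML, and other device overrides, and returns `torch.device` objects rather than strings.

```python
def pick_device(cuda_available: bool) -> str:
    """Simplified model of the updated get_torch_device() logic:
    return the CUDA device only when CUDA is actually usable,
    otherwise fall back to the CPU device.
    """
    if cuda_available:
        return "cuda"
    # CUDA is not available on this system: fall back to CPU
    # instead of unconditionally returning a CUDA device.
    return "cpu"
```

Before this change, the equivalent logic would return `"cuda"` unconditionally in the no-override case, failing at runtime on CUDA-less systems.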

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name: Docstring Coverage
Status: ⚠️ Warning
Explanation: Docstring coverage is 0.00%, which is below the required threshold of 80.00%.
Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name: Title check
Status: ✅ Passed
Explanation: The title directly addresses the main change: adding a CUDA availability check before setting the torch device, which aligns with the changeset modifications to get_torch_device().

Check name: Description check
Status: ✅ Passed
Explanation: The description is related to the changeset, explaining the motivation (an outdated CUDA library issue) and the solution (fall back to CPU if CUDA is unavailable), matching the code changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai Bot left a comment


🧹 Nitpick comments (1)
comfy/model_management.py (1)

206-209: Consider flipping cpu_state to CPUState.CPU at init instead of patching only get_torch_device().

The local fallback here works, but cpu_state stays at CPUState.GPU on a CUDA-less system. That leaves several downstream globals/derivations in an inconsistent state, e.g.:

  • Line 461: if cpu_state != CPUState.GPU: vram_state = VRAMState.DISABLED — won't trigger, so vram_state stays NORMAL_VRAM despite effectively running on CPU.
  • xformers_enabled() (line 1424), cpu_mode() (line 1533), and other cpu_state-gated checks will report GPU mode on a CPU-only box.
  • Any code path that queries CUDA directly (e.g., torch.cuda.get_device_properties in should_use_fp16/should_use_bf16) won't be short-circuited by cpu_mode() and may raise.

A more consistent fix is to force CPU mode up front when no accelerator is present, so every downstream decision flows from the same source of truth:

🛠️ Suggested alternative (near line 152)
 if args.cpu:
     cpu_state = CPUState.CPU
+elif (cpu_state == CPUState.GPU
+      and not torch.cuda.is_available()
+      and not xpu_available
+      and not npu_available
+      and not mlu_available
+      and not ixuca_available
+      and not directml_enabled):
+    logging.warning("No supported accelerator detected; falling back to CPU.")
+    cpu_state = CPUState.CPU

With that in place, the existing if cpu_state == CPUState.CPU: return torch.device("cpu") branch at line 197 already handles get_torch_device() correctly, and the new elif/else at 206–209 becomes unnecessary.

Note: torch.cuda.is_available() is also True for ROCm/HIP builds, so this doesn't regress AMD users — same assumption your current patch relies on.

Worth confirming with the maintainers whether they'd prefer the narrower fix (as in this PR) or the broader cpu_state adjustment, since it touches module-init semantics that custom nodes may observe.
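The reviewer's broader alternative can be modeled in isolation. This is a hypothetical sketch, not the actual module code: the accelerator flags are passed in as a dict here, whereas the real module reads globals such as `xpu_available` and `directml_enabled` and calls `torch.cuda.is_available()` directly.

```python
from enum import Enum

class CPUState(Enum):
    GPU = 0
    CPU = 1

def init_cpu_state(force_cpu: bool, accelerators: dict) -> CPUState:
    """Hypothetical model of the suggested init-time decision: settle
    CPU vs. GPU mode once, so vram_state, cpu_mode(), and the fp16/bf16
    checks all observe the same source of truth."""
    if force_cpu:
        # Mirrors the existing `if args.cpu:` branch.
        return CPUState.CPU
    if not any(accelerators.values()):
        # No CUDA/ROCm, XPU, NPU, MLU, ixuca, or DirectML present:
        # flip to CPU up front rather than patching get_torch_device().
        return CPUState.CPU
    return CPUState.GPU
```

With the state settled at init, the existing `if cpu_state == CPUState.CPU: return torch.device("cpu")` branch handles device selection without any per-call CUDA probing.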

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/model_management.py` around lines 206-209: the module currently falls
back to CPU in get_torch_device() but leaves the global cpu_state as
CPUState.GPU; update the module initialization to detect
torch.cuda.is_available() and set cpu_state = CPUState.CPU when no accelerator
is present (i.e., flip cpu_state to CPUState.CPU at init), so downstream checks
like vram_state logic, xformers_enabled(), cpu_mode(), and
should_use_fp16/should_use_bf16 observe the correct CPU mode; after this change
the existing CPU branch in get_torch_device() will remain valid and the
cuda-based elif/else fallback becomes unnecessary.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b23e0cde-e14d-4383-a31f-db98960039c9

📥 Commits

Reviewing files that changed from the base of the PR and between c033bbf and cbd3ba3.

📒 Files selected for processing (1)
  • comfy/model_management.py

