src/accelerate/commands/launch.py (4 changes: 4 additions, 0 deletions)
@@ -1028,6 +1028,10 @@ def multi_gpu_launcher(args):
                console.print_exception(suppress=[__file__], show_locals=False)
            else:
                raise
        else:
            if is_xpu_available():

Copilot AI Mar 20, 2026

os._exit() bypasses normal shutdown (finally blocks, atexit handlers, logging/IO flushing). To reduce the chance of losing buffered output, consider explicitly flushing stdout/stderr (and/or logging handlers) immediately before the forced exit, and add a brief inline comment explaining that this is intentionally avoiding oneCCL static destructor teardown on XPU.
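The first comment's point can be checked in isolation with the standard library alone (no accelerate or oneCCL involved). In this minimal sketch, a child interpreter registers an atexit handler, writes one line to stdout, and calls os._exit(0) inside a try/finally; the helper run_child and the child script are illustrative only. Because stdout is block-buffered when it is attached to a pipe, the line should reach the parent only when it was flushed first, and neither the finally block nor the atexit handler should run in either case.

```python
import os
import subprocess
import sys
import textwrap

# Child program: registers an atexit handler, writes one line to stdout,
# optionally flushes it, then hard-exits with os._exit(0) inside try/finally.
CHILD_TEMPLATE = textwrap.dedent(
    """
    import atexit, os, sys
    atexit.register(lambda: print("atexit ran"))
    try:
        sys.stdout.write("buffered line\\n")
        if {flush}:
            sys.stdout.flush()
        os._exit(0)
    finally:
        print("finally ran")
    """
)


def run_child(flush: bool) -> str:
    """Run the child in a fresh interpreter and return whatever reached its stdout."""
    # Drop PYTHONUNBUFFERED so the child's stdout stays block-buffered on a pipe.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    result = subprocess.run(
        [sys.executable, "-c", CHILD_TEMPLATE.format(flush=flush)],
        capture_output=True,
        text=True,
        env=env,
    )
    return result.stdout


if __name__ == "__main__":
    print("without flush:", repr(run_child(flush=False)))
    print("with flush:   ", repr(run_child(flush=True)))
```

This is why the suggestion flushes sys.stdout and sys.stderr and calls logging.shutdown() before the forced exit: anything still sitting in a Python-level buffer is otherwise discarded.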

Suggested change
-            if is_xpu_available():
+            if is_xpu_available():
+                # Intentionally bypass normal Python shutdown on XPU to avoid oneCCL
+                # static destructor teardown; flush stdio and logging before forced exit.
+                sys.stdout.flush()
+                sys.stderr.flush()
+                logging.shutdown()

Copilot uses AI. Check for mistakes.

                import os as _os
                _os._exit(0)
Comment on lines +1033 to +1034

Copilot AI Mar 20, 2026

Calling os._exit(0) here will unconditionally terminate the entire Python process whenever multi_gpu_launcher() completes successfully on an XPU system. That’s risky when launch_command()/multi_gpu_launcher() are invoked programmatically (e.g., from a larger Python process or test runner), since it can prematurely end the host process and skip remaining work/tests. Consider gating this behavior behind an explicit CLI-only signal (e.g., an env var set by the accelerate entrypoint, or a dedicated argument/flag) so library/test usage can return normally while the CLI path still avoids the oneCCL teardown crash.
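The second alternative the comment mentions, a dedicated argument that only the CLI sets, can be sketched as follows. Everything here is hypothetical: the attribute xpu_hard_exit, the functions finish_launch and cli_entrypoint, and the injectable exit_fn are illustrations, not accelerate APIs, and on_xpu stands in for is_xpu_available(). Injecting exit_fn lets a test observe the exit instead of being killed by it.

```python
import argparse
import os
from typing import Callable, Optional, Sequence

ExitFn = Callable[[int], None]


def finish_launch(args: argparse.Namespace, on_xpu: bool, exit_fn: ExitFn = os._exit) -> None:
    """Hypothetical post-launch step: hard-exit only on XPU and only if the CLI opted in.

    Callers that build their own Namespace (tests, notebooks, larger programs)
    never set `xpu_hard_exit`, so they get the default and this simply returns.
    """
    if on_xpu and getattr(args, "xpu_hard_exit", False):
        exit_fn(0)


def cli_entrypoint(
    argv: Optional[Sequence[str]] = None,
    on_xpu: bool = True,
    exit_fn: ExitFn = os._exit,
) -> None:
    """Hypothetical CLI entrypoint: the only place that opts in to the hard exit."""
    parser = argparse.ArgumentParser(prog="launch-sketch")
    parser.add_argument("training_script")
    args = parser.parse_args(argv)
    args.xpu_hard_exit = True  # opt in only on the command-line path
    # ... the real launcher would run the distributed job here ...
    finish_launch(args, on_xpu=on_xpu, exit_fn=exit_fn)
```

The design choice is that the library default is the safe one: only the command-line path opts in, so an embedding process or test runner never has to know about the workaround.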

Suggested change
-                import os as _os
-                _os._exit(0)
+                # On XPU systems, a hard process exit is used to avoid a known oneCCL teardown crash.
+                # This can be disabled for library/test usage by setting the environment variable
+                # ACCELERATE_DISABLE_XPU_PROCESS_EXIT to a truthy value (e.g. "1", "true", "yes").
+                disable_exit = os.environ.get("ACCELERATE_DISABLE_XPU_PROCESS_EXIT", "").lower()
+                if disable_exit not in {"1", "true", "yes"}:
+                    import os as _os
+                    _os._exit(0)
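The environment-variable opt-out can be exercised without an XPU. A minimal sketch that mirrors the check from this suggestion in a standalone predicate (the name xpu_process_exit_disabled is hypothetical, not an accelerate API) and toggles the variable with unittest.mock.patch.dict, the way a test runner might before calling launch_command().

```python
import os
from typing import Mapping
from unittest import mock

# Same truthy spellings as the suggested check; comparison is case-insensitive.
_TRUTHY = {"1", "true", "yes"}


def xpu_process_exit_disabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical predicate mirroring the ACCELERATE_DISABLE_XPU_PROCESS_EXIT check."""
    return environ.get("ACCELERATE_DISABLE_XPU_PROCESS_EXIT", "").lower() in _TRUTHY


def check_opt_out() -> None:
    # Variable unset: the hard exit would still happen.
    with mock.patch.dict(os.environ, clear=True):
        assert not xpu_process_exit_disabled()
    # Truthy value in any letter case: the hard exit is skipped.
    with mock.patch.dict(os.environ, {"ACCELERATE_DISABLE_XPU_PROCESS_EXIT": "TRUE"}):
        assert xpu_process_exit_disabled()
    # Value outside the set (e.g. "on"): not treated as an opt-out.
    with mock.patch.dict(os.environ, {"ACCELERATE_DISABLE_XPU_PROCESS_EXIT": "on"}):
        assert not xpu_process_exit_disabled()


if __name__ == "__main__":
    check_opt_out()
```

As written, the check does not strip whitespace, so a value such as " 1" is also not recognised.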



def deepspeed_launcher(args):