[FSDP2] Auto-exclude non-floating frozen Params4bit from fully_shard to prevent QLoRA crash#3987

Merged
SunMarc merged 4 commits into huggingface:main from roycho96:fix/fsdp2-auto-ignore on Apr 10, 2026

Conversation

@roycho96
Contributor

What does this PR do?

Auto-excludes incompatible BnB Params4bit from FSDP2 sharding via ignored_params to prevent QLoRA crash.

Fixes #3983
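The exclusion mechanism can be sketched as follows. This is a minimal illustration, not the PR's exact code: the frozen uint8 parameter stands in for bitsandbytes' `Params4bit`, and the commented-out `fully_shard` call shows where the collected set would be passed.

```python
import torch
from torch import nn


def collect_ignored_params(model: nn.Module) -> set:
    """Frozen non-floating-point params (e.g. bnb Params4bit stored as
    uint8) cannot be wrapped by fully_shard, so collect them for the
    `ignored_params` argument."""
    return {
        p
        for p in model.parameters()
        if not p.is_floating_point() and not p.requires_grad
    }


# Toy stand-in: a float layer plus a frozen uint8 parameter that mimics
# a quantized base weight.
model = nn.Linear(4, 4)
model.quant_weight = nn.Parameter(
    torch.zeros(8, dtype=torch.uint8), requires_grad=False
)

ignored = collect_ignored_params(model)
# fully_shard(model, ignored_params=ignored)  # float params still shard
```

Only the non-floating frozen parameter is excluded; the trainable float weights remain eligible for sharding.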

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@SunMarc

Member

@SunMarc SunMarc left a comment


Do we get the same results as with bnb_4bit_quant_storage being a float? Did you test on an e2e example to compare both? Thanks a lot!

@roycho96
Contributor Author

Do we get the same results as with bnb_4bit_quant_storage being a float? Did you test on an e2e example to compare both? Thanks a lot!

@SunMarc Thanks for the review!

I ran an e2e comparison on single GPU (Qwen3-0.6B + QLoRA + SFT, same seed/hyperparams):

| Step | uint8 (this PR) | bf16 quant_storage (official) |
|-----:|----------------:|------------------------------:|
| 1    | 1.2914          | 1.2914                        |
| 2    | 1.2571          | 1.2516                        |
| 3    | 1.2310          | 1.2281                        |
| 4    | 1.1131          | 1.1202                        |
| 5    | 1.0541          | 1.0563                        |
| 6    | 0.9757          | 0.9793                        |
| 7    | 0.9656          | 0.9687                        |
| 8    | 0.9125          | 0.9066                        |
| avg  | 1.1001          | 1.1003                        |

Loss curves are nearly identical (< 0.6% difference at any step). The minor variations are expected from floating-point ordering differences in the two quant_storage representations. The dequantized values are equivalent.

Member

@SunMarc SunMarc left a comment


Another question I have: if we put those into ignored params, will FSDP still manage to shard the model correctly across multiple GPUs, since most of the params are Params4bit?

@roycho96
Contributor Author

roycho96 commented Mar 27, 2026

Another question I have: if we put those into ignored params, will FSDP still manage to shard the model correctly across multiple GPUs, since most of the params are Params4bit?

You're right. With most parameters being Params4bit and excluded from sharding, the base weights are replicated on each GPU. FSDP still shards the trainable LoRA parameters and gradients, but the memory benefit from weight sharding is limited in this path.
The primary goal of this PR is crash prevention. Currently, users with default QLoRA config (bnb_4bit_quant_storage unset) get a crash on FSDP2. I think this is a common QLoRA setup.
For users who need full base weight sharding (e.g. 70B+ models), the path with bnb_4bit_quant_storage=bf16 remains the right choice and is unaffected by this PR.
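For reference, that alternative path is selected via the quantization config. This is a sketch assuming the transformers `BitsAndBytesConfig` API; the model name is taken from the e2e test above and is purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Storing the packed 4-bit weights in a float dtype keeps them
# compatible with FSDP2 weight sharding.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,  # shard-friendly storage
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B", quantization_config=bnb_config
)
```

With `bnb_4bit_quant_storage` left unset (uint8 storage), this PR's ignored_params path applies instead.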

Member

@SunMarc SunMarc left a comment


Just a nit! Thanks

Comment on lines +665 to +668

```diff
         if param.__class__.__name__ == "Params4bit":
-            model_has_params4bit = True
-            break
+            params4bit.append(param)

+    model_has_params4bit = len(params4bit) > 0
```
Member


Let's not append every param to params4bit; let's keep model_has_params4bit as we did before.

Contributor Author


Let's not append every param to params4bit; let's keep model_has_params4bit as we did before.

Done. Reverted to model_has_params4bit flag with early detection, and moved the incompatible filtering into the same loop to avoid a second pass over parameters.
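The resolved single-pass structure looks roughly like this. It is a sketch, not the merged code: the `Params4bit` stub only mimics bitsandbytes' class name and frozen uint8 storage.

```python
import torch
from torch import nn


class Params4bit(nn.Parameter):
    """Stub mimicking bitsandbytes' Params4bit: frozen uint8 storage."""

    def __new__(cls, data):
        return super().__new__(cls, data, requires_grad=False)


model = nn.Linear(2, 2)
model.base_weight = Params4bit(torch.zeros(6, dtype=torch.uint8))

# One pass over parameters: set the early-detection flag and collect
# incompatible params for `ignored_params` in the same loop.
model_has_params4bit = False
ignored_params = set()
for param in model.parameters():
    if param.__class__.__name__ == "Params4bit":
        model_has_params4bit = True
        if not param.is_floating_point() and not param.requires_grad:
            ignored_params.add(param)
```

This keeps the `model_has_params4bit` flag the reviewer asked for while avoiding a second iteration over the model's parameters.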

Member


Thanks!

Member

@SunMarc SunMarc left a comment


Thanks!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc SunMarc merged commit 5adf855 into huggingface:main Apr 10, 2026
24 of 25 checks passed
@roycho96 roycho96 deleted the fix/fsdp2-auto-ignore branch April 10, 2026 15:46


Development

Successfully merging this pull request may close these issues.

[FSDP2] auto-exclude incompatible Params4bit from fully_shard to prevent silent QLoRA corruption

3 participants