[FSDP2] Auto-exclude non-floating frozen Params4bit from fully_shard to prevent QLoRA crash #3987
SunMarc left a comment:
Do we get the same results as with bnb_4bit_quant_storage set to a float dtype? Did you test on an e2e example to compare both? Thanks a lot!
@SunMarc Thanks for the review! I ran an e2e comparison on a single GPU (Qwen3-0.6B + QLoRA + SFT, same seed and hyperparameters):
Loss curves are nearly identical (< 0.6% difference at any step).
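One way to read "< 0.6% difference at any step" is as the largest per-step relative gap between the two loss curves. This is a minimal sketch assuming a relative (not absolute) difference, which the comment does not specify, and assuming both runs log loss at the same steps. The values below are made up for illustration and are not the measured curves.

```python
def max_relative_diff(losses, reference):
    """Largest per-step relative difference of `losses` against `reference`.

    Assumes both curves were logged at the same steps and that the
    reference losses are nonzero.
    """
    if len(losses) != len(reference):
        raise ValueError("loss curves must have the same number of steps")
    return max(abs(a - b) / abs(b) for a, b in zip(losses, reference))


# Hypothetical values for illustration only, not the measured results.
float_storage = [2.10, 1.80, 1.50, 1.30]   # baseline: float quant storage
uint8_storage = [2.11, 1.80, 1.505, 1.30]  # run with the PR's exclusion path
diff = max_relative_diff(uint8_storage, float_storage)
print(f"max per-step difference: {diff:.2%}")
```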
SunMarc left a comment:
Another question: if we put those into ignored_params, will FSDP still correctly shard the model across multiple GPUs, given that most of the params are Params4bit?
You're right. Because most parameters are Params4bit and are excluded from sharding, the base weights are replicated on every GPU. FSDP still shards the trainable LoRA parameters and their gradients, but in this path the memory savings from weight sharding are limited.
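A back-of-the-envelope sketch of why replicating the base weights limits the savings. It assumes the trainable state (LoRA parameters, gradients, optimizer state) is split evenly across ranks and ignores activations, buffers, and communication overhead; the function and the sizes are hypothetical, not measurements from this PR.

```python
GiB = 2**30


def per_gpu_bytes(frozen_base_bytes, trainable_bytes, world_size, base_is_sharded):
    """Rough per-GPU footprint of base weights plus trainable state.

    Hypothetical model: trainable state is always sharded evenly; the frozen
    base weights are either sharded too or fully replicated on every rank.
    """
    base = frozen_base_bytes / world_size if base_is_sharded else frozen_base_bytes
    return base + trainable_bytes / world_size


# Hypothetical sizes: 4 GiB of quantized base weights, 1 GiB of LoRA state, 4 GPUs.
replicated = per_gpu_bytes(4 * GiB, 1 * GiB, 4, base_is_sharded=False)
sharded = per_gpu_bytes(4 * GiB, 1 * GiB, 4, base_is_sharded=True)
print(replicated / GiB, sharded / GiB)
```

With the base weights replicated, adding GPUs only shrinks the small trainable term, so the per-GPU footprint stays close to the full size of the quantized base model.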
src/accelerate/utils/fsdp_utils.py (outdated)
 if param.__class__.__name__ == "Params4bit":
-    model_has_params4bit = True
-    break
+    params4bit.append(param)
+
+model_has_params4bit = len(params4bit) > 0
Let's not append every param to params4bit; let's keep the model_has_params4bit flag as we did before.
Done. Reverted to the model_has_params4bit flag with early detection, and moved the filtering of incompatible parameters into the same loop to avoid a second pass over the parameters.
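A minimal sketch of the single-pass idea: one loop both sets the Params4bit flag and collects the parameters to exclude. It uses hypothetical stand-in classes (`FakeParam`, a fake `Params4bit`, and string dtypes in `FLOAT_DTYPES`) so it runs without torch or bitsandbytes, and `scan_params` is a hypothetical helper, not the function in `fsdp_utils.py`. The exclusion rule (frozen and non-floating) follows the PR title.

```python
from dataclasses import dataclass

# Hypothetical: dtype names this sketch treats as floating point.
FLOAT_DTYPES = {"float16", "bfloat16", "float32", "float64"}


@dataclass
class FakeParam:
    """Hypothetical stand-in for a torch parameter: only the fields the scan inspects."""
    dtype: str
    requires_grad: bool


class Params4bit(FakeParam):
    """Hypothetical stand-in; the scan matches on the class name, as in the reviewed diff."""


def scan_params(named_params):
    """Single pass: set the Params4bit flag and collect names of params to exclude."""
    model_has_params4bit = False
    ignored_names = []
    for name, param in named_params:
        if param.__class__.__name__ != "Params4bit":
            continue
        model_has_params4bit = True
        # Per the PR title, only frozen Params4bit with non-floating storage
        # (e.g. uint8) are excluded; float-storage ones are left for fully_shard.
        if not param.requires_grad and param.dtype not in FLOAT_DTYPES:
            ignored_names.append(name)
    return model_has_params4bit, ignored_names


named = [
    ("lora_A.weight", FakeParam("float32", requires_grad=True)),
    ("q_proj.weight", Params4bit("uint8", requires_grad=False)),
    ("k_proj.weight", Params4bit("bfloat16", requires_grad=False)),
]
has_4bit, ignored = scan_params(named)
print(has_4bit, ignored)
```

Names are collected here only for readability; the PR description says the excluded parameters go to fully_shard via ignored_params, which would take the parameter objects themselves.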
What does this PR do?
Auto-excludes incompatible BnB Params4bit from FSDP2 sharding via ignored_params to prevent a QLoRA crash.
Fixes #3983
Who can review?
@SunMarc