Add tutorial: DP fine-tuning of causal LMs with LoRA by immu4989 · Pull Request #830 · meta-pytorch/opacus

immu4989 · 2026-06-25T05:38:01Z

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Docs change / refactoring / dependency upgrade

Motivation and Context / Related issue

Closes #827.

Opacus has one LoRA tutorial today (tutorials/building_text_classifier.ipynb), covering BERT for sequence classification. There's no tutorial for the more common contemporary case: causal language model fine-tuning with DP-SGD + LoRA. This PR adds that as tutorials/building_dp_lora_for_llms.ipynb.

The tutorial also bakes in the silent-corruption avoidance pattern surfaced and root-caused in #820. Three independent confirmations across CPU, Kaggle T4, and RTX 5090 settled it as a device-placement ordering issue: the safe order is to call model.to(device) before get_peft_model(). The tutorial demonstrates that ordering explicitly, with inline rationale and links back to #820.

What's in the tutorial

E2E NLG as the downstream task (small, standard in DP-NLP, used in the DiSK paper benchmark suite)
GPT-2-small as the base model
Two configurations compared: non-DP LoRA baseline vs DP LoRA at target epsilon=8
BLEU, perplexity, throughput (tok/s), peak GPU memory, and final epsilon reported for each
A dedicated "three safety patterns" section walking through:
1. model.to(device) before get_peft_model() (the ordering from #820, "Silent corrupted LoRA weight updates with Opacus 1.5.4 + PEFT 0.18.x: training appears normal but models are unusable")
2. model.train() before make_private_with_epsilon (validator requirement)
3. poisson_sampling=False (avoids GPT-2 forward-pass failure on empty Poisson batches)
Honest discussion of when to reach for DP-LoRA, the learning-rate-needs-to-be-higher finding, and the engineering recipe to enable DP-full in a follow-up PR
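
For reviewers who want the three safety patterns in one place, here is a minimal sketch of the setup order. It is not copied from the notebook: the function name build_private_lora_model, the LoRA hyperparameters (r, lora_alpha, target_modules), the learning rate, target_delta, and max_grad_norm are all illustrative assumptions. Only the ordering of the three marked steps and target epsilon=8 come from this PR. The imports sit inside the function so the definition can be loaded without the heavy dependencies installed.

```python
def build_private_lora_model(train_loader, device, epochs=3):
    """Sketch of the three ordering patterns; all hyperparameters are placeholders."""
    import torch
    from opacus import PrivacyEngine
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Pattern 1: move the base model to the device BEFORE wrapping it with PEFT
    # (the ordering identified in #820).
    model.to(device)
    lora_config = LoraConfig(
        r=8,                        # assumed rank, not taken from the notebook
        lora_alpha=16,              # assumed scaling, not taken from the notebook
        target_modules=["c_attn"],  # assumed: GPT-2 attention projection
        fan_in_fan_out=True,        # GPT-2 stores these weights as Conv1D
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)

    # Pattern 2: put the model in training mode BEFORE make_private_with_epsilon,
    # because the Opacus validator requires it.
    model.train()

    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=1e-3)  # assumed learning rate

    privacy_engine = PrivacyEngine()
    model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
        module=model,
        optimizer=optimizer,
        data_loader=train_loader,
        target_epsilon=8.0,      # target from this PR
        target_delta=1e-5,       # assumed; the PR description does not state delta
        epochs=epochs,
        max_grad_norm=1.0,       # assumed clipping norm
        poisson_sampling=False,  # Pattern 3: avoid empty Poisson batches
    )
    return model, optimizer, train_loader, privacy_engine
```

A caller would pass an ordinary torch DataLoader over tokenized E2E NLG examples and then read the spent budget from privacy_engine.get_epsilon(delta) after training, using the same delta passed above.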

Canonical results on Kaggle T4 (embedded in the notebook)

Config	BLEU	PPL	tok/s	Peak GB	Time	epsilon
non-DP LoRA	24.72	1.60	4679	2.17	438 s	no DP
DP LoRA	18.24	2.71	4599	2.65	445 s	7.08

DP costs about 26% relative BLEU (24.72 to 18.24) at epsilon ~= 7 on this task, with about 22% peak-memory overhead and under 2% throughput loss.
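
The relative figures can be re-derived from the table with a few lines of arithmetic. This is a small sketch for checking the summary, with the numbers copied from the table above; the dictionary layout and the helper name relative_change are my own.

```python
# Figures copied from the Kaggle T4 results table in this PR.
non_dp = {"bleu": 24.72, "tok_s": 4679, "peak_gb": 2.17}
dp = {"bleu": 18.24, "tok_s": 4599, "peak_gb": 2.65}


def relative_change(baseline, value):
    """Signed change of `value` relative to `baseline` (0.10 means +10%)."""
    return (value - baseline) / baseline


bleu_drop = -relative_change(non_dp["bleu"], dp["bleu"])
memory_overhead = relative_change(non_dp["peak_gb"], dp["peak_gb"])
throughput_drop = -relative_change(non_dp["tok_s"], dp["tok_s"])

print(f"relative BLEU drop:   {bleu_drop:.1%}")
print(f"peak memory overhead: {memory_overhead:.1%}")
print(f"throughput drop:      {throughput_drop:.1%}")
```

This works out to roughly 26.2% BLEU drop, 22.1% memory overhead, and 1.7% throughput drop.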

What's intentionally out of scope

DP-full fine-tuning of GPT-2 with all 125M parameters trainable. Attempted via opacus.validators.ModuleValidator.fix() (Conv1D -> Linear swap) plus grad_sample_mode='functorch'; neither resolved a per-sample-gradient shape mismatch in clip_and_accumulate that appears to stem from GPT-2's tied wte / lm_head weights. The notebook's section 16 documents a recommended engineering recipe for a follow-up PR (untie weights, then re-apply fix and functorch). Felt cleaner to ship the LoRA portion as scoped here than to block on the DP-full work.

How Has This Been Tested

End-to-end run on Kaggle T4 with pinned versions:

opacus 1.6.0
peft 0.18.1
transformers 5.0.0
torch 2.10.0+cu128
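
Anyone re-running the notebook may want to confirm their environment matches these pins first. Below is a small sketch using only the standard library; the helper name find_mismatches and its lookup parameter are hypothetical and are not part of the notebook or of Opacus.

```python
from importlib import metadata

# Pins copied from the list above. The "+cu128" local suffix depends on the
# torch build, so a CPU-only install will report a different string.
PINNED = {
    "opacus": "1.6.0",
    "peft": "0.18.1",
    "transformers": "5.0.0",
    "torch": "2.10.0+cu128",
}


def find_mismatches(pinned, lookup=metadata.version):
    """Return {package: (expected, found)} for each package that differs.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means the environment matches the versions used for the committed cell outputs.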

Cell outputs in the committed notebook reflect the actual run. The three safety patterns each addressed a real failure mode encountered during development.

Checklist

The documentation is up-to-date with the changes I made.
I have read the CONTRIBUTING document and completed the CLA (CLA on file from PR #825, "Fix batch_first paths in DPMultiheadAttention").
All tests passed, and additional code has been covered with new tests.

Step-by-step tutorial covering GPT-2 + LoRA + opacus on E2E NLG. Compares non-DP LoRA baseline against DP LoRA at target epsilon=8 and reports BLEU, perplexity, peak GPU memory, throughput, and final epsilon. Includes a dedicated section on the three device-placement and training-mode ordering patterns that prevent silent corruption (the bug surfaced and root-caused in meta-pytorch#820). Notebook is fully executed; cell outputs are embedded. Closes meta-pytorch#827

meta-codesync · 2026-06-25T05:40:41Z

This pull request has been imported. If you are a Meta employee, you can view this in D109661283. (Because this pull request was imported automatically, there will not be any future comments.)

meta-cla Bot added the CLA Signed label Jun 25, 2026

immu4989 wants to merge 1 commit into meta-pytorch:main from immu4989:tutorial/dp-lora-for-llms
