Add tutorial: DP fine-tuning of causal LMs with LoRA #830

Open: immu4989 wants to merge 1 commit into meta-pytorch:main from immu4989:tutorial/dp-lora-for-llms


@immu4989

Contributor

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Docs change / refactoring / dependency upgrade

Motivation and Context / Related issue

Closes #827.

Opacus has one LoRA tutorial today (tutorials/building_text_classifier.ipynb), covering BERT for sequence classification. There's no tutorial for the more common contemporary case: causal language model fine-tuning with DP-SGD + LoRA. This PR adds that as tutorials/building_dp_lora_for_llms.ipynb.

The tutorial also bakes in the silent-corruption avoidance pattern surfaced and root-caused in #820. Three independent confirmations across CPU, Kaggle T4, and RTX 5090 settled it as a device-placement ordering issue (model.to(device) before get_peft_model()). The tutorial demonstrates the safe ordering explicitly with inline rationale and links back to #820.

What's in the tutorial

  • E2E NLG as the downstream task (small, standard in DP-NLP, used in the DiSK paper benchmark suite)
  • GPT-2-small as the base model
  • Two configurations compared: non-DP LoRA baseline vs DP LoRA at target epsilon=8
  • BLEU, perplexity, throughput (tok/s), peak GPU memory, and final epsilon reported for each
  • A dedicated "three safety patterns" section walking through:
    1. model.to(device) before get_peft_model() (the #820 ordering; issue title: "Silent corrupted LoRA weight updates with Opacus 1.5.4 + PEFT 0.18.x: training appears normal but models are unusable")
    2. model.train() before make_private_with_epsilon (validator requirement)
    3. poisson_sampling=False (avoids GPT-2 forward-pass failure on empty Poisson batches)
  • Honest discussion of when to reach for DP-LoRA, the learning-rate-needs-to-be-higher finding, and the engineering recipe to enable DP-full in a follow-up PR
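The three ordering patterns listed above can be sketched as one setup function. This is a minimal sketch, not the notebook's code: the function name `build_dp_lora_gpt2` is hypothetical, and the LoRA rank, `lora_alpha`, `target_modules`, learning rate, `target_delta`, and `max_grad_norm` are placeholder assumptions. Only GPT-2-small, the target epsilon of 8, and the three orderings come from this PR.

```python
def build_dp_lora_gpt2(train_loader, epochs, device="cuda", target_epsilon=8.0):
    """Wire GPT-2 + LoRA + Opacus using the three orderings from this PR."""
    # Heavy dependencies are imported lazily so the sketch can be imported
    # and read without opacus/peft/transformers installed.
    import torch
    from opacus import PrivacyEngine
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("gpt2")  # GPT-2-small

    # Pattern 1: move the base model to its device BEFORE get_peft_model().
    base = base.to(device)
    lora_cfg = LoraConfig(
        task_type="CAUSAL_LM",
        r=8,                        # assumption: placeholder rank
        lora_alpha=16,              # assumption: placeholder scaling
        target_modules=["c_attn"],  # assumption: GPT-2 attention projection
        fan_in_fan_out=True,        # GPT-2 stores these weights as Conv1D
    )
    model = get_peft_model(base, lora_cfg)

    # Pattern 2: switch to training mode BEFORE make_private_with_epsilon.
    model.train()

    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=1e-3)  # assumption: placeholder lr

    privacy_engine = PrivacyEngine()
    model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
        module=model,
        optimizer=optimizer,
        data_loader=train_loader,
        target_epsilon=target_epsilon,
        target_delta=1e-5,       # assumption: placeholder delta
        epochs=epochs,
        max_grad_norm=1.0,       # assumption: placeholder clipping norm
        poisson_sampling=False,  # Pattern 3: no empty Poisson batches
    )
    return model, optimizer, train_loader, privacy_engine
```

After training, `privacy_engine.get_epsilon(delta)` reports the epsilon actually spent, which is the "final epsilon" figure the tutorial tabulates.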

Canonical results on Kaggle T4 (embedded in the notebook)

| Config | BLEU | PPL | tok/s | Peak GB | Time | epsilon |
|---|---|---|---|---|---|---|
| non-DP LoRA | 24.72 | 1.60 | 4679 | 2.17 | 438 s | no DP |
| DP LoRA | 18.24 | 2.71 | 4599 | 2.65 | 445 s | 7.08 |

DP costs about 26% relative BLEU (18.24 vs 24.72) at epsilon ~= 7 on this task, with about 22% higher peak GPU memory (2.65 GB vs 2.17 GB) and negligible throughput impact (4599 vs 4679 tok/s).
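For readers who want to check the summary percentages, the arithmetic can be reproduced directly from the table values. The helper name `relative_change` is illustrative only.

```python
# Values copied from the Kaggle T4 results table in this PR description.
non_dp = {"bleu": 24.72, "tok_s": 4679, "peak_gb": 2.17}
dp = {"bleu": 18.24, "tok_s": 4599, "peak_gb": 2.65}


def relative_change(baseline: float, value: float) -> float:
    """Percentage change of `value` relative to `baseline`."""
    return (value - baseline) / baseline * 100.0


bleu_drop = -relative_change(non_dp["bleu"], dp["bleu"])
mem_overhead = relative_change(non_dp["peak_gb"], dp["peak_gb"])
throughput_drop = -relative_change(non_dp["tok_s"], dp["tok_s"])

print(f"relative BLEU drop:   {bleu_drop:.1f}%")
print(f"peak memory overhead: {mem_overhead:.1f}%")
print(f"throughput drop:      {throughput_drop:.1f}%")
```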

What's intentionally out of scope

DP-full fine-tuning of GPT-2 with all 125M parameters trainable. Attempted via opacus.validators.ModuleValidator.fix() (Conv1D -> Linear swap) plus grad_sample_mode='functorch'; neither resolved a per-sample-gradient shape mismatch in clip_and_accumulate that appears to stem from GPT-2's tied wte / lm_head weights. The notebook's section 16 documents a recommended engineering recipe for a follow-up PR (untie weights, then re-apply fix and functorch). Felt cleaner to ship the LoRA portion as scoped here than to block on the DP-full work.
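The "untie weights" step of that recipe could look roughly like the sketch below. It is an untested assumption about how the follow-up might start, not code from the notebook: the function name `untie_lm_head` is hypothetical, and whether untying actually resolves the shape mismatch in clip_and_accumulate is exactly what the follow-up PR would need to establish.

```python
def untie_lm_head(model):
    """Give the LM head its own copy of the embedding matrix.

    Assumes `model` is a Hugging Face PreTrainedModel (e.g. GPT-2) whose
    output head shares its weight tensor with the input embedding.
    """
    # Imported lazily so the sketch can be read without torch installed.
    import torch.nn as nn

    embedding = model.get_input_embeddings()
    head = model.get_output_embeddings()

    # Only replace the parameter if the two modules really share storage.
    if head is not None and head.weight is embedding.weight:
        head.weight = nn.Parameter(embedding.weight.detach().clone())

    # Record the change so a later tie_weights() call does not re-tie them.
    model.config.tie_word_embeddings = False
    return model
```

Per the recipe described above, the remaining steps would then re-apply `ModuleValidator.fix()` and pass `grad_sample_mode='functorch'` when making the model private.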

How Has This Been Tested

End-to-end run on Kaggle T4 with pinned versions:

  • opacus 1.6.0
  • peft 0.18.1
  • transformers 5.0.0
  • torch 2.10.0+cu128

Cell outputs in the committed notebook reflect the actual run. The three safety patterns each addressed a real failure mode encountered during development.

Checklist

  • The documentation is up-to-date with the changes I made.
  • I have read the CONTRIBUTING document and completed the CLA (CLA on file from PR #825, "Fix batch_first paths in DPMultiheadAttention").
  • All tests passed, and additional code has been covered with new tests.

Step-by-step tutorial covering GPT-2 + LoRA + opacus on E2E NLG.
Compares non-DP LoRA baseline against DP LoRA at target epsilon=8 and
reports BLEU, perplexity, peak GPU memory, throughput, and final
epsilon.

Includes a dedicated section on the three device-placement and
training-mode ordering patterns that prevent silent corruption (the
bug surfaced and root-caused in meta-pytorch#820).

Notebook is fully executed; cell outputs are embedded.

Closes meta-pytorch#827
@meta-cla meta-cla Bot added the CLA Signed label Jun 25, 2026
@meta-codesync

meta-codesync Bot commented Jun 25, 2026


This pull request has been imported. If you are a Meta employee, you can view this in D109661283. (Because this pull request was imported automatically, there will not be any future comments.)



Development

Successfully merging this pull request may close these issues.

Add tutorial: DP fine-tuning of causal language models with LoRA
