new feature: On policy distillation by sfc-gh-thonguyen · Pull Request #346 · snowflakedb/ArcticTraining

sfc-gh-thonguyen · 2026-01-28T18:23:02Z

(This is a duplicate of #344. That PR was opened from a fork branch, which caused the GPU modal test to fail.)

Based on this blog post: https://thinkingmachines.ai/blog/on-policy-distillation/. ArcticTraining seemed like an appropriate place for this feature.

Training was validated on the GSM8K dataset with a Qwen3-1.7B student model and a Qwen3-8B teacher.

Lower teacher perplexity means the teacher is less surprised by the student's answers. Higher teacher logprob means the teacher assigns higher probability to the student's answers, i.e. it agrees with them.

Lower reverse KL and a smaller logprob gap mean the student's answers are converging to the teacher's.
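To make these four metrics concrete, here is a minimal sketch of how they could be computed from per-position logits. It is not taken from the PR's code: the function name `distillation_metrics`, its arguments, and the exact definitions (reverse KL as the full-vocabulary KL(student || teacher) averaged over positions, logprob gap as the mean student-minus-teacher logprob on the sampled tokens) are assumptions made for illustration, and the PR may define them differently.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def distillation_metrics(student_logits, teacher_logits, sampled_ids):
    """Hypothetical helper: summarize student/teacher agreement.

    student_logits, teacher_logits: one list of vocabulary logits per position.
    sampled_ids: the token the student actually sampled at each position.
    """
    n = len(sampled_ids)
    reverse_kl = 0.0
    student_lp_sum = 0.0
    teacher_lp_sum = 0.0
    for s_row, t_row, tok in zip(student_logits, teacher_logits, sampled_ids):
        s_lp = log_softmax(s_row)
        t_lp = log_softmax(t_row)
        # Assumed definition: KL(student || teacher) over the full vocabulary.
        reverse_kl += sum(math.exp(a) * (a - b) for a, b in zip(s_lp, t_lp))
        # Log-probabilities of the token the student sampled.
        student_lp_sum += s_lp[tok]
        teacher_lp_sum += t_lp[tok]
    teacher_logprob = teacher_lp_sum / n
    return {
        "reverse_kl": reverse_kl / n,
        "teacher_logprob": teacher_logprob,
        # Perplexity is the exponential of the negative mean logprob.
        "teacher_perplexity": math.exp(-teacher_logprob),
        # Assumed definition: mean student logprob minus mean teacher logprob.
        "logprob_gap": (student_lp_sum - teacher_lp_sum) / n,
    }
```

Under these definitions, a student whose logits match the teacher's gives a reverse KL and logprob gap of zero, and both grow as the student puts mass on tokens the teacher finds unlikely.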

Full dashboard: https://snowflake.wandb.io/thongnguyen/on-policy-distillation-gsm8k/runs/zuwzrd11?nw=nwuserthongnguyen

Once this PR is merged, we can claim that ArcticTraining supports RL :)

sfc-gh-thonguyen · 2026-01-28T18:39:33Z

Closing this PR because the modal test on the original PR, #344, is now unblocked.

sfc-gh-thonguyen added 5 commits January 28, 2026 03:39

Add on policy distillation

796db60

fix flake8

949a91b

remove redundant changes

d7e5e61

minor fix

2e5b0e7

minor fix

d590efc

sfc-gh-thonguyen requested a review from sfc-gh-jrasley as a code owner January 28, 2026 18:23

sfc-gh-thonguyen requested review from sfc-gh-mwyatt and removed request for sfc-gh-jrasley January 28, 2026 18:23

sfc-gh-thonguyen closed this Jan 28, 2026

optimize generator

aa563ca

sfc-gh-thonguyen reopened this Jan 28, 2026

sfc-gh-thonguyen closed this Jan 28, 2026

sfc-gh-thonguyen wants to merge 6 commits into main from thong/on_policy_distillation
