
Fix mtp #4517

Open
RunningLeon wants to merge 3 commits into InternLM:main from RunningLeon:fix-mtp

Conversation

@RunningLeon
Collaborator

Motivation

Modification

Fix mtp:

  • Remove a duplicate `+ 1` increment of mrope_ids in lmdeploy/pytorch/spec_decode/spec_agent.py.
  • Use target_hidden_states for the second speculative-decoding step in lmdeploy/pytorch/spec_decode/proposers/deepseek_mtp.py (it was previously not used).
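The duplicate-increment bug can be illustrated with a minimal, hypothetical sketch (these names and shapes are illustrative only, not the actual spec_agent.py code):

```python
import numpy as np

def step_mrope_ids(mrope_pos_ids: np.ndarray, buggy: bool = False) -> np.ndarray:
    """Advance 3-axis mRoPE position ids by one decoded token.

    Position ids must advance exactly once per token; a second
    (duplicated) `+ 1` shifts every subsequent position by one.
    """
    new_ids = mrope_pos_ids + 1      # the single, correct increment
    if buggy:
        new_ids = new_ids + 1        # duplicated `+ 1`: positions drift by one
    return new_ids

pos = np.zeros((3, 4), dtype=np.int64)   # (temporal/height/width axes, seq_len)
assert (step_mrope_ids(pos) == 1).all()
assert (step_mrope_ids(pos, buggy=True) == 2).all()
```

Because the drift compounds on every decoding step, the buggy variant produces rotary embeddings that diverge further from the correct positions the longer the sequence runs.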

Improvement:

  • Update the logits-sampling-related code.
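For context, the logits-sampling path implements standard speculative rejection sampling. The sketch below shows the generic algorithm (not lmdeploy's exact implementation): accept draft token x with probability min(1, p_target(x) / p_draft(x)), and on the first rejection resample from the normalized residual distribution.

```python
import numpy as np

def rejection_sample(draft_tokens, p_draft, p_target, rng):
    """Generic speculative rejection sampling over per-position
    probability rows p_draft[i] / p_target[i]."""
    accepted = []
    for i, tok in enumerate(draft_tokens):
        accept_prob = min(1.0, p_target[i, tok] / p_draft[i, tok])
        if rng.random() < accept_prob:
            accepted.append(tok)
            continue
        # First rejection: resample from the residual and stop.
        residual = np.clip(p_target[i] - p_draft[i], 0.0, None)
        residual /= residual.sum()
        accepted.append(int(rng.choice(len(residual), p=residual)))
        break
    return accepted

rng = np.random.default_rng(0)
p = np.array([[0.5, 0.5], [0.9, 0.1]])
# When p_target == p_draft, every draft token is accepted.
assert rejection_sample([0, 1], p, p, rng) == [0, 1]
```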

OpenCompass results

| dataset | version | metric | mode | eval_qwen35-mtp3 |
| --- | --- | --- | --- | --- |
| GPQA_diamond_repeat_4 | 772ea0 | accuracy (4 runs average) | gen | 82.70 |
| Math Calculation | - | - | - | - |
| aime2025_repeat_32 | 5e9f4f | accuracy (32 runs average) | gen | 89.06 |

BC-breaking (Optional)

Does the modification introduce changes that break the backward-compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  3. If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

Copilot AI review requested due to automatic review settings April 10, 2026 06:50
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes correctness issues in DeepSeek MTP speculative decoding and refactors speculative sampling/logits processing to reduce duplicated logic and align shapes and control flow between prefill and decoding.

Changes:

  • Simplifies AR-spec extra-input slicing so target_logits is passed through in the model’s native flattened form.
  • Refactors speculative rejection sampling to run FusedLogitsProcessor inline (and removes the dedicated async_sampling_logits helper), plus fixes a duplicate mrope_pos_ids += 1 increment.
  • Fixes DeepSeek MTP proposer to propagate target_hidden_states for subsequent speculative decoding steps (instead of reusing draft hidden states).
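The hidden-state fix can be caricatured with a toy control-flow sketch. All names here are hypothetical and the "hidden states" are plain lists; the real proposer slices `target_hidden_states` by last-token indices rather than reusing them verbatim:

```python
def draft_step(hidden):
    # Toy draft model: the "token" is the sum of the state, and the
    # draft's own next hidden state is a shifted copy (illustrative only).
    return sum(hidden), [h + 1 for h in hidden]

def run_spec_steps(target_hidden_states, draft_step, num_steps, fixed=True):
    tokens = []
    hidden = target_hidden_states
    for _ in range(num_steps):
        token, draft_hidden = draft_step(hidden)
        tokens.append(token)
        # Fixed behaviour: continuation steps keep conditioning on the
        # target model's hidden states; the bug fed the draft model's
        # own previous hidden states back in instead.
        hidden = target_hidden_states if fixed else draft_hidden
    return tokens

# Step 1 sees [1, 2, 3] either way; step 2 differs between the two paths.
assert run_spec_steps([1, 2, 3], draft_step, num_steps=2, fixed=True) == [6, 6]
assert run_spec_steps([1, 2, 3], draft_step, num_steps=2, fixed=False) == [6, 9]
```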

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
lmdeploy/pytorch/strategies/ar_spec/model_agent.py Adjusts target_logits slicing to pass through flattened logits directly.
lmdeploy/pytorch/spec_decode/spec_agent.py Refactors logits processing within _rejection_sampling, removes duplicate mRoPE increment, and adds profiling scope.
lmdeploy/pytorch/spec_decode/proposers/deepseek_mtp.py Uses model_inputs.target_hidden_states (sliced by last-token indices) for MTP continuation steps.


@lvhan028 lvhan028 requested a review from grimoire April 10, 2026 09:06
