Commit 21a7d08: Apply suggestions from code review

Authored by windreamer and Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent: c31225a

File tree: 2 files changed (+6, -3 lines)


lmdeploy/pytorch/backends/attention.py

Lines changed: 1 addition, 1 deletion

@@ -20,7 +20,7 @@ class AttentionMetadata:
     fill_seqlens: torch.Tensor = None
     cu_seqlens_q: torch.Tensor = None
     cu_seqlens_k: torch.Tensor = None
-    quant_policy: QuantPolicy = 0
+    quant_policy: QuantPolicy = QuantPolicy.NONE


 T = TypeVar('T', bound=AttentionMetadata)
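The change above swaps a bare integer default for an enum member. A minimal sketch of why this is equivalent but clearer, assuming `QuantPolicy` is (or behaves like) an `IntEnum` with a `NONE = 0` member; the member names other than `NONE` are illustrative, not lmdeploy's actual definitions:

```python
from enum import IntEnum

class QuantPolicy(IntEnum):
    """Hypothetical sketch; the real lmdeploy enum may differ."""
    NONE = 0      # no KV-cache quantization
    KV_INT8 = 4   # assumed member, for illustration only
    KV_INT4 = 8   # assumed member, for illustration only

# An IntEnum member compares equal to the old literal default, so
# existing `quant_policy == 0` checks keep working, while the default
# now documents its own meaning at the definition site.
assert QuantPolicy.NONE == 0
assert isinstance(QuantPolicy.NONE, int)
```

Because `IntEnum` members are `int` subclasses, the replacement is behavior-preserving for arithmetic and comparisons; only readability changes.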

lmdeploy/pytorch/kernels/cuda/pagedattention.py

Lines changed: 5 additions, 2 deletions

@@ -691,8 +691,11 @@ def _get_block_d(Lk):
         return BLOCK_DMODEL, BLOCK_DMODEL1, BLOCK_DV

     turbo_quant = False
-    turbo_k_codebook = None
-    turbo_v_codebook = None
+    # Triton still receives these arguments for quantized paths, so keep
+    # valid tensor-backed pointers even when turbo quant is not enabled.
+    # They will be overwritten with real codebooks when quant_policy == 42.
+    turbo_k_codebook = q.new_empty((1, ))
+    turbo_v_codebook = q.new_empty((1, ))

     # shape constraints
     Lq, Lk, Lv = q.shape[-1], k_cache.shape[d_dim], v_cache.shape[d_dim]
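The hunk above replaces `None` placeholders with 1-element tensors (`q.new_empty((1, ))`), because a kernel launcher that unconditionally reads a pointer from every argument will fail on `None` before the quantization branch is ever taken. A minimal plain-Python sketch of that failure mode, with no Triton or torch dependency; `FakeTensor`, `launch_kernel`, and `data_ptr` here are illustrative stand-ins, not lmdeploy's API:

```python
class FakeTensor:
    """Stand-in for a tensor that a kernel launcher can take a pointer from."""
    def __init__(self, numel):
        self.numel = numel

    def data_ptr(self):
        # Any non-null address-like value; real tensors return a device pointer.
        return id(self)

def launch_kernel(k_codebook, v_codebook, quant_enabled):
    # Like a Triton launch, the wrapper extracts a pointer from *every*
    # argument up front, even ones the non-quantized path never reads.
    ptrs = (k_codebook.data_ptr(), v_codebook.data_ptr())
    if not quant_enabled:
        return "quant disabled, placeholders ignored"
    return ptrs

# A tiny but real buffer (analogous to q.new_empty((1, ))) launches fine:
placeholder = FakeTensor(1)
result = launch_kernel(placeholder, placeholder, quant_enabled=False)

# Passing None instead would raise AttributeError at pointer extraction,
# before quant_enabled is even consulted.
```

The design choice is to pay for two throwaway 1-element allocations on every call so the launch signature stays uniform across quantized and non-quantized paths.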
