fix(generation): beam sample when num_beams * vocab_size exceeds multinomial limit by balgaly · Pull Request #45251 · huggingface/transformers

balgaly · 2026-04-05T15:38:58Z

Problem

torch.multinomial rejects inputs whose last dimension exceeds 2**24. Beam search with do_sample=True builds a flat distribution of size num_beams * vocab_size, which can exceed that limit (e.g. a large num_beams with a ~164k-token vocabulary) and crash during generation (#45245).
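To make the scale concrete, here is a small arithmetic sketch in plain Python. The constant name and both helper functions are hypothetical, and 164_000 is only a round stand-in for the "~164k" vocabulary mentioned above; the trigger condition mirrors the one this PR states for its fallback (flat dimension at or above 2**24).

```python
# Limit quoted in the error message of #45245 ("cannot exceed 2^24").
MULTINOMIAL_LIMIT = 2**24  # 16_777_216 (hypothetical constant name)


def needs_fallback(num_beams: int, vocab_size: int) -> bool:
    """Mirror the PR's stated trigger: flat dimension at or above 2**24."""
    return num_beams * vocab_size >= MULTINOMIAL_LIMIT


def min_beams_hitting_limit(vocab_size: int) -> int:
    """Smallest num_beams whose flat dimension reaches the limit."""
    return -(-MULTINOMIAL_LIMIT // vocab_size)  # ceiling division


vocab_size = 164_000  # illustrative round number standing in for "~164k"
print(min_beams_hitting_limit(vocab_size))  # 103
print(needs_fallback(102, vocab_size))      # False
print(needs_fallback(103, vocab_size))      # True
```

Under these assumptions, the problem only appears at roughly a hundred beams or more for a vocabulary of that size.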

Solution

When the flat dimension is at or above 2**24, select the beams_to_keep continuations via Gumbel-top-k on accumulated_log_probs. This is distributionally equivalent to multinomial(softmax(accumulated_log_probs), beams_to_keep, replacement=False), but it never calls multinomial on the oversized tensor.
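The Gumbel-top-k idea can be sketched without PyTorch: add independent Gumbel(0, 1) noise to each log-probability and keep the k largest perturbed scores. The version below is a dependency-free illustration, not the code from this PR; `gumbel_top_k` and `unflatten` are hypothetical names, and the mapping from flat index to (beam, token) assumes the usual beam-major flattening.

```python
import math
import random


def gumbel_top_k(log_probs, k, rng=None):
    """Draw k distinct indices, without replacement, with probability
    proportional to exp(log_probs); the scores need not be normalized."""
    if rng is None:
        rng = random.Random()
    if not 0 <= k <= len(log_probs):
        raise ValueError("k must be between 0 and len(log_probs)")

    def gumbel_noise():
        u = rng.random()      # uniform on [0, 1)
        while u == 0.0:       # redraw to avoid log(0)
            u = rng.random()
        return -math.log(-math.log(u))

    # Perturb every score, then keep the indices of the k largest.
    perturbed = [lp + gumbel_noise() for lp in log_probs]
    order = sorted(range(len(perturbed)), key=perturbed.__getitem__, reverse=True)
    return order[:k]


def unflatten(flat_index, vocab_size):
    """Map a flat index back to (beam index, token id)."""
    return divmod(flat_index, vocab_size)


# Two beams over a three-token vocabulary, flattened to length 6.
scores = [math.log(p) for p in (0.30, 0.05, 0.15, 0.25, 0.05, 0.20)]
picked = gumbel_top_k(scores, k=2, rng=random.Random(0))
print([unflatten(i, vocab_size=3) for i in picked])
```

Because the selection is a top-k over perturbed scores rather than a categorical draw, the size of the flat dimension is bounded only by memory, not by the multinomial category limit.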

Tests

tests/generation/test_beam_search_multinomial_limit.py patches the limit to exercise the fallback on small tensors.
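For readers unfamiliar with this testing pattern, the sketch below shows one way a limit can be patched so that a fallback branch runs on tiny inputs. It is an assumption-laden illustration in plain Python: `BeamSampler`, `MAX_CATEGORIES`, and `last_path` are hypothetical stand-ins, and the actual test file in this PR may be structured differently.

```python
import math
import random
from unittest import mock


class BeamSampler:
    """Hypothetical stand-in for the sampling step; not the transformers code."""

    MAX_CATEGORIES = 2**24  # hypothetical name for the multinomial limit

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.last_path = None  # records which branch ran, for the test

    def sample(self, log_probs, k):
        if len(log_probs) >= self.MAX_CATEGORIES:
            self.last_path = "gumbel_top_k"
            return self._gumbel_top_k(log_probs, k)
        self.last_path = "multinomial"
        return self._sequential(log_probs, k)

    def _sequential(self, log_probs, k):
        # Weighted sampling without replacement, one draw at a time.
        weights = [math.exp(lp) for lp in log_probs]
        chosen = []
        for _ in range(k):
            (idx,) = self.rng.choices(range(len(weights)), weights=weights)
            chosen.append(idx)
            weights[idx] = 0.0  # exclude this index from later draws
        return chosen

    def _gumbel_top_k(self, log_probs, k):
        def gumbel_noise():
            u = self.rng.random()
            while u == 0.0:  # redraw to avoid log(0)
                u = self.rng.random()
            return -math.log(-math.log(u))

        keys = [lp + gumbel_noise() for lp in log_probs]
        return sorted(range(len(keys)), key=keys.__getitem__, reverse=True)[:k]


sampler = BeamSampler()
log_probs = [math.log(p) for p in (0.5, 0.3, 0.2)]

sampler.sample(log_probs, 2)
print(sampler.last_path)  # multinomial

# Lower the limit so a 3-element input takes the fallback branch.
with mock.patch.object(BeamSampler, "MAX_CATEGORIES", 3):
    picked = sampler.sample(log_probs, 2)
    print(sampler.last_path)  # gumbel_top_k
```

Patching the constant keeps the test fast and small; allocating a tensor with more than 2**24 entries just to reach the branch would be wasteful in CI.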

Fixes #45245

…inomial limit

PyTorch multinomial requires the last dimension to be at most 2**24. Beam search with do_sample flattens num_beams * vocab_size into one dimension; large beams + large vocabs (e.g. huggingface#45245) crash on CUDA.

Use Gumbel-top-k when flat_dim >= 2**24, equivalent to sampling without replacement from softmax(accumulated_log_probs). Adds a unit test with a patched limit for small tensors.

Made-with: Cursor

github-actions · 2026-04-05T15:53:23Z

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45251&sha=490973

balgaly · 2026-04-07T21:44:07Z

The tests_hub failure appears unrelated to this PR. tests_generate, which covers the changed code in utils.py and the new test in tests/generation/test_beam_search_multinomial_limit.py, passed. Could a maintainer re-run tests_hub when convenient? Happy to address any concerns about the implementation.

Rocketknight1 · 2026-04-08T14:50:40Z

No agent PRs on random issues please! (Swapping a multinomial for a gumbel-topk trick that adds loads of code bloat for a very rare edge case is the most code agent solution imaginable)

ci: retrigger CI to check for transient test failures

a6b7f2c

Rocketknight1 closed this Apr 8, 2026

Rocketknight1 added the Code agent slop label Apr 8, 2026

This was referenced Apr 11, 2026

RuntimeError: number of categories cannot exceed 2^24 #45245

Open

fix(generation): handle CUDA multinomial limit in beam search sampling #45369

Closed

balgaly wants to merge 2 commits into huggingface:main from balgaly:fix/beam-sample-multinomial-flat-dim-limit
