Fix PP TMA stage reuse race #660

Open
qescccczmr wants to merge 1 commit into deepseek-ai:main from qescccczmr:fix-pp-tma-store-wait
@qescccczmr

Fix a race in the PP send/recv TMA copy path that occurs when a shared-memory pipeline stage is reused.
The wait before reuse is changed to ptx::tma_store_wait() (that is, wait_group 0), which ensures that all
outstanding TMA stores have completed before the same stage buffer is reused.
Fixes #657.
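
To illustrate why a wait_group 0 is sufficient before stage reuse, here is a minimal host-side C++ model of bulk async-group accounting. It is a sketch under stated assumptions, not the code changed by this PR: `BulkGroupModel`, `commit_store_from`, `stage_in_flight`, and `reuse_is_safe` are hypothetical names, no real TMA instructions are issued, and the comparison value `1` is only for contrast (the description above does not state the previous wait value).

```cpp
#include <cstddef>
#include <deque>

// Toy host-side model of bulk async-group accounting. Hypothetical names;
// this does not touch a GPU and is not DeepEP code.
struct BulkGroupModel {
    // Stage index read by each committed store group, oldest first.
    std::deque<int> pending;

    // Models issuing a TMA store that reads from `stage` and committing it
    // as one bulk group.
    void commit_store_from(int stage) { pending.push_back(stage); }

    // Models wait_group N: returns once at most `n` of the most recent
    // groups are still pending. The model takes the worst case for a race,
    // where exactly `n` groups remain in flight after the wait.
    void wait_group(std::size_t n) {
        while (pending.size() > n) pending.pop_front();
    }

    // True if an in-flight store may still be reading from `stage`.
    bool stage_in_flight(int stage) const {
        for (int s : pending) {
            if (s == stage) return true;
        }
        return false;
    }
};

// Returns true if stage 0 can be reused after a store from it has been
// committed and the thread has waited with wait_group `n`.
bool reuse_is_safe(std::size_t n) {
    BulkGroupModel model;
    model.commit_store_from(0);  // store reads from stage 0
    model.wait_group(n);         // wait before reusing the stage
    return !model.stage_in_flight(0);
}
```

In this model `reuse_is_safe(0)` is true, because no group may remain pending after the wait. `reuse_is_safe(1)` is false, because the most recent store from the stage is still allowed to be in flight when the buffer is reused.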

Development

Successfully merging this pull request may close these issues.

[DeepEP V2] tma_copy data race
