[Rebase] 0611: Rebase upstream TileLang #60

Merged
Rachmanino merged 32 commits into v0-rebase from rebase/v0-dev-tilelang-550e25d on Jun 12, 2026

@Rachmanino

Collaborator

No description provided.

Rachmanino added 27 commits June 5, 2026 11:54
Compile-once API and tests:

- Add compile_once support to tilelang.compile and tilelang.jit with optional compile_group and compile_root controls.

- Coordinate root-first compilation through torch.distributed and propagate root compile failures to all ranks instead of hanging peers.

- Add distributed compile_once coverage for ordering, nonzero roots, real compile/jit paths, and root failure propagation.

Distributed examples and tests:

- Enable compile_once in distributed primitive tests and intranode distributed examples so spawned ranks reuse the normal cache path after root compilation.

- Pass explicit compile_group for direct tilelang.compile calls inside distributed workers; use decorator-level compile_once for JIT kernels.

- Keep import-time warmup compiles and non-distributed single-GPU tests unchanged; remove cache disabling from examples where it would prevent reuse.
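The root-first pattern described in this commit message can be sketched roughly as follows. This is only an illustration of the control flow, not TileLang's implementation: threads stand in for distributed ranks, a `threading.Barrier` stands in for the `torch.distributed` synchronization, and the names `run_ranks`, `RootCompileError`, `cache`, and `status` are all hypothetical.

```python
import threading


class RootCompileError(RuntimeError):
    """Raised on every rank when the root rank's compile fails (hypothetical name)."""


def run_ranks(world_size, root, compile_fn):
    """Simulate compile-once across `world_size` ranks using threads."""
    cache = {}    # stands in for the shared kernel cache
    status = {}   # stands in for the root's broadcast status
    barrier = threading.Barrier(world_size)
    results = [None] * world_size

    def worker(rank):
        try:
            if rank == root:
                try:
                    # Only the root performs the real compile.
                    cache["kernel"] = compile_fn()
                except Exception as exc:
                    # Record the failure instead of raising before the barrier,
                    # so peers are never left waiting on a dead root.
                    status["error"] = repr(exc)
            barrier.wait()  # every rank arrives here, even if the root failed
            if "error" in status:
                raise RootCompileError(status["error"])
            # Peers take the normal cache path after the root has populated it.
            results[rank] = ("ok", cache["kernel"])
        except RootCompileError as exc:
            results[rank] = ("error", str(exc))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The key design point in this sketch is that the root catches its own failure and still reaches the synchronization point, so the error can be re-raised on every rank instead of hanging the peers.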

Implement src_pe/dst_pe support for T.copy/T.tma_copy, including 1D TMA address remapping, SIMT cp_block fast path, predicated edge-tile fallback, and explicit rejection for unsupported descriptor TMA remote copies.

Factor symmetric remote address remapping into src/op/distributed_utils.h and reuse it from copy, remote copy, and sync lowering.

Add distributed tests for remote copy, remote TMA load/store, SIMT fallback, descriptor rejection, and compile-once usage.
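As a rough illustration of what symmetric remote address remapping usually means, the sketch below assumes a symmetric-heap layout in which every PE maps a region of the same size at its own base address, so a buffer sits at the same offset on each PE. The helper in src/op/distributed_utils.h is not shown on this page; the function name, parameters, and layout here are assumptions made for the example.

```python
def remap_to_peer(local_addr, my_pe, dst_pe, heap_bases, heap_size):
    """Translate an address in this PE's symmetric heap to the same offset on dst_pe.

    heap_bases[i] is the (hypothetical) base address of PE i's symmetric heap.
    """
    offset = local_addr - heap_bases[my_pe]
    if not 0 <= offset < heap_size:
        # Addresses outside the symmetric heap have no remote counterpart.
        raise ValueError("address is not inside the local symmetric heap")
    if not 0 <= dst_pe < len(heap_bases):
        raise ValueError("destination PE out of range")
    return heap_bases[dst_pe] + offset
```

For example, with heap bases `[0x1000, 0x9000]` and a heap size of `0x800`, the local address `0x1010` on PE 0 maps to `0x9010` on PE 1.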

Lower bf16/fp16 multimem ops directly to packed x2 instructions instead of relying on ParallelOp vector inference. Add standalone multimem allreduce strategies and an SM-specialized GEMM allreduce example with tests. Align GEMM-AR reset benchmarking with TK by timing device epilogue launches without host barriers.
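To make the "packed x2" idea concrete, here is a small host-side sketch of how two 16-bit floats share one 32-bit word and are added lane by lane. It covers fp16 only, because Python's `struct` module has no bf16 format, and it models the data layout rather than the lowering in this PR. The helper names are hypothetical.

```python
import struct


def pack_half2(lo, hi):
    """Pack two fp16 values into one 32-bit word, with `lo` in bits 0-15."""
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]


def unpack_half2(word):
    """Split a 32-bit word back into its two fp16 lanes as Python floats."""
    return struct.unpack("<2e", struct.pack("<I", word))


def add_half2(a, b):
    """Lane-wise add of two packed fp16 pairs, as an x2 instruction would do."""
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    return pack_half2(a_lo + b_lo, a_hi + b_hi)
```

Operating on the packed word means each 32-bit memory transaction carries two elements, which is the motivation for emitting x2 forms directly.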
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ff09e894-b8b6-43a2-a404-75a305f879ea

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

Rachmanino force-pushed the rebase/v0-dev-tilelang-550e25d branch 2 times, most recently from a345e41 to 18a30df on June 12, 2026 06:43
Rachmanino merged commit 81ad235 into v0-rebase on Jun 12, 2026
1 of 2 checks passed
Rachmanino deleted the rebase/v0-dev-tilelang-550e25d branch on June 12, 2026 08:34