[Cpp API Compatibility] Fix flashmla compile #78550
SigureMo merged 5 commits into PaddlePaddle:develop
Conversation
Your PR was submitted successfully. Thank you for your contribution to the open-source project!
Pull request overview
This PR updates Paddle’s C++ API compatibility layer to improve PyTorch-compat behavior and fix compilation issues encountered by FlashMLA (split out from #78484).
Changes:
- Removes torch/cuda.h re-exports of device_count / is_available into at::cuda to avoid symbol/redefinition conflicts.
- Refactors c10::ScalarType enum/utilities and trims related unit tests.
- Aligns some compat behaviors/messages (e.g., c10::Stream printing, added bounds/defined checks in a few ATen ops).
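The re-export conflict and the unified entry point can be sketched as follows. This is a hypothetical, simplified model (stub namespaces with stand-in return values, not Paddle's real headers): the single definition stays in at::cuda, as ATen/cuda/CUDAContext.h provides it, and the compat layer forwards from torch::cuda rather than injecting a second definition back into at::cuda.

```cpp
// Sketch only: stand-in stubs, not the real ATen/Paddle implementations.
namespace at::cuda {
// Single native definition (the role played by ATen/cuda/CUDAContext.h).
inline bool is_available() { return false; }  // stand-in return value
inline int device_count() { return 0; }       // stand-in return value
}  // namespace at::cuda

namespace torch::cuda {
// Compat entry point: forward to ATen instead of re-exporting these
// symbols into at::cuda, which is what caused the redefinition error.
inline bool is_available() { return at::cuda::is_available(); }
inline int device_count() { return at::cuda::device_count(); }
}  // namespace torch::cuda
```

With this shape, code compiled against either namespace sees exactly one definition per symbol.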
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| test/cpp/compat/c10_ScalarType_test.cc | Removes ScalarType utility branch tests (and an unused include). |
| paddle/phi/api/include/compat/torch/csrc/api/include/torch/cuda.h | Stops re-exporting device_count / is_available into at::cuda. |
| paddle/phi/api/include/compat/c10/core/Stream.h | Changes c10::Stream operator<< formatting to a PyTorch-like form. |
| paddle/phi/api/include/compat/c10/core/Stream.cpp | Changes unsupported native_handle() failure path/message construction. |
| paddle/phi/api/include/compat/c10/core/ScalarType.h | Refactors ScalarType enum generation and removes several explicit utility branches. |
| paddle/phi/api/include/compat/ATen/ops/std.h | Adds dimension-range validation for std implementation. |
| paddle/phi/api/include/compat/ATen/ops/select.h | Adds explicit dim/index range checks and negative index normalization. |
| paddle/phi/api/include/compat/ATen/ops/equal.h | Adds defined-tensor checks before comparing tensors. |
```cpp
inline const char* toString(ScalarType t) {
#define DEFINE_CASE(_1, _2, name) \
  case ScalarType::name:          \
    return #name;

  switch (t) {
    AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_AND_QINTS(DEFINE_CASE)
    case ScalarType::QInt8:
      return "QInt8";
    case ScalarType::QUInt8:
      return "QUInt8";
    case ScalarType::QInt32:
      return "QInt32";
    case ScalarType::QUInt4x2:
      return "QUInt4x2";
    case ScalarType::QUInt2x4:
      return "QUInt2x4";
    case ScalarType::ComplexHalf:
      return "ComplexHalf";
    case ScalarType::Bits1x8:
      return "Bits1x8";
    case ScalarType::Bits2x4:
      return "Bits2x4";
    case ScalarType::Bits4x2:
      return "Bits4x2";
    case ScalarType::Bits8:
      return "Bits8";
    case ScalarType::Bits16:
      return "Bits16";
    case ScalarType::Float8_e5m2fnuz:
      return "Float8_e5m2fnuz";
    case ScalarType::Float8_e4m3fnuz:
      return "Float8_e4m3fnuz";
    case ScalarType::Float8_e8m0fnu:
      return "Float8_e8m0fnu";
    case ScalarType::Float4_e2m1fn_x2:
      return "Float4_e2m1fn_x2";
    case ScalarType::Undefined:
      return "Undefined";
    default:
      return "UNKNOWN_SCALAR";
  }
#undef DEFINE_CASE
}

inline size_t elementSize(ScalarType t) {
#define CASE_ELEMENTSIZE_CASE(ctype, _2, name) \
  case ScalarType::name:                       \
    return sizeof(ctype);

  switch (t) {
    AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_AND_QINTS(CASE_ELEMENTSIZE_CASE)
    case ScalarType::QInt8:
    case ScalarType::QUInt8:
    case ScalarType::QUInt4x2:
    case ScalarType::QUInt2x4:
    case ScalarType::Bits1x8:
    case ScalarType::Bits2x4:
    case ScalarType::Bits4x2:
    case ScalarType::Bits8:
      return 1;
    case ScalarType::QInt32:
    case ScalarType::Bits16:
      return 4;
    default:
      TORCH_CHECK(false, "Unknown ScalarType");
  }
```
ScalarType still declares the quantized enums via AT_FORALL_QINT_TYPES(...), but toString() and elementSize() no longer handle QInt8/QUInt8/QInt32/QUInt4x2/QUInt2x4 (the switches only expand AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_AND_QINTS). This is a behavioral regression: printing these scalar types will return "UNKNOWN_SCALAR", and elementSize() will throw. Add QInt cases back (e.g., by also expanding AT_FORALL_QINT_TYPES in these switches) and ensure their element sizes match the intended semantics.
```cpp
PD_CHECK(false,
         "native_handle() is not supported for this device type (",
```
PD_CHECK(false, ...) will throw a PD_Exception that appends an additional "Expected false, but it's not satisfied." context. If the goal is to match PyTorch's error message format exactly (as the comment states), this will not produce an exact match. Prefer throwing directly (e.g., TORCH_CHECK(false, ...) / PD_THROW(...)) or pass the real condition into PD_CHECK(...) instead of false so the extra context remains meaningful.
```diff
- PD_CHECK(false,
-          "native_handle() is not supported for this device type (",
+ PD_THROW("native_handle() is not supported for this device type (",
```
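The difference the reviewer describes can be demonstrated with a minimal sketch. MY_CHECK and MY_THROW below are hypothetical stand-ins, not Paddle's real PD_CHECK/PD_THROW: the CHECK-style macro appends its own "Expected ..., but it's not satisfied." context to the message, so it can never reproduce an upstream error string verbatim, while the THROW-style macro emits exactly what it is given.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical CHECK-style macro: appends its own context on failure.
#define MY_CHECK(cond, msg)                                          \
  do {                                                               \
    if (!(cond)) {                                                   \
      std::ostringstream oss_;                                       \
      oss_ << msg << " Expected " #cond ", but it's not satisfied."; \
      throw std::runtime_error(oss_.str());                          \
    }                                                                \
  } while (0)

// Hypothetical THROW-style macro: emits the message verbatim.
#define MY_THROW(msg)                     \
  do {                                    \
    std::ostringstream oss_;              \
    oss_ << msg;                          \
    throw std::runtime_error(oss_.str()); \
  } while (0)

inline std::string check_message() {
  try {
    MY_CHECK(false, "native_handle() is not supported.");
  } catch (const std::runtime_error& e) {
    return e.what();  // message plus the appended "Expected ..." context
  }
  return "";
}

inline std::string throw_message() {
  try {
    MY_THROW("native_handle() is not supported.");
  } catch (const std::runtime_error& e) {
    return e.what();  // message exactly as written
  }
  return "";
}
```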
```cpp
if (dim < 0 || dim >= self.dim()) {
  PD_CHECK(false,
           "select(): index ",
           orig_dim,
           " out of range for tensor of size ",
           self.sizes(),
           " at dimension ",
           orig_dim);
}
// Handle negative index
int64_t orig_index = index;
if (index < 0) {
  int64_t dim_size = self.size(dim);
  index = dim_size + index;
}
// Check index is valid
if (index < 0 || index >= self.size(dim)) {
  PD_CHECK(false,
           "select(): index ",
           orig_index,
           " out of range for tensor of size ",
           self.sizes(),
           " at dimension ",
           orig_dim < 0 ? orig_dim + self.dim() : orig_dim);
}
```
New error handling was added for invalid dim / index values, but the existing select tests (e.g., test/cpp/compat/ATen_select_test.cc) don't cover these out-of-range branches. Add test cases that assert an exception is thrown for (1) dim out of range (including negative beyond -self.dim()), and (2) index out of range (including negative beyond -size(dim)).
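The validation logic under review can be isolated into a standalone sketch, which also shows the cases such tests would need to cover. This uses a plain std::vector of extents instead of at::Tensor, and the error text is illustrative, not Paddle's exact message.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Sketch of select()-style dim/index validation: negative dim/index are
// normalized by adding the corresponding extent; out-of-range values throw.
inline int64_t normalize_select_index(const std::vector<int64_t>& sizes,
                                      int64_t dim,
                                      int64_t index) {
  const int64_t ndim = static_cast<int64_t>(sizes.size());
  const int64_t orig_dim = dim;
  if (dim < 0) dim += ndim;  // negative dim counts from the end
  if (dim < 0 || dim >= ndim) {
    throw std::out_of_range("select(): dim " + std::to_string(orig_dim) +
                            " out of range for " + std::to_string(ndim) +
                            "-d tensor");
  }
  const int64_t dim_size = sizes[dim];
  const int64_t orig_index = index;
  if (index < 0) index += dim_size;  // negative index counts from the end
  if (index < 0 || index >= dim_size) {
    throw std::out_of_range("select(): index " + std::to_string(orig_index) +
                            " out of range at dimension " +
                            std::to_string(dim));
  }
  return index;  // normalized, guaranteed in [0, dim_size)
}
```

A test along the lines the reviewer asks for would assert normal and boundary behavior (e.g., index -1 maps to dim_size - 1) and that dim beyond [-ndim, ndim) or index beyond [-dim_size, dim_size) throws.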
Codecov Report
✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##           develop   #78550   +/-   ##
===========================================
  Coverage         ?   100.00%
===========================================
  Files            ?         2
  Lines            ?         6
  Branches         ?         0
===========================================
  Hits             ?         6
  Misses           ?         0
  Partials         ?         0
```

☔ View full report in Codecov by Sentry.
PR Category
Execute Infrastructure
PR Types
Bug fixes
Description
Split from #78484.
Fixes the redefinition error encountered when compiling FlashMLA. To use at::cuda::is_available(), callers must uniformly go through torch::cuda::is_available(); the definitions that torch/cuda.h duplicated into the at::cuda namespace are removed.
Change details
1. Fix the torch/cuda.h header issue

The original code re-exported torch::cuda::is_available() into the at::cuda namespace, conflicting with the definition in ATen/cuda/CUDAContext.h; FlashMLA's use of at::cuda::is_available() then triggered a compile error. The fix unifies the entry point:
- torch/cuda.h no longer exports is_available into at::cuda
- torch::cuda::is_available() serves as the cross-library compatibility entry point

2. Update test code to use torch::cuda::is_available()

Modified test files:
- test/cpp/compat/ATen_TensorAccessor_test.cc
- test/cpp/compat/compat_basic_test.cc
- test/cpp/compat/c10_storage_test.cc

3. Fix Stream-related interfaces (c10/core/Stream.h, c10/core/Stream.cpp)

Aligns:
- Stream's comparison operator behavior
- the exception-throwing semantics of native_handle() on CPU devices

4. Trim the ScalarType implementation
Temporarily removes some ScalarType-related code that is not yet fully aligned (it will be reintroduced in a follow-up PR).

Related documentation

Does this cause precision changes?
No