[Cpp API Compatibility] Align misc apis#78555
Conversation
Your PR was submitted successfully. Thank you for your contribution to this open-source project!
Pull request overview
This PR continues the C++ API compatibility alignment work by bringing several misc compat-layer APIs and behaviors closer to LibTorch/PyTorch, and updating the corresponding C++ compat tests.
Changes:
- Switch CUDA availability checks in compat tests to `torch::cuda::is_available()`.
- Expand `c10::cuda::CUDAStream`/`c10::cuda::CUDAGuard` compat APIs and adjust stream-pool and allocator behaviors to better match PyTorch.
- Add missing compat headers/sources (e.g., `torch/version.h`, `c10/cuda/CUDAFunctions.cpp`) and strengthen sparse tensor test assertions.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/cpp/compat/compat_basic_test.cc | Uses torch::cuda::is_available() for CUDA-gated test sections. |
| test/cpp/compat/c10_storage_test.cc | Aligns CUDA availability checks and updates is_simple_data_ptr expectations to PyTorch semantics. |
| test/cpp/compat/c10_layout_test.cc | Adds extra sparse COO shape/dim assertions. |
| test/cpp/compat/ATen_TensorAccessor_test.cc | Uses torch::cuda::is_available() for CUDA-gated accessor test. |
| paddle/phi/api/include/compat/torch/csrc/api/include/torch/version.h | Introduces LibTorch-style version macros for the compat layer. |
| paddle/phi/api/include/compat/CMakeLists.txt | Adds c10/cuda/CUDAFunctions.cpp to the compat build sources. |
| paddle/phi/api/include/compat/c10/cuda/CUDAStream.h | Adds stream API surface (priority/query/sync/pack/unpack/hash/etc.) and new pool/external stream helpers. |
| paddle/phi/api/include/compat/c10/cuda/CUDAGuard.h | Tracks original/current device to better emulate PyTorch guard semantics. |
| paddle/phi/api/include/compat/c10/cuda/CUDAFunctions.h | Moves device_count/device_synchronize to out-of-line definitions; keeps stream sync helper gated by CUDA/HIP. |
| paddle/phi/api/include/compat/c10/cuda/CUDAFunctions.cpp | Implements device_count() and device_synchronize() out-of-line. |
| paddle/phi/api/include/compat/c10/core/Allocator.h | Aligns is_simple_data_ptr semantics and adds allocator registry/utilities. |
| paddle/phi/api/include/compat/ATen/core/ivalue.h | Adds camelCase API aliases and exposes c10::IValue alias for compatibility. |
```cpp
/**
 * Set the current CUDA stream for the device of the given stream in the
 * calling thread.
 *
 * Implements per-thread, per-device current stream semantics: the change is
 * local to the calling OS thread and does not affect any shared state such as
 * Paddle's GPUContext. Other threads continue to see their own current stream.
 */
inline CUDAStream getStreamFromPool(const bool isHighPriority,
                                    c10::DeviceIndex device_index) {
  return getStreamFromPool(isHighPriority ? -1 : 0, device_index);
```
The doc comment above getStreamFromPool(...) describes "Set the current CUDA stream..." but the function below is a stream-pool accessor. This mismatch is misleading for users and makes the header harder to maintain; update the comment to describe getStreamFromPool (and keep setCurrentCUDAStream documented at its own definition).
```cpp
 * Paddle's GPUContext. Other threads continue to see their own current stream.
 */
inline CUDAStream getStreamFromPool(const bool isHighPriority,
                                    c10::DeviceIndex device_index) {
```
getStreamFromPool(bool isHighPriority, DeviceIndex ...) no longer has a default device_index. This makes calls like getStreamFromPool(true) either fail to compile or (more dangerously) bind to the getStreamFromPool(int priority, DeviceIndex=-1) overload with priority=1, changing semantics (high-priority requested but low-priority returned). Add device_index = -1 to the bool overload (matching PyTorch) and ensure bool arguments cannot silently resolve to the int overload.
Suggested change:
```diff
-                                    c10::DeviceIndex device_index) {
+                                    c10::DeviceIndex device_index = -1) {
```
/re-run all-failed
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@ Coverage Diff @@
##           develop   #78555   +/-   ##
==========================================
  Coverage         ?   95.55%
==========================================
  Files            ?        4
  Lines            ?       45
  Branches         ?        0
==========================================
  Hits             ?       43
  Misses           ?        2
  Partials         ?        0
```

☔ View full report in Codecov by Sentry.
Force-pushed from 5f387c4 to fcdbccf (Compare)
Force-pushed from fcdbccf to 085d55a (Compare)
PR Category
Execute Infrastructure
PR Types
Improvements
Description
Split out from #78484.
Miscellaneous API alignment: Allocator, CUDAFunctions, CUDAGuard, CUDAStream, version.h, etc.
Change details
1. `c10/core/Allocator.h` interface completion
   Additions:
   Behavior fix: `is_simple_data_ptr()` semantics corrected to `get() == get_context()` (consistent with PyTorch)
2. `c10/cuda/CUDAFunctions.h`/`.cpp` refactor: part of the implementation in `CUDAFunctions.h` moved to the new file `CUDAFunctions.cpp`
3. `c10/cuda/CUDAGuard.h` interface completion. New methods:
4. `c10/cuda/CUDAStream.h` extension. New features:
5. Added `torch/version.h`
   Provides macros carrying PyTorch version information, so that third-party libraries (e.g., DeepGEMM, FlashMLA) can detect the version of the Paddle compatibility layer.
6. Test fixes
   - `test/cpp/compat/ATen_TensorAccessor_test.cc`: fix CUDA test condition
   - `test/cpp/compat/c10_storage_test.cc`: fix `is_simple_data_ptr` assertions
   - `test/cpp/compat/c10_layout_test.cc`: add SparseCooTensorInferSize assertions
   - `test/cpp/compat/compat_basic_test.cc`: fix CUDA test condition

Related documentation
Does this cause precision changes?
No