Fix cuda overflow #19487
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19487
Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV: There is 1 currently active SEV. If your PR is affected, please view it below.

❌ 1 Pending, 1 Unclassified Failure as of commit ce6fc1b with merge base 23a91d5. Unclassified failure: Dr. CI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
e5dfd0f to df170ce (Compare)
@claude review
Claude finished @lucylq's task in 1m 6s (View job)

Code Review: Fix cuda overflow
Overall: This is a solid security hardening PR. The changes add proper dimension/size validation and replace unchecked arithmetic with overflow-safe operations in the non-contiguous copy path.

Strengths
Minor observations
Verdict

LGTM — the changes are correct, focused, and improve the security posture of the non-contiguous copy path. The new checks are properly ordered (cheapest first) and the overflow detection uses battle-tested compiler builtins via the c10::{mul,add}_overflows helpers.
Pull request overview
This PR hardens SlimTensor::copy_ in the AOTI slim runtime by enforcing stricter shape compatibility and making strided offset/byte computations overflow-safe (particularly relevant for CUDA pointer arithmetic).
Changes:
- Added precondition checks that `dim()` and `sizes()` match before copying.
- Reworked the strided element-wise copy offset computation to use `c10::{mul,add}_overflows` and to compute byte offsets explicitly.
- Updated CPU/CUDA memcpy call sites to use the computed byte offsets.
```cpp
ET_CHECK_MSG(
    src_offset >= 0 && dst_offset >= 0 &&
        !c10::mul_overflows(
            static_cast<size_t>(src_offset),
            elem_size,
            &src_byte_offset) &&
        !c10::mul_overflows(
            static_cast<size_t>(dst_offset), elem_size, &dst_byte_offset),
    "copy_: byte offset overflow");
```
```cpp
ET_CHECK_MSG(
    this->dim() == other.dim(),
    "copy_: dim of tensors must match (%zu vs %zu)",
    this->dim(),
    other.dim());
ET_CHECK_MSG(
    this->sizes() == other.sizes(), "copy_: sizes of tensors must match");
```
```cpp
int64_t src_term = 0;
int64_t dst_term = 0;
ET_CHECK_MSG(
    !c10::mul_overflows(counter[d], other.stride(d), &src_term) &&
```
I feel like we only need two overflow calculations: one for `src = counter[d] * this->stride(d)`, the other for `dst_offset = dst_offset + src`. Thoughts?
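If I read the suggestion right, each loop iteration would collapse to one checked multiply for the per-dimension term plus one checked add to fold it into the running offset. A minimal sketch of that shape (illustrative names, using the compiler builtins rather than the `c10` wrappers so it stands alone):

```cpp
#include <cstdint>

// Sketch: one checked multiply and one checked add per dimension,
// shown for a single side (src or dst). Returns false on overflow;
// *offset is only committed when the add succeeds.
bool accumulate_term(int64_t counter_d, int64_t stride_d, int64_t* offset) {
  int64_t term = 0;
  if (__builtin_mul_overflow(counter_d, stride_d, &term)) {
    return false;
  }
  return !__builtin_add_overflow(*offset, term, offset);
}
```

Note the src and dst offsets advance with different strides, so each side would still need its own multiply/add pair per dimension.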
```cpp
 */
SlimTensor& copy_(const SlimTensor& other) {
  ET_CHECK_MSG(
      this->dim() == other.dim(),
```
I'm not a huge fan of adding this new limitation. Two tensors should be copyable as long as their numel is the same. Maybe we can remove this assertion?
I think we'd also need to update the loop below if we want tensors to be copyable with different dims/sizes.
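For contiguous tensors, the relaxation being discussed could look roughly like a flat, numel-based copy. This is a sketch of the idea only, not the PR's implementation, and the names are illustrative:

```cpp
#include <cstring>
#include <cstddef>

// Sketch: copy between two contiguous buffers that may have different
// shapes but the same element count and element size. Once both sides
// are contiguous, shape no longer matters; only the byte count does,
// and that multiplication is still overflow-checked.
bool flat_copy(void* dst, const void* src, size_t numel, size_t elem_size) {
  size_t nbytes = 0;
  if (__builtin_mul_overflow(numel, elem_size, &nbytes)) {
    return false;  // refuse on byte-count overflow
  }
  std::memcpy(dst, src, nbytes);
  return true;
}
```

The strided (non-contiguous) path is where dropping the dim/sizes checks gets harder, since the per-dimension counters assume matching shapes.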
Thanks for adding the overflow check! Please fix the broken CUDA CIs as well.
```cpp
 * @return Reference to this tensor
 */
SlimTensor& copy_(const SlimTensor& other) {
  ET_CHECK_MSG(
```
This check also enforces that the two tensors have the same shape; can we remove it?
```cpp
!::c10::mul_overflows(
    static_cast<size_t>(src_offset),
    elem_size,
    &src_byte_offset) &&
!::c10::mul_overflows(
    static_cast<size_t>(dst_offset), elem_size, &dst_byte_offset),
```
```cpp
ET_CHECK_MSG(
    src_offset >= 0 && dst_offset >= 0 &&
        !::c10::mul_overflows(
```
```cpp
ET_CHECK_MSG(
    this->sizes() == other.sizes(), "copy_: sizes of tensors must match");
ET_CHECK_MSG(
    this->numel() == other.numel(), "copy_: numel of tensors must match");
ET_CHECK_MSG(this->dtype() == other.dtype(), "copy_: dtype must match");
```
```diff
  // Copy elem_size bytes from src to dst
  if (this->device().is_cpu() && other.device().is_cpu()) {
    std::memcpy(
-       dst_data + dst_offset * elem_size,
-       src_data + src_offset * elem_size,
-       elem_size);
+       dst_data + dst_byte_offset, src_data + src_byte_offset, elem_size);
  } else if (this->device().is_cuda() || other.device().is_cuda()) {
```
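The per-element dispatch can be sketched as below. This is a standalone illustration with hypothetical names: the `dst_cuda`/`src_cuda` flags stand in for the `device()` checks, the byte offsets are assumed to have been overflow-validated as in the hunk above, and the CUDA branch is compiled only under a CUDA compiler.

```cpp
#include <cstring>
#include <cstddef>
#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// Sketch: memcpy for CPU-to-CPU, cudaMemcpy when either side is a
// CUDA pointer. Byte offsets are assumed pre-validated against overflow.
void copy_element(
    unsigned char* dst, size_t dst_byte_offset,
    const unsigned char* src, size_t src_byte_offset,
    size_t elem_size, bool dst_cuda, bool src_cuda) {
  if (!dst_cuda && !src_cuda) {
    std::memcpy(dst + dst_byte_offset, src + src_byte_offset, elem_size);
  } else {
#ifdef __CUDACC__
    // cudaMemcpyDefault lets the runtime infer the transfer direction
    // from unified virtual addressing.
    cudaMemcpy(dst + dst_byte_offset, src + src_byte_offset, elem_size,
               cudaMemcpyDefault);
#endif
  }
}
```

Computing the byte offsets once and reusing them in both branches is what makes the pointer arithmetic here safe: the additions operate on already-validated `size_t` values instead of repeating the unchecked `offset * elem_size` multiply at each call site.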
```cpp
SlimTensor& copy_(const SlimTensor& other) {
  ET_CHECK_MSG(
      this->dim() == other.dim(),
      "copy_: dim of tensors must match (%zu vs %zu)",
      this->dim(),
```
@claude review the PR
Claude finished @Gasoonjia's task in 1m 49s (View job)

Code Review: Fix cuda overflow
Overall: Good security hardening. The overflow-safe arithmetic is correct and uses the right primitives. There are a few open design questions and minor issues worth addressing.

Correctness & Security
Design Feedback
Follow-up Items (out of scope but worth tracking)
Verdict

The core security fix (overflow-safe offset arithmetic) is correct and well-implemented. The main discussion point is the
Update the doc as well to show that the two tensors should have the same sizes.
And can you do me a favor and remove the "Currently only supports CPU-to-CPU copy (contiguous tensors only)." line? It is out of date lol 😛
@claude review
Claude finished @lucylq's task in 1m 47s (View job)

Code Review: Fix cuda overflow
Overall: The security hardening looks correct and well-structured. The overflow-safe arithmetic uses the right primitives, and the doc comment has been properly updated. The PR addresses the prior review feedback well (stale comment removed, sizes restriction documented). A few remaining items:

Looks Good
Remaining Items
Items 1-3 are pre-existing issues and out of scope for this PR, but worth tracking as follow-ups to complete the hardening effort.

Verdict

LGTM — the core fix is correct and focused. The new checks are ordered cheapest-first (dim/sizes comparison before the per-element loop) and the overflow detection uses well-tested compiler builtins. The remaining unchecked paths (data_ptr, nbytes, compute_numel) are pre-existing and appropriate for follow-up work.
Commits:
- Update copy_ method to check
- Use overflow-safe arithmetic.