Destroy unwrapped_vals instead of wrapped_vals by lucylq · Pull Request #18962 · pytorch/executorch

lucylq · 2026-04-17T00:50:09Z

Destroy unwrapped_vals instead of dereferencing wrapped_vals.

When a kernel uses TensorLists, it calls EValue::toTensorList(), which dereferences wrapped_vals into unwrapped_vals to produce the tensor list.

During execution, a crafted MoveCall can move an Int into a slot that the TensorList refers to. wrapped_vals then points to an Int, whereas unwrapped_vals still holds a Tensor.

Instead of calling the destructor on the values reached through wrapped_vals (references to tensors), call the destructor on unwrapped_vals, which contains the real tensors.

Vulnerability: During method destruction, the BoxedEvalueList dereferences its stored pointers and attempts to convert the swapped value to a Tensor. The resulting type confusion terminates the process, which is a denial of service.
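
To make the failure mode concrete, here is a minimal sketch of the scenario using toy types. Every name in it (ToyTensor, ToyValue, ToyBoxedList, run_swap_scenario) is a hypothetical stand-in and not the ExecuTorch API, and the checked conversion throws where the real runtime would abort.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-ins for illustration; not the real ExecuTorch types.
struct ToyTensor {
  int id;
};

enum class ToyTag { Tensor, Int };

struct ToyValue {
  ToyTag tag;
  ToyTensor tensor;   // meaningful only when tag == ToyTag::Tensor
  long long int_val;  // meaningful only when tag == ToyTag::Int

  // Checked conversion. The real runtime aborts on mismatch; the sketch throws.
  const ToyTensor& to_tensor() const {
    if (tag != ToyTag::Tensor) {
      throw std::runtime_error("value is not a Tensor");
    }
    return tensor;
  }
};

struct ToyBoxedList {
  ToyValue** wrapped;    // pointers into the shared value table
  ToyTensor* unwrapped;  // cache filled by get()
  std::size_t size;

  // Re-reads every slot through the wrapped pointers.
  ToyTensor* get() {
    for (std::size_t i = 0; i < size; ++i) {
      unwrapped[i] = wrapped[i]->to_tensor();
    }
    return unwrapped;
  }
};

struct SwapOutcome {
  bool second_get_failed;  // did re-dereferencing fail after the swap?
  int cached_id;           // what the cache still holds for slot 1
};

SwapOutcome run_swap_scenario() {
  ToyValue table[2] = {{ToyTag::Tensor, {1}, 0}, {ToyTag::Tensor, {2}, 0}};
  ToyValue* ptrs[2] = {&table[0], &table[1]};
  ToyTensor cache[2] = {};
  ToyBoxedList list{ptrs, cache, 2};

  ToyTensor* first = list.get();  // first unwrap, as during kernel execution
  assert(first[1].id == 2);
  (void)first;

  table[1] = ToyValue{ToyTag::Int, {0}, 7};  // crafted move overwrites the slot

  bool failed = false;
  try {
    list.get();  // old destructor path: re-dereference, then convert
  } catch (const std::runtime_error&) {
    failed = true;
  }
  return SwapOutcome{failed, cache[1].id};
}
```

In this model the cache still holds the original tensor after the swap, which is why destroying the cached elements directly avoids the failing conversion.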

Addresses TOB-EXECUTORCH-31.

This PR was authored with the assistance of Claude.

pytorch-bot · 2026-04-17T00:50:12Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18962

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Cancelled Job, 3 Unrelated Failures

As of commit 52b7f06 with merge base 75fe8e9:

NEW FAILURES - The following jobs have failed:

pull / test-qnn-buck-build-linux / linux-job (gh)
RuntimeError: Command docker exec -t 93f30d6e7db78980a065b6ff8c9364b82c05875fc21f3ab5106315848571caef /exec failed with exit code 3
pull / test-qnn-passes-linux / linux-job (gh)
backends/qualcomm/tests/test_passes.py::TestPasses::test_mha_to_sha

CANCELLED JOB - The following job was cancelled. Please retry:

pull / unittest / macos / macos-job (gh)
##[error]The operation was canceled.

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / test-qnn-wheel-packages-linux (3.10) / linux-job (gh) (detected as infra flaky with no log or failing log classifier)

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / unittest / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / unittest-editable / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-04-17T00:50:54Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

lucylq · 2026-04-20T20:34:17Z

@claude review

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes a denial-of-service type confusion during EValue destruction by ensuring TensorList/ListOptionalTensor element destructors run on the already-unwrapped storage instead of re-dereferencing potentially-mutated wrapped pointers.

Changes:

Added BoxedEvalueList::destroy_elements() to destroy unwrapped_vals_ without touching wrapped_vals_.
Updated EValue destructor paths for TensorList and ListOptionalTensor to call destroy_elements().
Minor formatting adjustments to EVALUE_DEFINE_TO macro specializations.


Copilot · 2026-04-20T20:41:18Z

+  void destroy_elements() const {
+    for (typename executorch::aten::ArrayRef<T>::size_type i = 0;
+         i < wrapped_vals_.size();
+         i++) {
+      unwrapped_vals_[i].~T();
+    }
+  }


The loop bounds use wrapped_vals_.size() while indexing into unwrapped_vals_. If these ever diverge (e.g., partial unwrap, error paths, or refactoring), this becomes an out-of-bounds destructor call. Prefer iterating over the actual unwrapped element count (e.g., unwrapped_size_/unwrapped_vals_size_) or add an explicit invariant check that wrapped_vals_.size() matches the unwrapped storage length before iterating.

I think this is fine - only wrapped_vals_ size is available, as unwrapped_vals_ is a raw pointer expected to be the same size.

see

// Source of truth for the list
executorch::aten::ArrayRef<EValue*> wrapped_vals_;
// Same size as wrapped_vals
mutable T* unwrapped_vals_;
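
The point above, that one length governs both arrays, can be sketched as follows. This is an illustration under assumptions: PairedList, CountedElem, and live_after_destroy are hypothetical names, and the explicit size_ member exists only to make the shared length visible (the real class reads the length from wrapped_vals_).

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical element type that counts live instances.
struct CountedElem {
  static int live;
  CountedElem() { ++live; }
  ~CountedElem() { --live; }
};
int CountedElem::live = 0;

// Hypothetical simplified list: one stored size describes both arrays.
template <typename T>
class PairedList {
 public:
  PairedList(void** wrapped, T* unwrapped, std::size_t size)
      : wrapped_(wrapped), unwrapped_(unwrapped), size_(size) {}

  std::size_t size() const { return size_; }
  void** wrapped() const { return wrapped_; }

  // Destroys the unwrapped elements without reading wrapped_.
  void destroy_elements() {
    for (std::size_t i = 0; i < size_; ++i) {
      unwrapped_[i].~T();
    }
  }

 private:
  void** wrapped_;    // stands in for ArrayRef<EValue*>
  T* unwrapped_;      // raw pointer, carries no length of its own
  std::size_t size_;  // shared by both arrays
};

// Builds n elements, destroys them through the list, returns the live count.
int live_after_destroy(std::size_t n) {
  constexpr std::size_t kMax = 8;
  assert(n <= kMax);
  alignas(CountedElem) unsigned char storage[kMax * sizeof(CountedElem)];
  void* slots[kMax] = {};
  CountedElem* elems = reinterpret_cast<CountedElem*>(storage);
  for (std::size_t i = 0; i < n; ++i) {
    new (&elems[i]) CountedElem();
  }
  PairedList<CountedElem> list(slots, elems, n);
  list.destroy_elements();
  return CountedElem::live;
}
```

Because both pointers are bound to the same length at construction and neither is resized afterwards, iterating up to that length matches the number of constructed elements.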

Copilot · 2026-04-20T20:41:18Z

+  void destroy_elements() const {
+    for (typename executorch::aten::ArrayRef<T>::size_type i = 0;
+         i < wrapped_vals_.size();
+         i++) {
+      unwrapped_vals_[i].~T();
+    }
+  }


destroy_elements() is const but it explicitly destroys the underlying objects, which makes subsequent use of the list (including calling get()) undefined behavior. Consider making this method non-const and/or renaming to something that clearly signals one-way invalidation (e.g., destroy_unwrapped_elements_for_destruction()), and document that the list must not be accessed after calling it.
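
A small sketch of the non-const variant suggested here, assuming a simplified list. ElementList, Probe, can_destroy, and destroy_and_count are hypothetical names, and the destroyed_ flag is an addition for illustration only. The static_asserts check that a const list cannot invoke the destructive method.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical element type that counts destructor calls.
struct Probe {
  static int destroyed;
  ~Probe() { ++destroyed; }
};
int Probe::destroyed = 0;

class ElementList {
 public:
  ElementList(Probe* elems, std::size_t size) : elems_(elems), size_(size) {}

  // Non-const: a caller holding only a const reference cannot invalidate.
  void destroy_elements() {
    for (std::size_t i = 0; i < size_; ++i) {
      elems_[i].~Probe();
    }
    destroyed_ = true;  // records the one-way invalidation
  }

  bool usable() const { return !destroyed_; }

 private:
  Probe* elems_;
  std::size_t size_;
  bool destroyed_ = false;
};

// Detects whether destroy_elements() is callable on an lvalue of type L.
template <typename L, typename = void>
struct can_destroy : std::false_type {};
template <typename L>
struct can_destroy<
    L,
    std::void_t<decltype(std::declval<L&>().destroy_elements())>>
    : std::true_type {};

static_assert(can_destroy<ElementList>::value, "mutable list is destroyable");
static_assert(!can_destroy<const ElementList>::value, "const list is not");

// Returns how many destructors destroy_elements() ran for n elements.
int destroy_and_count(std::size_t n) {
  constexpr std::size_t kMax = 8;
  assert(n <= kMax);
  alignas(Probe) unsigned char storage[kMax * sizeof(Probe)];
  Probe* elems = reinterpret_cast<Probe*>(storage);
  for (std::size_t i = 0; i < n; ++i) {
    new (&elems[i]) Probe();
  }
  const int before = Probe::destroyed;
  ElementList list(elems, n);
  list.destroy_elements();
  assert(!list.usable());
  return Probe::destroyed - before;
}
```

The trade-off is that any caller reaching the list through a const pointer would need that pointer's constness relaxed before it could call the method.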

claude · 2026-04-20T20:44:05Z

Claude finished @lucylq's task in 2m 29s

Review of PR #18962 — Destroy unwrapped_vals instead of wrapped_vals

Read the changed file and PR diff
Analyze the fix for correctness and security
Check for edge cases and potential issues
Post review feedback

Summary

This PR addresses TOB-EXECUTORCH-31, a denial-of-service vulnerability caused by type confusion during EValue destruction. The core issue: a crafted MoveCall instruction can overwrite an entry in the values_ table (e.g., replacing a Tensor with an Int) after a BoxedEvalueList has stored pointers to that entry. When EValue::destroy() previously called get() on the BoxedEvalueList, it re-dereferenced wrapped_vals_ (the pointers into the now-mutated values_ table), triggering a type mismatch assertion and process termination.

The fix is sound: by adding destroy_elements() that calls destructors directly on unwrapped_vals_ (which still hold valid Tensor objects from the last get() call), destruction no longer touches the potentially-corrupted values_ table.

Correctness Analysis

The fix is correct. Here's the reasoning:

unwrapped_vals_ is populated by get() during kernel execution, which copies/constructs T objects from dereferenced wrapped_vals_. These are real Tensor (or optional<Tensor>) objects that need their destructors called.
After a MoveCall mutates the values_ table, wrapped_vals_ pointers still point to valid EValue slots, but those slots may now hold a different type. Re-dereferencing them (as the old code did via get()) would trigger ET_CHECK_MSG failures in toTensor().
destroy_elements() bypasses this entirely by directly destroying the already-unwrapped objects, which is the correct behavior for cleanup.

Issues and Suggestions

1. Loop bound mismatch (minor concern) — evalue.h:80-81

for (typename executorch::aten::ArrayRef<T>::size_type i = 0;
     i < wrapped_vals_.size();  // <-- uses wrapped_vals_ size
     i++) {
  unwrapped_vals_[i].~T();     // <-- indexes into unwrapped_vals_
}

wrapped_vals_ and unwrapped_vals_ are constructed with the same size parameter in the constructor (evalue.h:65-67), and there's no code path that changes either independently, so these sizes are always equal. However, unwrapped_vals_ is a raw T* with no stored size — the size is only tracked via wrapped_vals_.size(). This is an existing design constraint, not something introduced by this PR. The invariant holds, but it might be worth a brief comment noting that wrapped_vals_.size() is used because unwrapped_vals_ has no independent size tracking.

2. const qualifier on destroy_elements() — evalue.h:79

The method is marked const, which is technically valid because unwrapped_vals_ is a mutable T* and calling ~T() through it doesn't modify the pointer itself. However, destroy_elements() is a destructive operation that invalidates the list contents. Marking it const is semantically misleading — it suggests the method is safe to call without side effects.

That said, get() is also const for the same mutable reason, and the list is only ever accessed through a const pointer in EValue::destroy() (since payload.copyable_union.as_tensor_list_ptr is not mutable). So making destroy_elements() non-const would require broader changes. The current approach is pragmatically fine.

3. Comment refers to values_ table — evalue.h:77

Copilot flagged this too. The comment says "does not access the values_ table" but values_ is a member of Method, not BoxedEvalueList. From this class's perspective, the relevant point is that it doesn't dereference wrapped_vals_. Consider:

* This is safe to call during EValue destruction because it does not
* dereference wrapped_vals_, which may point to EValues mutated by
* MoveCall instructions.

This is a nit — the current comment conveys the right idea.

4. Uninitialized unwrapped_vals_ (latent assumption): If destroy_elements() is called but get() was never called (i.e., the kernel never actually used the tensor list), then unwrapped_vals_ contains uninitialized memory and calling ~T() on it is undefined behavior. In practice, the ExecuTorch allocator zero-initializes memory, and Tensor's destructor on a zero-initialized object is safe (it's essentially a no-op on a null TensorImpl*). But this is a latent assumption worth being aware of.


5. Whitespace-only changes in EVALUE_DEFINE_TO macro — evalue.h:523,537

The diff includes two whitespace changes (to<T>()&& → to<T>() && and to<T>()& → to<T>() &). These are formatting cleanups — correct per style but unrelated to the security fix. Fine to include but worth noting they're cosmetic.
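
As an illustration of item 4 above, one way to remove the dependence on zero-initialized storage is to track whether the elements were ever constructed. This is a sketch under assumptions, not what the PR does: LazyList, TrackedElem, and dtor_calls_for are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical element type that counts constructor and destructor calls.
struct TrackedElem {
  static int ctor_calls;
  static int dtor_calls;
  TrackedElem() { ++ctor_calls; }
  ~TrackedElem() { ++dtor_calls; }
};
int TrackedElem::ctor_calls = 0;
int TrackedElem::dtor_calls = 0;

template <typename T>
class LazyList {
 public:
  LazyList(T* storage, std::size_t size) : storage_(storage), size_(size) {}

  // Constructs the elements on first use, like a get() that unwraps.
  T* get() {
    assert(storage_ != nullptr);
    if (!constructed_) {
      for (std::size_t i = 0; i < size_; ++i) {
        new (&storage_[i]) T();
      }
      constructed_ = true;
    }
    return storage_;
  }

  // Skips destructors entirely when nothing was ever constructed.
  void destroy_elements() {
    if (!constructed_) {
      return;
    }
    for (std::size_t i = 0; i < size_; ++i) {
      storage_[i].~T();
    }
    constructed_ = false;
  }

 private:
  T* storage_;
  std::size_t size_;
  bool constructed_ = false;
};

// Returns the number of destructor calls made by destroy_elements().
int dtor_calls_for(bool call_get_first) {
  constexpr std::size_t kSize = 3;
  alignas(TrackedElem) unsigned char raw[kSize * sizeof(TrackedElem)];
  LazyList<TrackedElem> list(reinterpret_cast<TrackedElem*>(raw), kSize);
  if (call_get_first) {
    list.get();
  }
  const int before = TrackedElem::dtor_calls;
  list.destroy_elements();
  return TrackedElem::dtor_calls - before;
}
```

The flag costs one extra byte per list and makes the never-unwrapped case well defined regardless of how the backing memory was initialized.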

Verdict

The fix is well-targeted and correct. It eliminates the TOCTOU window during destruction without changing the runtime behavior for non-malicious programs. The main areas for potential improvement are the comment wording (nit) and the latent assumption about uninitialized unwrapped_vals_ (low risk in practice).

Looks good to merge with the optional comment wording improvement.


GregoryComer

I don't have full context on the wrapping logic, but approving to unblock. The two copilot comments seem reasonable, but I'll defer to your judgement.

MoveCall instructions can overwrite values_ entries after parseTensorList validated their types, creating a time-of-check-time-of-use window. The existing to<T>() calls have ET_CHECK_MSG type guards that abort on mismatch, but the error messages don't indicate the TOCTOU cause.

Add explicit type validation with clear diagnostic messages in:
- BoxedEvalueList<optional<Tensor>>::get() (evalue.cpp)
- Improved null check messages in the generic template (evalue.h)

The to<T>() type checks in EValue provide defense in depth for the generic template (including BoxedEvalueList<Tensor> and <int64_t>).

Note: fully preventing the abort (DoS) would require changing BoxedEvalueList<T>::get() to return Result<ArrayRef<T>>, which is a larger API change tracked separately.

Addresses TOB-EXECUTORCH-31.

This PR was authored with the assistance of Claude.
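
The note about returning a result instead of aborting can be sketched with std::optional standing in for Result<ArrayRef<T>>. This is a rough illustration under assumptions: try_get, TableSlot, ListView, and try_get_succeeds are hypothetical names, and the element type is reduced to an int payload.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

enum class SlotKind { Tensor, Int };

// Hypothetical table entry: a tag plus an int payload.
struct TableSlot {
  SlotKind kind;
  int payload;
};

// Hypothetical non-owning view, standing in for ArrayRef<T>.
struct ListView {
  const int* data;
  std::size_t size;
};

// Validates every slot before writing, so the cache is never half-updated.
// Returns std::nullopt instead of aborting when a slot has the wrong kind.
std::optional<ListView> try_get(
    TableSlot* const* wrapped, int* unwrapped, std::size_t size) {
  for (std::size_t i = 0; i < size; ++i) {
    if (wrapped[i] == nullptr || wrapped[i]->kind != SlotKind::Tensor) {
      return std::nullopt;
    }
  }
  for (std::size_t i = 0; i < size; ++i) {
    unwrapped[i] = wrapped[i]->payload;
  }
  return ListView{unwrapped, size};
}

// Runs try_get on a two-slot table, optionally swapping slot 1 to an Int.
bool try_get_succeeds(bool swap_second_slot) {
  TableSlot table[2] = {{SlotKind::Tensor, 10}, {SlotKind::Tensor, 20}};
  if (swap_second_slot) {
    table[1] = TableSlot{SlotKind::Int, 7};
  }
  TableSlot* ptrs[2] = {&table[0], &table[1]};
  int cache[2] = {0, 0};
  std::optional<ListView> view = try_get(ptrs, cache, 2);
  if (view) {
    assert(view->size == 2);
    assert(view->data[1] == 20);
  }
  return view.has_value();
}
```

With this shape the caller decides how to handle a mismatch, which is why the change ripples through every call site of get().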

lucylq · 2026-04-21T18:44:07Z

I don't have full context on the wrapping logic, but approving to unblock. The two copilot comments seem reasonable, but I'll defer to your judgement.

Responded to Copilot's comments and made destroy_elements() non-const.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


Copilot · 2026-04-21T18:49:50Z

    } else if (
        isTensorList() &&
        payload.copyable_union.as_tensor_list_ptr != nullptr) {
-      // for (auto& tensor : toTensorList()) {
-      for (auto& tensor : payload.copyable_union.as_tensor_list_ptr->get()) {
-        tensor.~Tensor();
-      }
+      payload.copyable_union.as_tensor_list_ptr->destroy_elements();
    } else if (


Consider adding a regression test for the reported MoveCall/type-confusion scenario: construct an EValue TensorList, call toTensorList()/get() to populate unwrapped_vals_, then mutate one of the underlying wrapped EValues to a non-tensor (e.g., Int) and ensure destroying the list EValue does not crash. This would lock in the security fix and prevent reintroducing the destructor-time dereference path.
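
The shape of such a regression test can be sketched on a toy model. All names here (FakeTensor, FakeSlot, FakeBoxedList, swap_then_destroy_is_clean) are hypothetical and do not use the ExecuTorch test utilities. A real test would build an EValue TensorList and destroy it after mutating a wrapped EValue.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <stdexcept>

// Hypothetical tensor stand-in that counts live instances.
struct FakeTensor {
  static int live;
  int id;
  explicit FakeTensor(int v) : id(v) { ++live; }
  ~FakeTensor() { --live; }
};
int FakeTensor::live = 0;

enum class FakeTag { Tensor, Int };

struct FakeSlot {
  FakeTag tag;
  int payload;
};

class FakeBoxedList {
 public:
  FakeBoxedList(FakeSlot** wrapped, FakeTensor* unwrapped, std::size_t size)
      : wrapped_(wrapped), unwrapped_(unwrapped), size_(size) {}

  // Unwraps each slot. Throws where the real runtime would abort.
  // The sketch calls this exactly once, so no element is constructed twice.
  FakeTensor* get() {
    for (std::size_t i = 0; i < size_; ++i) {
      if (wrapped_[i]->tag != FakeTag::Tensor) {
        throw std::runtime_error("type mismatch");
      }
    }
    for (std::size_t i = 0; i < size_; ++i) {
      new (&unwrapped_[i]) FakeTensor(wrapped_[i]->payload);
    }
    return unwrapped_;
  }

  // Fixed path: destroys cached elements without touching wrapped_.
  void destroy_elements() {
    for (std::size_t i = 0; i < size_; ++i) {
      unwrapped_[i].~FakeTensor();
    }
  }

 private:
  FakeSlot** wrapped_;
  FakeTensor* unwrapped_;
  std::size_t size_;
};

// Populate, mutate one slot to Int, destroy; true if cleanup was complete.
bool swap_then_destroy_is_clean() {
  FakeSlot table[2] = {{FakeTag::Tensor, 10}, {FakeTag::Tensor, 20}};
  FakeSlot* ptrs[2] = {&table[0], &table[1]};
  alignas(FakeTensor) unsigned char raw[2 * sizeof(FakeTensor)];
  FakeBoxedList list(ptrs, reinterpret_cast<FakeTensor*>(raw), 2);

  FakeTensor* unwrapped = list.get();
  assert(unwrapped[1].id == 20);
  (void)unwrapped;

  const int live_before = FakeTensor::live;
  table[0] = FakeSlot{FakeTag::Int, 7};  // simulate the crafted move
  try {
    list.destroy_elements();
  } catch (const std::runtime_error&) {
    return false;  // destruction re-checked the mutated slot
  }
  return FakeTensor::live == live_before - 2;
}
```

The check on the live count verifies that each cached element was destroyed exactly once, in addition to the absence of a failure.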

…resolution (#19163)

### Summary

The Pico2 firmware link fails with an undefined reference to BoxedEvalueList<std::optional<Tensor>>::get() from RegisterCodegenUnboxedKernelsEverything.cpp.obj inside libportable_ops_lib.a. ld walks the link line once: by the time --whole-archive pulls the codegen TU's reference into the link, libexecutorch_core.a (which contains the evalue.cpp specialization) has already been processed.

Pico2 was green until #18962 (TOB-EXECUTORCH-31 fix), which removed EValue::~EValue's inline call to BoxedEvalueList::get(). Before that commit, libexecutorch_core.a had its own use of the symbol that forced evalue.cpp.o to be pulled early; the link-order bug was always latent and the destructor change unmasked it.

Move libexecutorch_core.a to after the kernel archives in all four USE_CMSIS_NN x USE_SELECTIVE_BUILD branches so the codegen TUs can still resolve their references after --whole-archive.

Authored with Claude.

### Test plan

CI

Destroy unwrapped_vals instead of dereferencing wrapped_vals.

When a kernel uses TensorLists, it calls EValue::toTensorList(). This dereferences wrapped_vals into unwrapped_vals to get the tensor list.

During execution, a (crafted) MoveCall potentially moves an Int into the TensorList. This means wrapped_vals now points to an Int, whereas unwrapped_vals still holds a Tensor.

Instead of calling destructor on the wrapped_vals (ref to tensor), call the destructor on the unwrapped_vals which contain the real tensor.

Vulnerability: During method destruction, the BoxedEvalueList dereferences its stored pointer and attempts to convert the swapped value to a Tensor, causing a type confusion that terminates the process. This results in a denial of service.

Addresses TOB-EXECUTORCH-31.

This PR was authored with the assistance of Claude.

Co-authored-by: Github Executorch <github_executorch@arm.com>

meta-cla Bot added the CLA Signed label Apr 17, 2026

lucylq force-pushed the security31 branch 2 times, most recently from adfc911 to 306d676 Compare April 20, 2026 20:26

lucylq changed the title ~~Add type validation in BoxedEvalueList::get() for TOCTOU defense~~ Destroy unwrapped_vals instead of wrapped_vals Apr 20, 2026

lucylq marked this pull request as ready for review April 20, 2026 20:34

Copilot AI review requested due to automatic review settings April 20, 2026 20:34

lucylq requested a review from JacobSzwejbka as a code owner April 20, 2026 20:34

Copilot AI reviewed Apr 20, 2026


Copilot started reviewing on behalf of lucylq April 20, 2026 20:59

lucylq force-pushed the security31 branch from 306d676 to 3614378 Compare April 20, 2026 22:03

lucylq added the security-fix label Apr 20, 2026

GregoryComer approved these changes Apr 20, 2026


Copilot AI review requested due to automatic review settings April 21, 2026 18:42

lucylq force-pushed the security31 branch from 3614378 to 52b7f06 Compare April 21, 2026 18:42

Copilot started reviewing on behalf of lucylq April 21, 2026 18:43

Copilot AI reviewed Apr 21, 2026


lucylq merged commit deaf73b into main Apr 21, 2026
174 of 180 checks passed

lucylq deleted the security31 branch April 21, 2026 20:55

rascani mentioned this pull request Apr 27, 2026

pico2: link executorch_core after kernel libs to fix --whole-archive resolution #19163

Merged

3 participants
