Destroy unwrapped_vals instead of wrapped_vals#18962

Merged
lucylq merged 1 commit into main from security31
Apr 21, 2026

Contributor

@lucylq lucylq commented Apr 17, 2026

Destroy unwrapped_vals instead of dereferencing wrapped_vals.

When a kernel uses a TensorList, it calls EValue::toTensorList(). This dereferences each pointer in wrapped_vals and stores the result in unwrapped_vals to build the tensor list.

During execution, a crafted MoveCall can move an Int into a slot that the TensorList points to. wrapped_vals then points to an Int, whereas unwrapped_vals still holds a Tensor.

Instead of calling the destructor through wrapped_vals (a reference to the tensor), call it on unwrapped_vals, which contains the real tensor.

Vulnerability: During method destruction, the BoxedEvalueList dereferences its stored pointer and attempts to convert the swapped value to a Tensor, causing a type confusion that terminates the process. This results in a denial of service.
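
The sequence described above can be modeled outside ExecuTorch. What follows is a minimal sketch under stated assumptions, not the runtime's code: Slot, Elem, ToyBoxedList, and teardown_survives_mutation are hypothetical stand-ins (Slot for EValue, std::string for Tensor), the toy constructs its cache eagerly, and a thrown exception stands in for the failed type check that terminates the real process.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <stdexcept>
#include <string>

// Toy stand-ins, not the ExecuTorch types.
using Elem = std::string;  // plays the role of Tensor

struct Slot {              // plays the role of EValue
  bool holds_elem;
  long int_val;
  Elem elem_val;
};

class ToyBoxedList {
 public:
  // Assumption: unlike the real class, the toy constructs every cached
  // element up front so the cache always holds live objects.
  ToyBoxedList(Slot** wrapped, Elem* unwrapped, std::size_t size)
      : wrapped_(wrapped), unwrapped_(unwrapped), size_(size) {
    for (std::size_t i = 0; i < size_; ++i) {
      new (&unwrapped_[i]) Elem();
    }
  }

  // Models get(): re-reads every slot and refreshes the cache. Throws where
  // the real runtime would fail a type check and abort.
  const Elem* get() {
    for (std::size_t i = 0; i < size_; ++i) {
      if (!wrapped_[i]->holds_elem) {
        throw std::runtime_error("slot no longer holds the expected type");
      }
      unwrapped_[i] = wrapped_[i]->elem_val;
    }
    return unwrapped_;
  }

  // Models destroy_elements(): destroys the cache without reading the slots.
  // Must be called exactly once, at teardown.
  void destroy_elements() {
    for (std::size_t i = 0; i < size_; ++i) {
      unwrapped_[i].~Elem();
    }
  }

 private:
  Slot** wrapped_;
  Elem* unwrapped_;
  std::size_t size_;
};

// Populates the cache, overwrites one slot with an integer (the MoveCall
// analogue), then tears down via the old or the new path. Returns true if
// teardown completed without hitting the type check.
bool teardown_survives_mutation(bool use_fixed_path) {
  Slot a{true, 0, "t0"};
  Slot b{true, 0, "t1"};
  Slot* wrapped[] = {&a, &b};
  std::allocator<Elem> alloc;
  Elem* storage = alloc.allocate(2);
  bool ok = true;
  {
    ToyBoxedList list(wrapped, storage, 2);
    list.get();               // kernel-time unwrap fills the cache
    b = Slot{false, 42, ""};  // crafted mutation: slot b now holds an integer
    if (!use_fixed_path) {
      try {
        list.get();           // old teardown re-read the slots first
      } catch (const std::runtime_error&) {
        ok = false;
      }
    }
    list.destroy_elements();  // the cache is destroyed exactly once either way
  }
  alloc.deallocate(storage, 2);
  return ok;
}
```

In this model, teardown that goes through get() trips the type check after the mutation, while teardown that only touches the cached elements does not.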

Addresses TOB-EXECUTORCH-31.

This PR was authored with the assistance of Claude.


pytorch-bot Bot commented Apr 17, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18962

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Cancelled Job, 3 Unrelated Failures

As of commit 52b7f06 with merge base 75fe8e9:

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label Apr 17, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@lucylq lucylq force-pushed the security31 branch 2 times, most recently from adfc911 to 306d676 Compare April 20, 2026 20:26
@lucylq lucylq changed the title from "Add type validation in BoxedEvalueList::get() for TOCTOU defense" to "Destroy unwrapped_vals instead of wrapped_vals" Apr 20, 2026
@lucylq lucylq marked this pull request as ready for review April 20, 2026 20:34
Copilot AI review requested due to automatic review settings April 20, 2026 20:34
@lucylq lucylq requested a review from JacobSzwejbka as a code owner April 20, 2026 20:34
Contributor Author

lucylq commented Apr 20, 2026

@claude review

Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes a denial-of-service type confusion during EValue destruction by ensuring TensorList/ListOptionalTensor element destructors run on the already-unwrapped storage instead of re-dereferencing potentially-mutated wrapped pointers.

Changes:

  • Added BoxedEvalueList::destroy_elements() to destroy unwrapped_vals_ without touching wrapped_vals_.
  • Updated EValue destructor paths for TensorList and ListOptionalTensor to call destroy_elements().
  • Minor formatting adjustments to EVALUE_DEFINE_TO macro specializations.

Comment thread runtime/core/evalue.h Outdated
Comment on lines +79 to +85
  void destroy_elements() const {
    for (typename executorch::aten::ArrayRef<T>::size_type i = 0;
         i < wrapped_vals_.size();
         i++) {
      unwrapped_vals_[i].~T();
    }
  }
Copilot AI Apr 20, 2026

The loop bounds use wrapped_vals_.size() while indexing into unwrapped_vals_. If these ever diverge (e.g., partial unwrap, error paths, or refactoring), this becomes an out-of-bounds destructor call. Prefer iterating over the actual unwrapped element count (e.g., unwrapped_size_/unwrapped_vals_size_) or add an explicit invariant check that wrapped_vals_.size() matches the unwrapped storage length before iterating.

Contributor Author

I think this is fine - only wrapped_vals_ size is available, as unwrapped_vals_ is a raw pointer expected to be the same size.

see

  // Source of truth for the list
  executorch::aten::ArrayRef<EValue*> wrapped_vals_;
  // Same size as wrapped_vals
  mutable T* unwrapped_vals_;
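
One way to make the "same size" assumption explicit is to check it once, at construction, and use a single stored length afterwards. The sketch below is purely illustrative: PairedList and lengths_accepted are hypothetical names, the real constructor takes a single size, and the exception is used only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical illustration only; PairedList is not an ExecuTorch class.
template <typename T>
class PairedList {
 public:
  // Callers pass both lengths so a mismatch is caught once, up front.
  PairedList(T* const* wrapped, std::size_t wrapped_len,
             T* unwrapped, std::size_t unwrapped_len)
      : wrapped_(wrapped), unwrapped_(unwrapped), size_(wrapped_len) {
    if (wrapped_len != unwrapped_len) {
      throw std::invalid_argument("wrapped and unwrapped lengths differ");
    }
  }

  std::size_t size() const { return size_; }

  // Copies each pointed-to value into the cache. Both arrays are indexed
  // with the same size_, which was validated at construction.
  const T* get() const {
    for (std::size_t i = 0; i < size_; ++i) {
      unwrapped_[i] = *wrapped_[i];
    }
    return unwrapped_;
  }

 private:
  T* const* wrapped_;
  T* unwrapped_;
  std::size_t size_;
};

// Returns true if a list with the given lengths can be constructed.
bool lengths_accepted(std::size_t wrapped_len, std::size_t unwrapped_len) {
  int a = 1;
  int b = 2;
  int* wrapped[] = {&a, &b};
  int cache[2] = {0, 0};
  try {
    PairedList<int> list(wrapped, wrapped_len, cache, unwrapped_len);
    return list.size() == wrapped_len;
  } catch (const std::invalid_argument&) {
    return false;
  }
}
```

With the check in the constructor, every later loop can rely on one length without re-validating it.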

Comment thread runtime/core/evalue.h Outdated
Comment on lines +79 to +85
  void destroy_elements() const {
    for (typename executorch::aten::ArrayRef<T>::size_type i = 0;
         i < wrapped_vals_.size();
         i++) {
      unwrapped_vals_[i].~T();
    }
  }
Copilot AI Apr 20, 2026

destroy_elements() is const but it explicitly destroys the underlying objects, which makes subsequent use of the list (including calling get()) undefined behavior. Consider making this method non-const and/or renaming to something that clearly signals one-way invalidation (e.g., destroy_unwrapped_elements_for_destruction()), and document that the list must not be accessed after calling it.
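
The reason a const version compiles at all is that const on a member function is shallow for pointer members: the pointer itself becomes const, the objects it points to do not. A small sketch of that language rule follows; Counted, Holder, and live_after_const_destroy are hypothetical names and are not part of evalue.h.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical element type that tracks how many instances are alive.
struct Counted {
  explicit Counted(int* live) : live_(live) { ++*live_; }
  ~Counted() { --*live_; }
  Counted(const Counted&) = delete;
  Counted& operator=(const Counted&) = delete;
  int* live_;
};

class Holder {
 public:
  Holder(Counted* items, std::size_t size) : items_(items), size_(size) {}

  // Accepted by the compiler as const: inside this function items_ has type
  // Counted* const, so the pointees can still be modified or destroyed.
  void destroy_elements() const {
    for (std::size_t i = 0; i < size_; ++i) {
      items_[i].~Counted();
    }
  }

 private:
  Counted* items_;
  std::size_t size_;
};

// Destroys two elements through a const Holder and reports how many remain.
int live_after_const_destroy() {
  int live = 0;
  std::allocator<Counted> alloc;
  Counted* items = alloc.allocate(2);
  new (&items[0]) Counted(&live);
  new (&items[1]) Counted(&live);

  const Holder holder(items, 2);
  holder.destroy_elements();  // allowed even though holder is const

  alloc.deallocate(items, 2);
  return live;
}
```

Dropping the const qualifier does not change what the function does, but it stops callers that only hold a const reference or pointer from invalidating the list by accident.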

Contributor Author

updated

Comment thread runtime/core/evalue.h Outdated

claude Bot commented Apr 20, 2026

Claude finished @lucylq's task in 2m 29s


Review of PR #18962 — Destroy unwrapped_vals instead of wrapped_vals

  • Read the changed file and PR diff
  • Analyze the fix for correctness and security
  • Check for edge cases and potential issues
  • Post review feedback

Summary

This PR addresses TOB-EXECUTORCH-31, a denial-of-service vulnerability caused by type confusion during EValue destruction. The core issue: a crafted MoveCall instruction can overwrite an entry in the values_ table (e.g., replacing a Tensor with an Int) after a BoxedEvalueList has stored pointers to that entry. When EValue::destroy() previously called get() on the BoxedEvalueList, it re-dereferenced wrapped_vals_ (the pointers into the now-mutated values_ table), triggering a type mismatch assertion and process termination.

The fix is sound: by adding destroy_elements() that calls destructors directly on unwrapped_vals_ (which still hold valid Tensor objects from the last get() call), destruction no longer touches the potentially-corrupted values_ table.

Correctness Analysis

The fix is correct. Here's the reasoning:

  1. unwrapped_vals_ is populated by get() during kernel execution, which copies/constructs T objects from dereferenced wrapped_vals_. These are real Tensor (or optional<Tensor>) objects that need their destructors called.

  2. After a MoveCall mutates the values_ table, wrapped_vals_ pointers still point to valid EValue slots, but those slots may now hold a different type. Re-dereferencing them (as the old code did via get()) would trigger ET_CHECK_MSG failures in toTensor().

  3. destroy_elements() bypasses this entirely by directly destroying the already-unwrapped objects, which is the correct behavior for cleanup.

Issues and Suggestions

1. Loop bound mismatch (minor concern), evalue.h:80-81

for (typename executorch::aten::ArrayRef<T>::size_type i = 0;
     i < wrapped_vals_.size();  // <-- uses wrapped_vals_ size
     i++) {
  unwrapped_vals_[i].~T();     // <-- indexes into unwrapped_vals_
}

wrapped_vals_ and unwrapped_vals_ are constructed with the same size parameter in the constructor (evalue.h:65-67), and there's no code path that changes either independently, so these sizes are always equal. However, unwrapped_vals_ is a raw T* with no stored size — the size is only tracked via wrapped_vals_.size(). This is an existing design constraint, not something introduced by this PR. The invariant holds, but it might be worth a brief comment noting that wrapped_vals_.size() is used because unwrapped_vals_ has no independent size tracking.

2. const qualifier on destroy_elements(), evalue.h:79

The method is marked const, which is technically valid because unwrapped_vals_ is a mutable T* and calling ~T() through it doesn't modify the pointer itself. However, destroy_elements() is a destructive operation that invalidates the list contents. Marking it const is semantically misleading — it suggests the method is safe to call without side effects.

That said, get() is also const for the same mutable reason, and the list is only ever accessed through a const pointer in EValue::destroy() (since payload.copyable_union.as_tensor_list_ptr is not mutable). So making destroy_elements() non-const would require broader changes. The current approach is pragmatically fine.

3. Comment refers to values_ table, evalue.h:77

Copilot flagged this too. The comment says "does not access the values_ table" but values_ is a member of Method, not BoxedEvalueList. From this class's perspective, the relevant point is that it doesn't dereference wrapped_vals_. Consider:

* This is safe to call during EValue destruction because it does not
* dereference wrapped_vals_, which may point to EValues mutated by
* MoveCall instructions.

This is a nit — the current comment conveys the right idea.

4. Double-destruction safety — If destroy_elements() is called but get() was never called (i.e., the kernel never actually used the tensor list), then unwrapped_vals_ contains uninitialized memory and calling ~T() on it is undefined behavior. In practice, the ExecuTorch allocator zero-initializes memory, and Tensor's destructor on a zero-initialized object is safe (it's essentially a no-op on a null TensorImpl*). But this is a latent assumption worth being aware of.

5. Whitespace-only changes in EVALUE_DEFINE_TO macro, evalue.h:523,537

The diff includes two whitespace changes (to<T>()&& became to<T>() && and to<T>()& became to<T>() &). These are formatting cleanups that are correct per style but unrelated to the security fix. Fine to include, but worth noting they're cosmetic.

Verdict

The fix is well-targeted and correct. It eliminates the TOCTOU window during destruction without changing the runtime behavior for non-malicious programs. The main areas for potential improvement are the comment wording (nit) and the latent assumption about uninitialized unwrapped_vals_ (low risk in practice).

Looks good to merge with the optional comment wording improvement.
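
The latent assumption raised in item 4 (destructors running on storage that get() never populated) could be removed by recording whether the cache was ever filled. The following is a hedged sketch of that idea only: GuardedCache, Probe, populate, and destructor_calls are hypothetical names, and nothing here is taken from the merged change.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical element type that counts destructor invocations.
struct Probe {
  explicit Probe(int* calls) : dtor_calls(calls) {}
  ~Probe() { ++*dtor_calls; }
  int* dtor_calls;
};

// Hypothetical cache wrapper; not an ExecuTorch class.
template <typename T>
class GuardedCache {
 public:
  GuardedCache(T* storage, std::size_t size)
      : storage_(storage), size_(size), populated_(false) {}

  // Constructs every element in place. Any previous contents are destroyed
  // first so that each live object is destroyed exactly once.
  template <typename... Args>
  void populate(const Args&... args) {
    destroy_elements();
    for (std::size_t i = 0; i < size_; ++i) {
      new (&storage_[i]) T(args...);
    }
    populated_ = true;
  }

  // Runs destructors only if populate() constructed the elements.
  void destroy_elements() {
    if (!populated_) {
      return;
    }
    for (std::size_t i = 0; i < size_; ++i) {
      storage_[i].~T();
    }
    populated_ = false;
  }

 private:
  T* storage_;
  std::size_t size_;
  bool populated_;
};

// Returns how many destructors ran during teardown.
int destructor_calls(bool populate_first) {
  int calls = 0;
  std::allocator<Probe> alloc;
  Probe* raw = alloc.allocate(2);
  {
    GuardedCache<Probe> cache(raw, 2);
    if (populate_first) {
      cache.populate(&calls);
    }
    cache.destroy_elements();
    cache.destroy_elements();  // second call is a no-op thanks to the flag
  }
  alloc.deallocate(raw, 2);
  return calls;
}
```

The flag costs one bool per list and makes teardown independent of how the allocator initializes memory.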



Member

@GregoryComer GregoryComer left a comment

I don't have full context on the wrapping logic, but approving to unblock. The two copilot comments seem reasonable, but I'll defer to your judgement.

MoveCall instructions can overwrite values_ entries after parseTensorList
validated their types, creating a time-of-check-time-of-use window.
The existing to<T>() calls have ET_CHECK_MSG type guards that abort on
mismatch, but the error messages don't indicate the TOCTOU cause.

Add explicit type validation with clear diagnostic messages in:
- BoxedEvalueList<optional<Tensor>>::get() (evalue.cpp)
- Improved null check messages in the generic template (evalue.h)

The to<T>() type checks in EValue provide defense in depth for the
generic template (including BoxedEvalueList<Tensor> and <int64_t>).

Note: fully preventing the abort (DoS) would require changing
BoxedEvalueList<T>::get() to return Result<ArrayRef<T>>, which is a
larger API change tracked separately.

Addresses TOB-EXECUTORCH-31.

This PR was authored with the assistance of Claude.
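
The larger API change mentioned in the note above, where get() reports a mismatch to the caller instead of aborting, might look roughly like the sketch below. It is an illustration under assumptions: ToyError, ToySlot, ToyResult, try_unwrap, and unwrap_succeeds are hypothetical names, and ToyResult is a simplified stand-in rather than the runtime's own Result type.

```cpp
#include <cassert>
#include <cstddef>

enum class ToyError { Ok, TypeMismatch };

// Toy stand-in for a tagged value slot.
struct ToySlot {
  bool is_int;
  long int_val;
};

// Toy stand-in for a result that carries either a view of the list or an error.
struct ToyResult {
  ToyError error;
  const long* data;
  std::size_t size;
  bool ok() const { return error == ToyError::Ok; }
};

// Validates every slot before touching the cache, then copies the values.
// On a mismatch the cache is left unchanged and an error is returned.
ToyResult try_unwrap(const ToySlot* const* wrapped, long* cache,
                     std::size_t size) {
  for (std::size_t i = 0; i < size; ++i) {
    if (!wrapped[i]->is_int) {
      return ToyResult{ToyError::TypeMismatch, nullptr, 0};
    }
  }
  for (std::size_t i = 0; i < size; ++i) {
    cache[i] = wrapped[i]->int_val;
  }
  return ToyResult{ToyError::Ok, cache, size};
}

// Returns true if unwrapping a two-element list succeeds.
bool unwrap_succeeds(bool corrupt_second_slot) {
  ToySlot a{true, 1};
  ToySlot b{!corrupt_second_slot, 2};
  const ToySlot* wrapped[] = {&a, &b};
  long cache[2] = {0, 0};
  ToyResult r = try_unwrap(wrapped, cache, 2);
  return r.ok() && r.size == 2 && r.data[1] == 2;
}
```

The caller decides how to handle the error, which is what makes this a wider API change than the destructor fix: every call site of get() would need to check the result.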
Copilot AI review requested due to automatic review settings April 21, 2026 18:42
Contributor Author

lucylq commented Apr 21, 2026

> I don't have full context on the wrapping logic, but approving to unblock. The two copilot comments seem reasonable, but I'll defer to your judgement.

responded to copilot's comments and made destroy non-const.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


Comment thread runtime/core/evalue.h
Comment on lines 505 to 509
    } else if (
        isTensorList() &&
        payload.copyable_union.as_tensor_list_ptr != nullptr) {
-     // for (auto& tensor : toTensorList()) {
-     for (auto& tensor : payload.copyable_union.as_tensor_list_ptr->get()) {
-       tensor.~Tensor();
-     }
+     payload.copyable_union.as_tensor_list_ptr->destroy_elements();
    } else if (
Copilot AI Apr 21, 2026

Consider adding a regression test for the reported MoveCall/type-confusion scenario: construct an EValue TensorList, call toTensorList()/get() to populate unwrapped_vals_, then mutate one of the underlying wrapped EValues to a non-tensor (e.g., Int) and ensure destroying the list EValue does not crash. This would lock in the security fix and prevent reintroducing the destructor-time dereference path.

@lucylq lucylq merged commit deaf73b into main Apr 21, 2026
174 of 180 checks passed
@lucylq lucylq deleted the security31 branch April 21, 2026 20:55
rascani added a commit that referenced this pull request Apr 28, 2026
…resolution (#19163)

### Summary

The Pico2 firmware link fails with an undefined reference to
BoxedEvalueList<std::optional<Tensor>>::get() from
RegisterCodegenUnboxedKernelsEverything.cpp.obj inside
libportable_ops_lib.a. ld walks the link line once: by the time
--whole-archive pulls the codegen TU's reference into the link,
libexecutorch_core.a (which contains the evalue.cpp specialization) has
already been processed.

Pico2 was green until #18962 (TOB-EXECUTORCH-31 fix), which removed
EValue::~EValue's inline call to BoxedEvalueList::get(). Before that
commit, libexecutorch_core.a had its own use of the symbol that forced
evalue.cpp.o to be pulled early; the link-order bug was always latent
and the destructor change unmasked it.

Move libexecutorch_core.a to after the kernel archives in all four
USE_CMSIS_NN x USE_SELECTIVE_BUILD branches so the codegen TUs can still
resolve their references after --whole-archive.

Authored with Claude.

### Test plan
CI
zeel2104 pushed a commit to zeel2104/executorch that referenced this pull request May 5, 2026
Destroy unwrapped_vals instead of dereferencing wrapped_vals.

When a kernel uses TensorLists, it calls EValue::toTensorList(). This
dereferences wrapped_vals into unwrapped_vals to get the tensor list.

During execution, a (crafted) MoveCall potentially moves an Int into the
TensorList. This means wrapped_vals now points to an Int, whereas
unwrapped_vals still holds a Tensor.

Instead of calling destructor on the wrapped_vals (ref to tensor), call
the destructor on the unwrapped_vals which contain the real tensor.

Vulnerability: During method destruction, the BoxedEvalueList
dereferences its stored pointer and attempts to convert the swapped
value to a Tensor, causing a type confusion that terminates the process.
This results in a denial of service.

Addresses TOB-EXECUTORCH-31.

This PR was authored with the assistance of Claude.

Co-authored-by: Github Executorch <github_executorch@arm.com>

Labels

CLA Signed, security-fix

3 participants