Skip to content

archive: contain extraction writes within the destination root#31585

Open
Jarred-Sumner wants to merge 1 commit into
mainfrom
claude/archive-extract-containment
Open

archive: contain extraction writes within the destination root#31585
Jarred-Sumner wants to merge 1 commit into
mainfrom
claude/archive-extract-containment

Conversation

@Jarred-Sumner

Copy link
Copy Markdown
Collaborator

What

Bun.Archive.extract() now resolves each entry's parent directory component-by-component relative to the extraction root before creating the entry, instead of handing the full relative path to a single openat/mkdirat/symlinkat call.

  • src/libarchive/lib.rs: new open_entry_parent helper (unix) that walks an entry's parent components against the destination dirfd with O_RDONLY|O_DIRECTORY|O_NOFOLLOW|O_CLOEXEC, creating missing components with mkdirat (0o755) and re-verifying them. Entries whose parent cannot be resolved to a real directory inside the destination are skipped (not counted), the same way the extractor already skips other unresolvable entries (overlong paths, unsafe symlink targets, created-symlink traversal). Enabled via a new ExtractOptions.nofollow_parents flag.
  • src/runtime/api/Archive.rs: the flag is set for Bun.Archive.extract()'s fast path, and the glob-filtered path uses the same helper directly. bun install, bun create, and standalone-executable extraction keep the flag off and are byte-for-byte unchanged.
  • Windows behavior is unchanged.

This matches what node-tar does when an entry's directory resolves to something other than a regular directory inside the destination: the entry is skipped, the rest of the archive still extracts, and the pre-existing directory entry (e.g. a symlink the user placed there) is left untouched.

Behavior note

Archive entries whose ancestor directory inside the destination is a pre-existing symlink (or any non-directory) are no longer written through that path; they are skipped and not included in the returned count. Entries with normal directory ancestors extract exactly as before.

Verification

  • Two new tests in test/js/bun/archive.test.ts ("path safety" describe): extracting an archive containing pivot/file.txt into a destination that already has pivot -> <outside dir> leaves the outside directory empty, keeps the symlink intact, and still extracts the sibling entries — for both the fast path and { glob: "**" }. Both fail on the previous behavior and pass with this change. Compared against node-tar 6.x, which skips such entries the same way.
  • Full test/js/bun/archive.test.ts suite passes (103 pass / 0 fail).
  • bun install smoke test (tarball extraction path with the flag off) unchanged.
  • bun run rust:check-all: 10/10 targets pass.

@robobun

robobun commented May 29, 2026

Copy link
Copy Markdown
Collaborator
Updated 2:14 PM PT - May 29th, 2026

@Jarred-Sumner, your commit 4dcffd279c2136bb7f691329718f2b7d5dad10d3 passed in Build #59064! 🎉


🧪   To try this PR locally:

bunx bun-pr 31585

That installs a local version of the PR into your bun-31585 executable, so you can run:

bun-31585 --bun

@github-actions

Copy link
Copy Markdown
Contributor

This PR may be a duplicate of:

  1. install: skip symlink entries in extracted tarballs (matching npm) and harden extraction with O_NOFOLLOW #31481 - Also fixes symlink-based path traversal in archive extraction using per-component openat with O_NOFOLLOW to prevent writes outside the extraction root

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4e3ba818-1cde-43fd-b17a-531f59516d95

📥 Commits

Reviewing files that changed from the base of the PR and between 3a79bd7 and 4dcffd2.

📒 Files selected for processing (3)
  • src/libarchive/lib.rs
  • src/runtime/api/Archive.rs
  • test/js/bun/archive.test.ts

Walkthrough

This PR adds safe parent-directory resolution to archive extraction on Unix, preventing traversal through symlinks when creating entries. A new EntryParent abstraction computes per-entry directory handles and names, replacing direct root-fd and full-path operations. The feature is wired via ExtractOptions::nofollow_parents and integrated into both the main extract_to_dir and the filtered extraction paths. Path-safety tests validate the behavior.

Changes

Archive path safety via parent-directory resolution

Layer / File(s) Summary
EntryParent abstraction and parent resolution
src/libarchive/lib.rs
Introduces EntryParent (Unix-only) struct holding optional directory fd and base offset, plus open_entry_parent(root, path) helper that safely parses path parents, creates missing directories via openat/mkdirat, rejects .. and symlink traversal, and returns Some(EntryParent) with updated fd chain or None on failure.
ExtractOptions configuration and extract_to_dir integration
src/libarchive/lib.rs
ExtractOptions gains nofollow_parents: bool field (default false). In Archiver::extract_to_dir, when enabled on Unix for directory/file/symlink entries, computes entry_parent via open_entry_parent() and skips the entry if parent resolution fails.
Directory creation via parent resolution
src/libarchive/lib.rs
Directory creation refactored to use entry_parent.dir(dir_fd) and entry_parent.entry_name(path_z) for the mkdirat base and relative name instead of passing root fd and full path directly.
Symlink and file extraction via parent resolution
src/libarchive/lib.rs
Symlink and file creation/opening refactored to use per-entry entry_parent.dir() and entry_name() instead of root fd and full path. File cleanup on write failure unlinks via the per-entry directory.
Archive.rs filtered extraction integration
src/runtime/api/Archive.rs
Sets nofollow_parents: true for filtered extraction. Computes entry_parent on Unix, derives entry_dir and entry_name_z from it, and uses those for file operations. Symlink extraction simplified to a single attempt without fallback retry logic. Platform-conditional parent-directory creation added for file extraction.
Path safety validation tests
test/js/bun/archive.test.ts
Imports filesystem helpers and adds two non-Windows path safety tests validating extraction does not follow pre-existing symlinked directories in the destination (both with and without { glob: "**" }), asserting the target directory and symlink pivot remain untouched.

Possibly related PRs

  • oven-sh/bun#31175: Exposes path_traverses_created_symlink and adds conditional O::NOFOLLOW_ANY for regular-file opens to further harden symlink traversal in extract_to_dir.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main security-focused change: containing extraction writes within the destination root by resolving parent directories component-by-component.
Description check ✅ Passed The description comprehensively covers both required template sections: 'What does this PR do?' is thoroughly detailed with implementation changes, behavioral notes, and comparison to node-tar; 'How did you verify your code works?' includes specific test results and verification steps.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No bugs found, but this is security-sensitive path-traversal hardening in the archive extractor with new fd-walking logic and a behavior change (silently skipping entries under symlinked dirs), and it overlaps with #31481 — worth a human look.

Extended reasoning...

Overview

This PR hardens Bun.Archive.extract() against symlink-based directory escape by adding open_entry_parent() in src/libarchive/lib.rs, which walks each parent path component via openat(O_RDONLY|O_DIRECTORY|O_NOFOLLOW|O_CLOEXEC) (creating missing components with mkdirat) and returns an EntryParent holding the resolved parent dirfd plus the basename offset. A new ExtractOptions.nofollow_parents flag gates it; Bun.Archive.extract() (both the fast path in Archiver::extract_to_disk and the glob-filtered path in extract_to_disk_filtered) enables it, while bun install/bun create/standalone extraction keep it off. Two new unix-only tests verify a pre-existing pivot -> outside symlink is not written through.

Security risks

This is the security-sensitive surface: archive extraction path containment. The change is a hardening, but correctness here is subtle — fd lifecycle (EntryParent::Drop closes the held fd; intermediate fds are closed in the walk loop), TOCTOU between mkdirat and the re-openat, and the interaction with the existing ENOENT/EACCES fallback that still calls dir.make_path_u8(dirname) against the root dir (lib.rs around the mkdirat/symlinkat/openat error arms). When nofollow_parents is on, open_entry_parent should have already materialized the parent so those fallbacks shouldn't fire, but the fallback path itself is not nofollow-guarded. There is also a user-visible behavior change: entries under a pre-existing symlinked directory are now silently skipped rather than written.

Level of scrutiny

High. This is production extraction logic that handles untrusted archive input and governs where bytes land on disk. It introduces new per-entry fd open/close churn, a new Drop impl, and rewires which dirfd + name each mkdirat/symlinkat/openat call uses across two files. Mistakes here are either fd leaks or containment bypasses.

Other factors

  • github-actions flagged this as overlapping with #31481, which appears to address the same vulnerability with the same per-component O_NOFOLLOW approach — a maintainer should decide which lands.
  • The bug-hunting pass found nothing; tests are added and the full archive suite reportedly passes. Windows is intentionally untouched (cfg-gated).
  • Given the security sensitivity, the behavior change, and the duplicate-PR question, I'm deferring rather than approving.

@Jarred-Sumner Jarred-Sumner left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is way too slow. Can you come up with a faster approach?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants