Sparse file support #2727

Draft
ChanTsune wants to merge 7 commits into main from sparse-file-support

@ChanTsune
Owner

@ChanTsune ChanTsune commented Feb 8, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added sparse file support to archives with --sparse and --no-sparse command-line flags (unstable)
    • Archives now detect and preserve sparse file structure during creation
    • Extraction properly restores sparse file regions and holes
  • Bug Fixes

    • Fixed Windows file timestamp restoration during extraction

- Add ChunkType::SPAR constant for sparse file support
- Create SparseMap and DataRegion structs with from_bytes/to_bytes
- Add sparse_map field to NormalEntry struct
- Parse SPAR chunk in NormalEntry::TryFrom, reject duplicates
- Add sparse_map() accessor method
- Update all NormalEntry From implementations
- Emit SPAR chunk in into_chunks() and chunks_write_in()
- Add sparse_map field and set_sparse_map() to EntryBuilder
- Add validation: data_size must match written bytes
- Update Key Types documentation in lib.rs
- Add round-trip and validation tests

Add --sparse/--no-sparse options to create command for detecting and
preserving sparse files. Sparse files are stored efficiently by only
archiving data regions and recording hole positions in SPAR chunk.

- Add detect_sparse_map() using SEEK_HOLE/SEEK_DATA on Unix
- Add write_sparse_from_path() to write only data regions
- Add restore_sparse_file() to recreate sparse files on extraction
- Options are unstable and require --unstable flag

- Use saturating_add in DataRegion::end() to prevent integer overflow
- Use chunked I/O (64KB) in write_sparse_from_path() to prevent memory
  exhaustion with large regions
- Use u64 for remaining bytes in restore_sparse_file() to prevent
  truncation on 32-bit platforms
- Check lseek return value when restoring file position after detection
- Always validate SparseMap invariants in new(), not just debug builds

Clarify that raw_file_size represents data size (bytes in FDAT), not
logical file size. For sparse files, use sparse_map.logical_size() to
get the original file size.

- Add defensive check for hole_start < data_start in sparse detection
- Use checked arithmetic in SparseMap::data_size() to detect overflow
- Add data size validation with improved error messages in extract
- Add integration tests for all-hole, multi-region, trailing/leading hole patterns
- Fix invalid --sparse flag usage in extract test commands
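
The invariants these commits describe (validation in every build profile, checked arithmetic for the stored data size, and a logical size kept separate from the data size) can be sketched as follows. This is a minimal illustration, not the PR's code: `Region` and `Map` are hypothetical stand-ins for `DataRegion` and `SparseMap`, and the exact rules the real `new()` enforces are an assumption.

```rust
/// A contiguous run of real data in a sparse file.
/// Hypothetical stand-in for the PR's `DataRegion`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Region {
    pub offset: u64,
    pub size: u64,
}

impl Region {
    /// Exclusive end offset, or `None` if `offset + size` overflows `u64`.
    pub fn checked_end(&self) -> Option<u64> {
        self.offset.checked_add(self.size)
    }
}

/// Logical file size plus its data regions.
/// Hypothetical stand-in for the PR's `SparseMap`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Map {
    logical_size: u64,
    regions: Vec<Region>,
}

impl Map {
    /// Validates in all build profiles (assumed rules): regions must be
    /// sorted, must not overlap, and must end within `logical_size`.
    pub fn new(logical_size: u64, regions: Vec<Region>) -> Result<Self, String> {
        let mut prev_end = 0u64;
        for r in &regions {
            if r.offset < prev_end {
                return Err(format!("region at {} overlaps or is out of order", r.offset));
            }
            prev_end = r
                .checked_end()
                .ok_or_else(|| format!("region at {} overflows u64", r.offset))?;
        }
        if prev_end > logical_size {
            return Err(format!("regions end at {prev_end}, past logical size {logical_size}"));
        }
        Ok(Self { logical_size, regions })
    }

    /// Size of the original file, holes included.
    pub fn logical_size(&self) -> u64 {
        self.logical_size
    }

    /// Bytes actually stored in the archive; `None` on overflow.
    pub fn data_size(&self) -> Option<u64> {
        self.regions.iter().try_fold(0u64, |acc, r| acc.checked_add(r.size))
    }

    /// True when the file consists of a single hole.
    pub fn is_all_hole(&self) -> bool {
        self.regions.is_empty()
    }
}
```

With a map like this, a builder can compare the number of bytes written against `data_size()` and report the original length through `logical_size()`, which is the distinction the `raw_file_size` note above draws.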
@coderabbitai
Contributor

coderabbitai Bot commented Feb 8, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This pull request introduces comprehensive sparse file support to the PNA archive format, enabling detection, preservation, and restoration of sparse files. Changes span CLI flag parsing, archive format extensions (new SPAR chunk type), entry metadata structures, extraction logic, and cross-platform detection utilities.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CLI Sparse Flagging & Threading: cli/src/command/create.rs, cli/src/command/stdio.rs, cli/src/command/update.rs | Add --sparse/--no-sparse flags, introduce sparse: bool field to CreationContext and CreateCommand, thread sparse state through creation paths. |
| CLI Entry Building with Sparse: cli/src/command/core.rs, cli/src/command/append.rs, cli/src/command/core/mtree.rs | Extend CreateOptions with sparse: bool field, implement write_sparse_from_path() for conditional sparse writing, propagate flag through destructuring patterns. |
| CLI Sparse File Restoration: cli/src/command/extract.rs | Implement restore_sparse_file() helper to write only data regions at correct offsets when sparse map exists, bypass SafeWriter for sparse files, preserve original logical file size. |
| Sparse File Detection Utilities: cli/src/utils.rs, cli/src/utils/sparse.rs | Add detect_sparse_map() with Unix SEEK_DATA/SEEK_HOLE detection and fallback heuristic; Windows stub returns Ok(None). |
| Archive Format (Sparse Chunks): lib/src/chunk/types.rs | Introduce SPAR chunk type constant with documentation; mark as critical with new unit test validating properties. |
| Entry Sparse Metadata Support: lib/src/entry.rs, lib/src/entry/builder.rs, lib/src/entry/sparse.rs | Add sparse_map: Option<SparseMap> to NormalEntry and EntryBuilder; implement SparseMap and DataRegion types with parsing/serialization; add set_sparse_map() builder method and size-mismatch validation. |
| Sparse Tests & Documentation: cli/tests/cli/create/sparse.rs, lib/src/lib.rs | Comprehensive test coverage for sparse file round-trips (single/multiple holes, trailing holes, all-hole files); verify block-count preservation and content integrity after extraction. |

Sequence Diagram

sequenceDiagram
    participant User as User/CLI
    participant Detect as detect_sparse_map()
    participant Arch as Archive Builder
    participant Entry as EntryBuilder
    participant Serialize as Serialization
    participant Extract as Extraction
    participant File as Output File

    User->>Detect: File to archive
    Detect->>Detect: SEEK_DATA/SEEK_HOLE or st_blocks heuristic
    Detect-->>Arch: SparseMap or None

    alt Sparse Detected
        Arch->>Entry: set_sparse_map(SparseMap)
        Entry->>Serialize: sparse_map present
        Serialize->>Serialize: Emit SPAR chunk
    else Not Sparse
        Arch->>Entry: sparse_map = None
        Entry->>Serialize: Normal write path
    end

    Serialize-->>User: Archive with SPAR metadata

    User->>Extract: Extract with sparse map
    Extract->>File: restore_sparse_file()
    File->>File: Write data regions at offsets
    File->>File: Set logical size
    File-->>User: Sparse file restored

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

cli, lib

Poem

🐰 Sparse files now dance through archives bright,
Holes detected with SEEK_DATA's light,
Data regions tucked in SPAR chunks tight,
Round-trips perfected—no bytes lost in flight!
From detect to restore, pure delight!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Sparse file support' directly summarizes the main objective of the pull request, which introduces comprehensive sparse file handling across CLI and library components. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@ChanTsune ChanTsune marked this pull request as draft February 8, 2026 13:00
@gemini-code-assist
Contributor

Summary of Changes

Hello @ChanTsune, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a crucial feature for handling sparse files, enabling the archiving tool to store and restore them efficiently. By intelligently identifying and omitting large blocks of zeros, the tool can create smaller archives and improve performance, especially for files common in virtual machine images or databases. The implementation includes new command-line options, a dedicated chunk type for sparse metadata, and robust detection and restoration logic, initially focusing on Unix-like systems.

Highlights

  • Sparse File Support: Introduced the ability to detect and preserve sparse files during archiving and extraction, significantly reducing archive size for files with large zero-filled regions.
  • New CLI Option: Added a --sparse flag to the create command, allowing users to enable sparse file detection. This feature is currently marked as unstable.
  • Efficient Storage: Implemented logic to only store the actual data regions of sparse files, along with metadata describing the 'holes', rather than storing the zero bytes.
  • Automatic Restoration: During extraction, files that were archived with sparse metadata will automatically be restored as sparse files on compatible filesystems.
  • New Chunk Type: A new SPAR chunk type has been defined in the archive format to store sparse file metadata, including logical file size and data region mappings.
  • Unix-Specific Detection: Sparse file detection currently uses lseek with SEEK_HOLE/SEEK_DATA on Unix-like operating systems, with a fallback to st_blocks checks.
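
The detection strategy in the last highlight can be sketched without touching the OS. The block below is an illustration under stated assumptions, not the PR's `detect_sparse_map()`: `collect_data_regions` and `looks_sparse` are hypothetical names, and the two closures stand in for `lseek(fd, pos, SEEK_DATA)` and `lseek(fd, pos, SEEK_HOLE)` so the walking logic can be exercised in memory.

```rust
/// Walks alternating data/hole boundaries and collects `(offset, size)`
/// data regions. Hypothetical helper, not the PR's implementation.
///
/// `seek_data(pos)` models SEEK_DATA: start of the next data at or after
/// `pos`, or `None` when only a hole remains (lseek reports ENXIO there).
/// `seek_hole(pos)` models SEEK_HOLE: start of the next hole at or after
/// `pos`; the end of the file counts as a hole.
pub fn collect_data_regions(
    file_size: u64,
    mut seek_data: impl FnMut(u64) -> Option<u64>,
    mut seek_hole: impl FnMut(u64) -> u64,
) -> Vec<(u64, u64)> {
    let mut regions = Vec::new();
    let mut pos = 0u64;
    while pos < file_size {
        let Some(data_start) = seek_data(pos) else { break };
        let hole_start = seek_hole(data_start).min(file_size);
        // Defensive check: a hole that does not lie after the data start
        // would produce an empty or negative-length region, so stop.
        if hole_start <= data_start {
            break;
        }
        regions.push((data_start, hole_start - data_start));
        pos = hole_start;
    }
    regions
}

/// Fallback heuristic in pure form: a file looks sparse when its allocated
/// space is smaller than its logical length. `st_blocks` counts 512-byte
/// units regardless of the filesystem block size.
pub fn looks_sparse(logical_size: u64, blocks_512: u64) -> bool {
    blocks_512.saturating_mul(512) < logical_size
}
```

On Unix the inputs to `looks_sparse` would come from `std::os::unix::fs::MetadataExt::size()` and `blocks()`. The heuristic only says that holes probably exist, not where they are, and filesystems with transparent compression may report fewer blocks for files that contain no holes at all.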
Changelog
  • cli/src/command/append.rs
    • Added sparse: false to CreateOptions initialization, disabling sparse detection for append operations by default.
  • cli/src/command/core.rs
    • Imported detect_sparse_map for sparse file detection.
    • Added sparse: bool field to CreateOptions struct.
    • Introduced write_sparse_from_path function to handle writing file data while preserving sparse regions.
    • Modified create_entry to conditionally use write_sparse_from_path based on the sparse option.
  • cli/src/command/core/mtree.rs
    • Updated create_entry_from_mtree to destructure CreateOptions with sparse: _, indicating the sparse flag is not directly used in mtree operations.
  • cli/src/command/create.rs
    • Added --sparse and --no-sparse command-line arguments to enable/disable sparse file detection for the create command.
    • Passed the sparse argument value to CreationContext and CreateOptions.
  • cli/src/command/extract.rs
    • Imported SparseMap from the pna library.
    • Modified restore_entry_file to handle sparse files, applying safe writes only to non-sparse files.
    • Implemented restore_sparse_file function to reconstruct sparse files during extraction by writing data regions and setting logical file length.
  • cli/src/command/stdio.rs
    • Added sparse: false to CreationContext for run_create_archive, run_append, and run_update functions, disabling sparse detection for stdio operations by default.
  • cli/src/command/update.rs
    • Added sparse: false to CreateOptions initialization, disabling sparse detection for update operations by default.
  • cli/src/utils.rs
    • Added a new sparse module.
  • cli/src/utils/sparse.rs
    • New file: Implements detect_sparse_map function for Unix systems using SEEK_HOLE/SEEK_DATA and st_blocks.
    • Provides placeholder implementation for non-Unix systems (returns None).
    • Includes unit tests for sparse detection.
  • cli/tests/cli/create/sparse.rs
    • New file: Added comprehensive integration tests for sparse file round-trip functionality, covering various sparse file patterns (middle hole, all hole, multi-region, leading/trailing holes).
  • lib/src/chunk/types.rs
    • Defined a new SPAR (Sparse file map) chunk type.
    • Added documentation for the SPAR chunk, detailing its purpose and format.
    • Added a test for SPAR chunk properties.
  • lib/src/entry.rs
    • Added sparse module and exported DataRegion and SparseMap.
    • Added sparse_map: Option<SparseMap> field to NormalEntry.
    • Modified NormalEntry::try_from<RawEntry> to parse SPAR chunks and store the SparseMap, handling duplicate SPAR chunks as an error.
    • Updated NormalEntry::write_chunk_in and NormalEntry::into_chunks to include the SPAR chunk when a sparse map is present.
    • Added sparse_map() getter method to NormalEntry.
    • Updated From implementations for NormalEntry conversions to preserve the sparse_map.
    • Added unit tests for SPAR chunk parsing, duplicate detection, and sparse_map preservation during conversions and serialization.
  • lib/src/entry/builder.rs
    • Imported SparseMap.
    • Added sparse_map: Option<SparseMap> field to EntryBuilder.
    • Added set_sparse_map method to EntryBuilder.
    • Implemented validation in EntryBuilder::build to ensure written data size matches sparse_map.data_size() if a sparse map is set.
    • Clarified that raw_file_size for sparse files represents the data size, not the logical size.
    • Added unit tests for EntryBuilder with sparse maps, including all-hole files and size mismatch errors.
  • lib/src/entry/sparse.rs
    • New file: Defined DataRegion (offset, size) and SparseMap (logical_size, regions) structs.
    • Implemented methods for SparseMap including new, logical_size, regions, data_size, is_all_hole.
    • Provided from_bytes and to_bytes methods for SparseMap serialization/deserialization, including robust validation of region invariants.
    • Includes comprehensive unit tests for DataRegion and SparseMap functionality and error handling.
  • lib/src/lib.rs
    • Updated crate-level documentation to include SparseMap and DataRegion.
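
The extraction side described in the changelog (write each data region at its offset, then set the logical length) can be sketched as below. This is a rough illustration assuming regions arrive as `(offset, size)` pairs and the archived data is one contiguous stream; `restore_sparse` is a hypothetical name, not the PR's `restore_sparse_file`.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// Copies each `(offset, size)` region from the contiguous `reader` into
/// `file` at its original offset, then sets the logical length.
/// Returns the number of data bytes written.
pub fn restore_sparse<R: Read>(
    file: &mut File,
    reader: &mut R,
    logical_size: u64,
    regions: &[(u64, u64)],
) -> io::Result<u64> {
    let mut total = 0u64;
    for &(offset, size) in regions {
        // Seeking past the current end leaves a gap that the filesystem
        // can keep unallocated if it supports sparse files.
        file.seek(SeekFrom::Start(offset))?;
        let copied = io::copy(&mut reader.by_ref().take(size), &mut *file)?;
        if copied != size {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                format!("region at offset {offset}: expected {size} bytes, got {copied}"),
            ));
        }
        total += copied;
    }
    // Covers trailing holes and all-hole files, where no write reaches
    // the final offset.
    file.set_len(logical_size)?;
    Ok(total)
}
```

Whether the gaps actually stay unallocated depends on the target filesystem; the file content is the same either way, because unwritten ranges read back as zeros.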

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces comprehensive support for sparse files, a significant and well-implemented feature. The changes span from the core library, with the new SPAR chunk type and updated entry structures, to the CLI with new commands and OS-specific detection logic. The implementation is robust, and the addition of extensive integration tests covering various sparse file scenarios is particularly commendable. I have one suggestion to make the code more idiomatic.

Comment thread cli/src/command/core.rs
Comment on lines +829 to +840
const CHUNK_SIZE: usize = 64 * 1024;
let mut buf = vec![0u8; CHUNK_SIZE];
for region in sparse_map.regions() {
    file.seek(io::SeekFrom::Start(region.offset()))?;
    let mut remaining = region.size();
    while remaining > 0 {
        let to_read = (remaining as usize).min(CHUNK_SIZE);
        file.read_exact(&mut buf[..to_read])?;
        entry.write_all(&buf[..to_read])?;
        remaining -= to_read as u64;
    }
}
medium

The manual chunked reading loop for writing sparse file data regions can be simplified by using std::io::copy with a take adapter. This is more idiomatic and removes the need for manual buffer management, improving readability and maintainability. It's also important to check that the number of copied bytes matches the expected region size to handle cases where the source file might be modified during the operation.

Suggested change
-const CHUNK_SIZE: usize = 64 * 1024;
-let mut buf = vec![0u8; CHUNK_SIZE];
-for region in sparse_map.regions() {
-    file.seek(io::SeekFrom::Start(region.offset()))?;
-    let mut remaining = region.size();
-    while remaining > 0 {
-        let to_read = (remaining as usize).min(CHUNK_SIZE);
-        file.read_exact(&mut buf[..to_read])?;
-        entry.write_all(&buf[..to_read])?;
-        remaining -= to_read as u64;
-    }
-}
+for region in sparse_map.regions() {
+    file.seek(io::SeekFrom::Start(region.offset()))?;
+    let mut limited_reader = (&mut file).take(region.size());
+    let copied = io::copy(&mut limited_reader, entry)?;
+    if copied != region.size() {
+        return Err(io::Error::new(
+            io::ErrorKind::UnexpectedEof,
+            format!(
+                "failed to read whole sparse region at offset {}: expected {} bytes, got {}",
+                region.offset(),
+                region.size(),
+                copied
+            ),
+        ));
+    }
+}

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@cli/src/command/core.rs`:
- Around line 814-847: The loop in write_sparse_from_path casts remaining (u64)
to usize causing truncation on 32-bit targets; change the to_read calculation to
compute a u64-sized min like let to_read_u64 = remaining.min(CHUNK_SIZE as u64)
and then convert to usize (to_read = to_read_u64 as usize) before slicing buf
and calling file.read_exact and entry.write_all, ensuring you never cast the
full remaining directly and that buf[..to_read] uses the safely converted usize;
update references in write_sparse_from_path (CHUNK_SIZE, remaining, buf,
file.read_exact, entry.write_all) accordingly.

In `@cli/src/command/extract.rs`:
- Around line 919-924: The sparse-file extraction branch skips removing an
existing path before creating the file, which can fail for symlinks or other
edge cases when overwrite/remove_existing is enabled; update the sparse branch
(where sparse_map is Some and you call utils::fs::file_create, item.reader,
restore_sparse_file, restore_timestamps) to mirror the non-sparse branch by
checking remove_existing and calling the same removal helper used elsewhere
(e.g., the utils::fs::remove_* path removal function your codebase uses) on path
prior to calling utils::fs::file_create, then proceed to open the reader and
call restore_sparse_file and restore_timestamps as before.
- Around line 1348-1368: The loop currently converts remaining to usize with
(remaining as usize) which can truncate on 32-bit targets and cause an infinite
loop; change how to compute to_read by comparing remaining with CHUNK_SIZE in
u64 form (e.g. let to_read = std::cmp::min(remaining, CHUNK_SIZE as u64) as
usize) so the cast is safe, then use &mut buf[..to_read] in reader.read_exact
and file.write_all as before; update the loop around remaining/total_read (in
the function using region.size(), region.offset(), CHUNK_SIZE, buf,
reader.read_exact, and file.write_all) accordingly to avoid any u64→usize
truncation.

In `@cli/tests/cli/create/sparse.rs`:
- Around line 6-11: The test module file sparse.rs is present but not declared
in the parent test module, causing it to be excluded; either add a Unix-gated
module declaration in the parent (e.g., add #[cfg(unix)] mod sparse; to
cli/tests/cli/create.rs to include the file that uses
std::os::unix::fs::MetadataExt) or delete sparse.rs if the test is
obsolete—update the parent file to reference the module name sparse so the test
is compiled only on Unix.
🧹 Nitpick comments (7)
cli/src/command/stdio.rs (1)

954-954: Sparse is hardcoded to false across all stdio paths — consider future --sparse flag.

The sparse: false initialization is consistent across run_create_archive (Line 954), run_append (Line 1262), and run_update (Line 1417). However, StdioCommand has no --sparse CLI argument, so users of the stdio/bsdtar-compatible interface cannot opt into sparse support at all.

If this is intentional for incremental rollout, a brief comment explaining why stdio mode doesn't support --sparse yet would help future contributors.

lib/src/chunk/types.rs (1)

333-342: Test comment is slightly misleading on the "Reserved" property.

The comment says Reserved (A=uppercase) which could be misread as "the reserved bit IS set". In reality, uppercase 3rd byte means the reserved bit is clear (valid). The assertion !is_set_reserved() is correct.

📝 Suggested comment clarification
     fn spar_chunk_properties() {
-        // SPAR: Critical (S=uppercase), Public (P=uppercase),
-        //       Reserved (A=uppercase), Unsafe-to-copy (R=uppercase)
+        // SPAR: Critical (S=uppercase), Public (P=uppercase),
+        //       Reserved-bit-clear (A=uppercase), Unsafe-to-copy (R=uppercase)
         assert!(ChunkType::SPAR.is_critical());
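
The casing rule behind this comment can be checked with a few lines of code. The sketch below assumes the PNG-style convention the comment implies, where bit 5 (0x20) of each chunk type byte carries one property and an uppercase letter means the bit is clear. `ChunkName` is a hypothetical type; only `is_critical` and `is_set_reserved` are names taken from the PR, and `is_private` and `is_safe_to_copy` are illustrative.

```rust
/// Four-byte chunk type whose property bits live in bit 5 (0x20) of each
/// byte, i.e. in the letter's case. Hypothetical stand-in for `ChunkType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ChunkName(pub [u8; 4]);

impl ChunkName {
    /// True when byte `i` is lowercase (bit 5 set).
    fn bit5(&self, i: usize) -> bool {
        self.0[i] & 0x20 != 0
    }

    /// 1st byte uppercase: critical; lowercase: ancillary.
    pub fn is_critical(&self) -> bool {
        !self.bit5(0)
    }

    /// 2nd byte uppercase: public; lowercase: private.
    pub fn is_private(&self) -> bool {
        self.bit5(1)
    }

    /// 3rd byte uppercase: reserved bit clear, which is the valid state.
    pub fn is_set_reserved(&self) -> bool {
        self.bit5(2)
    }

    /// 4th byte uppercase: unsafe to copy; lowercase: safe to copy.
    pub fn is_safe_to_copy(&self) -> bool {
        self.bit5(3)
    }
}
```

For `SPAR` every letter is uppercase, so all four bits are clear: critical, public, reserved bit clear, unsafe to copy. That is why `!is_set_reserved()` is the correct assertion even though the third letter is uppercase.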
cli/tests/cli/create/sparse.rs (1)

14-14: Prefer &Path over &PathBuf in function signatures.

All helper functions take &PathBuf instead of the more idiomatic &Path. Clippy's clippy::ptr_arg lint flags this because &PathBuf auto-derefs to &Path, and accepting the broader type is more flexible.

Example fix for one function (apply same pattern to all five)
-fn create_sparse_file(path: &PathBuf) -> bool {
+fn create_sparse_file(path: &Path) -> bool {

Also applies to: 104-104, 206-206, 346-346, 439-439
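
As a small illustration of why `&Path` is the more flexible parameter type: a function that takes `&Path` accepts a borrowed `PathBuf` through deref coercion as well as a `Path` built from a string slice. `archive_name` is a hypothetical helper, not one of the test functions in the PR.

```rust
use std::path::{Path, PathBuf};

/// Takes any borrowed path; callers holding a `PathBuf` pass `&buf`
/// and rely on deref coercion to `&Path`.
pub fn archive_name(path: &Path) -> PathBuf {
    path.with_extension("pna")
}
```

Existing call sites that pass `&PathBuf` keep compiling unchanged after the signature switch, so the change is limited to the function signatures themselves.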

cli/src/command/extract.rs (1)

1371-1376: Consider promoting debug_assert_eq! to a hard check.

The debug_assert_eq! on total bytes read vs. expected data size is stripped in release builds. Since a mismatch here indicates archive corruption or a bug in the read loop, a hard error would prevent silent data corruption in production.

Proposed fix
-    debug_assert_eq!(
-        total_read, expected_data_size,
-        "Sparse data size mismatch: read {}, expected {}",
-        total_read, expected_data_size
-    );
+    if total_read != expected_data_size {
+        return Err(io::Error::new(
+            io::ErrorKind::InvalidData,
+            format!(
+                "Sparse data size mismatch: read {} bytes, expected {}",
+                total_read, expected_data_size
+            ),
+        ));
+    }
cli/src/command/core.rs (1)

824-846: Non-sparse files are opened twice when --sparse is enabled.

detect_sparse_map opens and inspects the file, then the fallback at Line 845 calls write_from_path, which opens the same file again. This is minor (one extra open syscall per non-sparse file), but could be avoided by reusing the already-open file handle.

lib/src/entry/sparse.rs (1)

42-49: saturating_add in end() silently caps at u64::MAX on overflow.

If offset + size would exceed u64::MAX, end() returns u64::MAX instead of the true value. This means validate_regions's check last.end() <= logical_size could pass for a region that genuinely exceeds bounds (e.g., offset = u64::MAX - 5, size = 10, logical_size = u64::MAX). In practice this is extremely unlikely, but you may want to add a checked-add validation in validate_regions to reject such edge cases.
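
The edge case is easy to reproduce in isolation. The sketch below uses two hypothetical free functions rather than the PR's `DataRegion::end()` and `validate_regions`: one computes the end offset with `saturating_add` as described, the other performs the bounds check with `checked_add` so that overflow is rejected instead of clamped.

```rust
/// End offset with saturation: clamps to `u64::MAX` on overflow.
pub fn end_saturating(offset: u64, size: u64) -> u64 {
    offset.saturating_add(size)
}

/// Bounds check that treats overflow of `offset + size` as out of bounds.
pub fn region_in_bounds(offset: u64, size: u64, logical_size: u64) -> bool {
    matches!(offset.checked_add(size), Some(end) if end <= logical_size)
}
```

With `offset = u64::MAX - 5`, `size = 10`, and `logical_size = u64::MAX`, the saturating end equals `u64::MAX` and passes an `end <= logical_size` comparison, while `region_in_bounds` returns `false`.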

cli/src/utils/sparse.rs (1)

119-125: Consider also matching ENOSYS as unsupported.

FUSE filesystems return ENOSYS when they don't support lseek operations for SEEK_HOLE/SEEK_DATA. Without this match, the error would propagate instead of falling back to the st_blocks heuristic. This is a real scenario with filesystem implementations like sshfs and rclone.

Proposed fix
 fn is_seek_hole_unsupported(err: &io::Error) -> bool {
     matches!(
         err.raw_os_error(),
-        Some(libc::EOPNOTSUPP) | Some(libc::EINVAL)
+        Some(libc::EOPNOTSUPP) | Some(libc::EINVAL) | Some(libc::ENOSYS)
     )
 }

Comment thread cli/src/command/core.rs
Comment on lines +814 to +847
/// Writes file data from a path, detecting and preserving sparse regions.
///
/// If the file is sparse, only data regions are written and the sparse map is set on the entry.
/// If the file is not sparse, falls back to normal write behavior.
pub(crate) fn write_sparse_from_path(
    entry: &mut EntryBuilder,
    path: impl AsRef<Path>,
) -> io::Result<()> {
    use io::Seek;

    let path = path.as_ref();
    let mut file = fs::File::open(path)?;

    if let Some(sparse_map) = detect_sparse_map(&file)? {
        // Write only data regions using chunked I/O to avoid memory exhaustion
        const CHUNK_SIZE: usize = 64 * 1024;
        let mut buf = vec![0u8; CHUNK_SIZE];
        for region in sparse_map.regions() {
            file.seek(io::SeekFrom::Start(region.offset()))?;
            let mut remaining = region.size();
            while remaining > 0 {
                let to_read = (remaining as usize).min(CHUNK_SIZE);
                file.read_exact(&mut buf[..to_read])?;
                entry.write_all(&buf[..to_read])?;
                remaining -= to_read as u64;
            }
        }
        entry.set_sparse_map(sparse_map);
        Ok(())
    } else {
        // Not sparse, use normal write
        write_from_path(entry, path)
    }
}
⚠️ Potential issue | 🟡 Minor

Same u64 → usize truncation risk as in restore_sparse_file.

Line 835 has the same (remaining as usize).min(CHUNK_SIZE) pattern that can loop infinitely on 32-bit targets when remaining > u32::MAX. Apply the same fix here.

Proposed fix
-                let to_read = (remaining as usize).min(CHUNK_SIZE);
+                let to_read = usize::try_from(remaining).unwrap_or(CHUNK_SIZE).min(CHUNK_SIZE);
🤖 Prompt for AI Agents
In `@cli/src/command/core.rs` around lines 814 - 847, The loop in
write_sparse_from_path casts remaining (u64) to usize causing truncation on
32-bit targets; change the to_read calculation to compute a u64-sized min like
let to_read_u64 = remaining.min(CHUNK_SIZE as u64) and then convert to usize
(to_read = to_read_u64 as usize) before slicing buf and calling file.read_exact
and entry.write_all, ensuring you never cast the full remaining directly and
that buf[..to_read] uses the safely converted usize; update references in
write_sparse_from_path (CHUNK_SIZE, remaining, buf, file.read_exact,
entry.write_all) accordingly.

Comment on lines +919 to +924
} else if let Some(sparse_map) = sparse_map {
    // Sparse file restoration - write data regions at correct offsets
    let mut file = utils::fs::file_create(&path, remove_existing)?;
    let mut reader = item.reader(ReadOptions::with_password(password))?;
    restore_sparse_file(&mut file, &mut reader, sparse_map)?;
    restore_timestamps(&mut file, item.metadata(), keep_options)?;
⚠️ Potential issue | 🟡 Minor

Missing explicit file removal before creation in the sparse path.

The non-sparse, non-safe-writes branch (Line 926-929) explicitly removes the existing path before calling file_create when remove_existing is true. The sparse branch here skips that step. If file_create doesn't handle all edge cases (e.g., existing symlinks), this could cause extraction failures under --overwrite.

Proposed fix
         } else if let Some(sparse_map) = sparse_map {
             // Sparse file restoration - write data regions at correct offsets
+            if remove_existing {
+                utils::io::ignore_not_found(utils::fs::remove_path(&path))?;
+            }
             let mut file = utils::fs::file_create(&path, remove_existing)?;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-} else if let Some(sparse_map) = sparse_map {
-    // Sparse file restoration - write data regions at correct offsets
-    let mut file = utils::fs::file_create(&path, remove_existing)?;
-    let mut reader = item.reader(ReadOptions::with_password(password))?;
-    restore_sparse_file(&mut file, &mut reader, sparse_map)?;
-    restore_timestamps(&mut file, item.metadata(), keep_options)?;
+} else if let Some(sparse_map) = sparse_map {
+    // Sparse file restoration - write data regions at correct offsets
+    if remove_existing {
+        utils::io::ignore_not_found(utils::fs::remove_path(&path))?;
+    }
+    let mut file = utils::fs::file_create(&path, remove_existing)?;
+    let mut reader = item.reader(ReadOptions::with_password(password))?;
+    restore_sparse_file(&mut file, &mut reader, sparse_map)?;
+    restore_timestamps(&mut file, item.metadata(), keep_options)?;
🤖 Prompt for AI Agents
In `@cli/src/command/extract.rs` around lines 919 - 924, The sparse-file
extraction branch skips removing an existing path before creating the file,
which can fail for symlinks or other edge cases when overwrite/remove_existing
is enabled; update the sparse branch (where sparse_map is Some and you call
utils::fs::file_create, item.reader, restore_sparse_file, restore_timestamps) to
mirror the non-sparse branch by checking remove_existing and calling the same
removal helper used elsewhere (e.g., the utils::fs::remove_* path removal
function your codebase uses) on path prior to calling utils::fs::file_create,
then proceed to open the reader and call restore_sparse_file and
restore_timestamps as before.

Comment on lines +1348 to +1368
file.seek(io::SeekFrom::Start(region.offset()))?;
let mut remaining = region.size();
while remaining > 0 {
    let to_read = (remaining as usize).min(CHUNK_SIZE);
    reader.read_exact(&mut buf[..to_read]).map_err(|e| {
        io::Error::new(
            e.kind(),
            format!(
                "Failed to read sparse data at offset {}: {} \
                 (expected {} bytes total, read {} so far)",
                region.offset(),
                e,
                expected_data_size,
                total_read
            ),
        )
    })?;
    file.write_all(&buf[..to_read])?;
    remaining -= to_read as u64;
    total_read += to_read as u64;
}
⚠️ Potential issue | 🟡 Minor

Potential infinite loop on 32-bit platforms due to u64 → usize truncation.

On a 32-bit target, remaining as usize (Line 1351) silently truncates values above u32::MAX. If the truncated result is 0, the loop will never decrease remaining, spinning forever. While 32-bit is unlikely for large sparse regions, this is a correctness gap.

Proposed fix using safe conversion
         while remaining > 0 {
-            let to_read = (remaining as usize).min(CHUNK_SIZE);
+            let to_read = usize::try_from(remaining).unwrap_or(CHUNK_SIZE).min(CHUNK_SIZE);
🤖 Prompt for AI Agents
In `@cli/src/command/extract.rs` around lines 1348 - 1368, The loop currently
converts remaining to usize with (remaining as usize) which can truncate on
32-bit targets and cause an infinite loop; change how to compute to_read by
comparing remaining with CHUNK_SIZE in u64 form (e.g. let to_read =
std::cmp::min(remaining, CHUNK_SIZE as u64) as usize) so the cast is safe, then
use &mut buf[..to_read] in reader.read_exact and file.write_all as before;
update the loop around remaining/total_read (in the function using
region.size(), region.offset(), CHUNK_SIZE, buf, reader.read_exact, and
file.write_all) accordingly to avoid any u64→usize truncation.
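
The difference between the two expressions can be shown on any host by emulating the 32-bit cast. In this sketch `chunk_len` is the min-in-u64 form suggested above, and `chunk_len_truncating_32bit` is a hypothetical helper that reproduces what `(remaining as usize).min(CHUNK_SIZE)` would compute where `usize` is 32 bits wide, by truncating through `u32` first.

```rust
const CHUNK_SIZE: usize = 64 * 1024;

/// Takes the minimum in u64 first. The result is at most CHUNK_SIZE,
/// so the final cast to usize cannot lose bits on any target.
pub fn chunk_len(remaining: u64) -> usize {
    remaining.min(CHUNK_SIZE as u64) as usize
}

/// Emulates the original expression on a 32-bit target: the cast drops
/// the upper 32 bits before the minimum is taken.
pub fn chunk_len_truncating_32bit(remaining: u64) -> usize {
    (remaining as u32 as usize).min(CHUNK_SIZE)
}
```

For `remaining = 1 << 32` the emulated version yields 0, so each loop iteration would read nothing and leave `remaining` unchanged, while `chunk_len` yields a full 64 KiB chunk.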

Comment on lines +6 to +11
use std::{
    fs::{self, File},
    io::{Read, Seek, SeekFrom, Write},
    os::unix::fs::MetadataExt,
    path::PathBuf,
};
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for mod.rs in cli/tests/cli/create to see if sparse is gated
fd -t f 'mod.rs' 'cli/tests/cli/create' --exec cat {}

echo "---"
# Search for mod sparse declarations with context
rg -n 'mod sparse' --type rust 'cli/tests'

Repository: ChanTsune/Portable-Network-Archive

Length of output: 81


🏁 Script executed:

#!/bin/bash
# List files in the directory containing sparse.rs
ls -la cli/tests/cli/create/

echo "---"
# Search for sparse.rs location
fd sparse.rs

echo "---"
# Search for mod sparse declarations more broadly
rg 'mod sparse' --type rust

echo "---"
# Check if sparse.rs exists and its parent structure
fd sparse.rs --exec head -20 {}

Repository: ChanTsune/Portable-Network-Archive

Length of output: 4531


🏁 Script executed:

#!/bin/bash
# Look for mod.rs in parent directories
fd -t f 'mod.rs' 'cli/tests'

echo "---"
# Check if there's a parent file that includes these tests
rg -n 'sparse' 'cli/tests/cli.rs' --type rust

echo "---"
# Search for common module declaration patterns for the tests
rg -n 'mod (atime|ctime|sparse)' 'cli/tests' --type rust -B2 -A2

Repository: ChanTsune/Portable-Network-Archive

Length of output: 497


🏁 Script executed:

#!/bin/bash
# Read cli/tests/cli/create.rs to find sparse module declaration
cat cli/tests/cli/create.rs | head -50

Repository: ChanTsune/Portable-Network-Archive

Length of output: 854


🏁 Script executed:

#!/bin/bash
# Search for sparse module declaration specifically
rg -n 'mod sparse' cli/tests/cli/create.rs --type rust -B1 -A1

Repository: ChanTsune/Portable-Network-Archive

Length of output: 60


🏁 Script executed:

#!/bin/bash
# Get the full list of modules in cli/tests/cli/create.rs
cat cli/tests/cli/create.rs

Repository: ChanTsune/Portable-Network-Archive

Length of output: 854


Declare the sparse test module in the parent module, or remove the file.

The sparse.rs file exists in cli/tests/cli/create/ but is not declared in cli/tests/cli/create.rs. Either add #[cfg(unix)] mod sparse; to include it (since it requires Unix-only APIs like std::os::unix::fs::MetadataExt), or remove the file if it's no longer needed.

🤖 Prompt for AI Agents
In `@cli/tests/cli/create/sparse.rs` around lines 6 - 11, The test module file
sparse.rs is present but not declared in the parent test module, causing it to
be excluded; either add a Unix-gated module declaration in the parent (e.g., add
#[cfg(unix)] mod sparse; to cli/tests/cli/create.rs to include the file that
uses std::os::unix::fs::MetadataExt) or delete sparse.rs if the test is
obsolete—update the parent file to reference the module name sparse so the test
is compiled only on Unix.

The sparse.rs test file was never compiled or executed because
`mod sparse;` was missing from the parent module create.rs.
@github-actions github-actions Bot added the cli (This issue is about cli application) and lib (This issue is about lib crate) labels Feb 9, 2026