Skip to content

feat(oluo): CAS caller-side persistence — persist_snapshot + load_snapshot (ZEB-110)#224

Merged
jenglund merged 7 commits into
mainfrom
worktree-zeb-110-cas-caller
Apr 13, 2026
Merged

feat(oluo): CAS caller-side persistence — persist_snapshot + load_snapshot (ZEB-110)#224
jenglund merged 7 commits into
mainfrom
worktree-zeb-110-cas-caller

Conversation

@jenglund
Copy link
Copy Markdown
Contributor

@jenglund jenglund commented Apr 13, 2026

Summary

  • Adds persist module to harmony-oluo with two stateless free functions that bridge the sans-I/O OluoEngine with the harmony-content CAS layer
  • persist_snapshot() — handles PersistSnapshot actions: DAG-ingests index + metadata into CAS, builds SnapshotManifest, writes local index file for mmap, updates oluo_head.json
  • load_snapshot() — restores engine from CAS on startup: reads head file, fetches manifest, DAG-reassembles data, calls OluoEngine::from_snapshot()
  • Follows harmony-db's persistence patterns (atomic writes, JSON head file, hex-encoded CIDs)
  • 7 new tests: round-trip integration, fresh start, manifest integrity, crash simulation, deterministic CIDs, head file creation, and load from empty dir

Design

  • Spec: docs/superpowers/specs/2026-04-13-oluo-cas-caller-persistence-design.md
  • Plan: docs/superpowers/plans/2026-04-13-oluo-cas-caller-persistence.md
  • Postcard for CAS manifest, JSON for local head file
  • Crash safety: head file is written last; incomplete persists leave previous snapshot intact
  • SnapshotManifest links to DAG roots for HNSW index bytes and metadata sidecar

Test plan

  • persist_and_load_round_trip — full cycle with real engine, compaction, search result comparison
  • persist_snapshot_creates_head_file — verifies head file and base file creation
  • load_snapshot_returns_none_when_no_head — fresh start behavior
  • manifest_fields_match_persist_payload — CID resolution back to original bytes
  • load_after_head_deleted_returns_none — crash simulation
  • persist_same_data_twice_produces_same_head — deterministic CAS CIDs
  • All 41 harmony-oluo tests pass
  • Workspace compiles clean

🤖 Generated with Claude Code


Note

Medium Risk
Introduces new persistence/restore paths that touch filesystem atomic writes and CAS/DAG serialization, so bugs could prevent recovery or load incorrect snapshots. Core search logic isn’t changed, but startup/compaction workflows are impacted.

Overview
Adds a new persist module in harmony-oluo that lets callers persist OluoEngine compaction snapshots to the harmony-content CAS and restore an engine on startup.

persist_snapshot DAG-ingests index/metadata bytes, writes a postcard SnapshotManifest to CAS, atomically writes oluo_base.bin, and updates oluo_head.json last for crash safety; load_snapshot reads the head, fetches/deserializes the manifest, reassembles blobs from CAS, restores via OluoEngine::from_snapshot, and rewrites the local base file. Includes a focused test suite covering round-trip restore, fresh-start behavior, manifest integrity, crash simulation, and deterministic head output, plus new deps (harmony-content, serde_json, hex, thiserror, tempfile).

Reviewed by Cursor Bugbot for commit 90d1f81. Bugbot is set up for automated code reviews on this repo. Configure here.

Summary by CodeRabbit

  • New Features

    • Added snapshot persistence and restoration to safely store and recover system state.
    • Added structured manifests for snapshot metadata and deterministic manifest IDs.
  • Bug Fixes / Reliability

    • Improved crash-resilience and atomic on-disk writes; handles missing head file gracefully.
    • Enhanced error reporting for manifest and metadata issues.
  • Documentation

    • Updated design spec describing the new snapshot persistence behavior.
  • Tests

    • Added comprehensive round-trip and edge-case tests for persistence and restore.

jenglund and others added 6 commits April 13, 2026 12:21
…or types (ZEB-110)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…(ZEB-110)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…store (ZEB-110)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ests (ZEB-110)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…condition docs (ZEB-110)

Addresses code review feedback:
- Rename misleading error variant (used for both ser and deser)
- Document that data_dir must exist for persist/load functions
- Update spec to match actual error type variants

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Apr 13, 2026

PR author is in the excluded authors list.

@codeant-ai
Copy link
Copy Markdown

codeant-ai Bot commented Apr 13, 2026

CodeAnt AI is reviewing your PR.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 13, 2026

📝 Walkthrough

Walkthrough

Adds a caller-side snapshot persistence module for OluoEngine using a content-addressed store (CAS): new persist module with persist_snapshot and load_snapshot, SnapshotManifest and OluoPersistError types, CAS-backed manifest/index/metadata storage, atomic local head/base writes, and related dependency updates.

Changes

Cohort / File(s) Summary
Dependencies
crates/harmony-oluo/Cargo.toml
Added CAS/serialization/error helper deps: harmony-content (std), serde_json (std), hex, thiserror (std); added dev-dependency tempfile = "3".
Module Exports
crates/harmony-oluo/src/lib.rs
Declared pub mod persist; and re-exported OluoPersistError and SnapshotManifest.
Persistence Logic
crates/harmony-oluo/src/persist.rs
New module implementing persist_snapshot() and load_snapshot(), SnapshotManifest struct (versioned manifest), OluoPersistError enum, atomic local writes helper, CAS DAG-ingest/retrieve flows, and comprehensive tests covering round-trip, absent head, head creation, crash simulation, and deterministic head behavior.
Design Doc
docs/superpowers/specs/2026-04-13-oluo-cas-caller-persistence-design.md
Updated to reflect thiserror::Error derive on OluoPersistError, split deserialize variants (ManifestSerde, MetadataDeserialize), and changed NotFound to hex-encoded String CID representation.

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Caller
    participant PersistAPI as persist_snapshot()
    participant CAS as BookStore (CAS)
    participant FS as File System

    Caller->>PersistAPI: (data_dir, store, index_bytes, metadata_bytes, key_counter, generation)
    PersistAPI->>CAS: DAG-ingest index_bytes
    CAS-->>PersistAPI: index_cid
    PersistAPI->>CAS: DAG-ingest metadata_bytes
    CAS-->>PersistAPI: metadata_cid
    PersistAPI->>PersistAPI: Build SnapshotManifest (version,key_counter,compact_generation,index_cid,metadata_cid)
    PersistAPI->>CAS: Store serialized manifest as CAS book
    CAS-->>PersistAPI: manifest_cid
    PersistAPI->>FS: atomic_write(oluo_base.bin, index_bytes)
    FS-->>PersistAPI: ok
    PersistAPI->>FS: Write oluo_head.json (version + manifest_cid)
    FS-->>PersistAPI: ok
    PersistAPI-->>Caller: Ok((base_path, generation))
Loading
sequenceDiagram
    participant Caller as Caller
    participant LoadAPI as load_snapshot()
    participant FS as File System
    participant CAS as BookStore (CAS)

    Caller->>LoadAPI: (data_dir, store, compact_threshold)
    LoadAPI->>FS: Read oluo_head.json
    alt Head missing
        FS-->>LoadAPI: FileNotFound
        LoadAPI-->>Caller: Ok(None)
    else Head present
        FS-->>LoadAPI: manifest_cid (JSON)
        LoadAPI->>CAS: Fetch manifest by CID
        CAS-->>LoadAPI: SnapshotManifest
        LoadAPI->>CAS: DAG-reassemble index_bytes from index_cid
        CAS-->>LoadAPI: index_bytes
        LoadAPI->>CAS: DAG-reassemble metadata_bytes from metadata_cid
        CAS-->>LoadAPI: metadata_bytes
        LoadAPI->>LoadAPI: Decode metadata -> BTreeMap, OluoEngine::from_snapshot(...)
        LoadAPI->>FS: atomic_write(oluo_base.bin, index_bytes)
        FS-->>LoadAPI: ok
        LoadAPI-->>Caller: Ok(Some((engine, base_path, compact_generation)))
    end
Loading

Estimated Code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

size:XL, Review effort 3/5

Poem

🐰
I dug a tunnel, cached a dream,
Manifests safe in CAS' bright gleam.
Atomic crumbs so nothing breaks,
Snapshots snug for safety's sake.
Hoppity-hop—persistence complete!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding CAS caller-side persistence functions (persist_snapshot and load_snapshot) to the oluo module, which is the primary objective of the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch worktree-zeb-110-cas-caller

Comment @coderabbitai help to get the list of available commands and usage tips.

@qodo-code-review
Copy link
Copy Markdown

Review Summary by Qodo

Add CAS caller-side persistence for OluoEngine snapshots

✨ Enhancement

Grey Divider

Walkthroughs

Description
• Adds persist module with persist_snapshot() and load_snapshot() functions for CAS-backed
  engine persistence
• Implements DAG-based snapshot storage with atomic writes and crash-safe head file updates
• Introduces SnapshotManifest linking to HNSW index and metadata DAG roots
• Includes 7 comprehensive tests covering round-trip, crash simulation, and deterministic CID
  generation
Diagram
flowchart LR
  Engine["OluoEngine<br/>PersistSnapshot action"]
  Persist["persist_snapshot()"]
  DAG["DAG ingest<br/>index + metadata"]
  Manifest["SnapshotManifest<br/>+ CAS book"]
  HeadFile["oluo_head.json<br/>atomic write"]
  
  Load["load_snapshot()"]
  HeadRead["Read head file"]
  ManifestFetch["Fetch manifest<br/>from CAS"]
  DAGReassemble["DAG reassemble<br/>index + metadata"]
  EngineRestore["OluoEngine::<br/>from_snapshot()"]
  RestoredEngine["OluoEngine<br/>ready to use"]
  
  Engine --> Persist
  Persist --> DAG
  DAG --> Manifest
  Manifest --> HeadFile
  
  HeadRead --> ManifestFetch
  ManifestFetch --> DAGReassemble
  DAGReassemble --> EngineRestore
  EngineRestore --> RestoredEngine
Loading

Grey Divider

File Changes

1. crates/harmony-oluo/src/persist.rs ✨ Enhancement +475/-0

Implement CAS persistence with DAG ingest/reassemble

• New module implementing persist_snapshot() and load_snapshot() entry points for snapshot
 persistence
• persist_snapshot() DAG-ingests index and metadata, builds SnapshotManifest, writes local index
 file, updates head file atomically
• load_snapshot() reads head file, fetches manifest from CAS, reassembles data via DAG, restores
 engine via from_snapshot()
• Includes 7 tests: round-trip integration, fresh start, manifest integrity, crash simulation,
 deterministic CIDs, head file creation, load from empty dir
• Helper function atomic_write() ensures crash safety via write-to-temp-then-rename pattern

crates/harmony-oluo/src/persist.rs


2. crates/harmony-oluo/src/lib.rs ✨ Enhancement +2/-0

Export persist module and public types

• Exports new persist module and public types OluoPersistError and SnapshotManifest

crates/harmony-oluo/src/lib.rs


3. crates/harmony-oluo/Cargo.toml Dependencies +5/-0

Add persistence-related dependencies

• Adds harmony-content dependency with std feature for DAG operations
• Adds serde_json, hex, and thiserror dependencies for serialization and error handling
• Adds tempfile dev-dependency for test fixtures

crates/harmony-oluo/Cargo.toml


View more (1)
4. docs/superpowers/specs/2026-04-13-oluo-cas-caller-persistence-design.md 📝 Documentation +8/-6

Update error types and documentation

• Renames Deserialize error variant to ManifestSerde to reflect both serialization and
 deserialization use
• Splits error handling into separate ManifestSerde and MetadataDeserialize variants for clarity
• Changes NotFound variant to store hex-encoded CID string instead of raw bytes
• Updates error type to use thiserror::Error derive macro
• Clarifies that data_dir must exist as a precondition for both functions

docs/superpowers/specs/2026-04-13-oluo-cas-caller-persistence-design.md


Grey Divider

Qodo Logo

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review Bot commented Apr 13, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider


Action required

1. Non-durable atomic writes🐞
Description
persist::atomic_write writes via std::fs::write() and renames without fsync, so a crash/power loss
can leave oluo_head.json or oluo_base.bin empty/corrupt even if the rename succeeded. This breaks
the documented crash-safety expectations of snapshot persistence/loading.
Code

crates/harmony-oluo/src/persist.rs[R200-205]

+/// Atomically write data to a file (write to .tmp, then rename).
+fn atomic_write(path: &Path, data: &[u8]) -> Result<(), OluoPersistError> {
+    let tmp = path.with_extension("tmp");
+    std::fs::write(&tmp, data)?;
+    std::fs::rename(&tmp, path)?;
+    Ok(())
Evidence
atomic_write() does not call sync_all()/fsync on the temp file before rename, so data may only be in
the OS page cache when the function returns success. Other persistence code in this repo uses an
explicit File::sync_all() before rename to make the temp file durable, indicating that durability is
an expected part of the project’s “atomic write” pattern.

crates/harmony-oluo/src/persist.rs[200-205]
crates/harmony-node/src/disk_io.rs[42-58]
crates/harmony-node/src/memo_io.rs[49-64]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`crates/harmony-oluo/src/persist.rs::atomic_write` uses `std::fs::write()` followed by `std::fs::rename()` but never `sync_all()`/fsyncs the temp file. This means a successful return does not guarantee the contents of `oluo_head.json` / `oluo_base.bin` are durable across crashes/power loss.

## Issue Context
Other on-disk persistence helpers in this repo (e.g. CAS book writer, memo writer) use `File::create` + `write_all` + `sync_all` before the `rename`, which is the project’s established durability pattern.

## Fix Focus Areas
- crates/harmony-oluo/src/persist.rs[200-205]

## What to change
- Rework `atomic_write` to:
 1. Create the temp file with `std::fs::File::create`.
 2. `write_all(data)`.
 3. `file.sync_all()?`.
 4. `std::fs::rename(tmp, path)?`.
- (Optional, if desired for robustness) ensure `path.parent()` exists via `create_dir_all` (even if the public API documents `data_dir` exists).
- Keep error mapping via the existing `Io(#[from] std::io::Error)` path.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

@codeant-ai codeant-ai Bot added the size:L This PR changes 100-499 lines, ignoring generated files label Apr 13, 2026
@codeant-ai
Copy link
Copy Markdown

codeant-ai Bot commented Apr 13, 2026

User description

Summary

  • Adds persist module to harmony-oluo with two stateless free functions that bridge the sans-I/O OluoEngine with the harmony-content CAS layer
  • persist_snapshot() — handles PersistSnapshot actions: DAG-ingests index + metadata into CAS, builds SnapshotManifest, writes local index file for mmap, updates oluo_head.json
  • load_snapshot() — restores engine from CAS on startup: reads head file, fetches manifest, DAG-reassembles data, calls OluoEngine::from_snapshot()
  • Follows harmony-db's persistence patterns (atomic writes, JSON head file, hex-encoded CIDs)
  • 7 new tests: round-trip integration, fresh start, manifest integrity, crash simulation, deterministic CIDs, head file creation, and load from empty dir

Design

  • Spec: docs/superpowers/specs/2026-04-13-oluo-cas-caller-persistence-design.md
  • Plan: docs/superpowers/plans/2026-04-13-oluo-cas-caller-persistence.md
  • Postcard for CAS manifest, JSON for local head file
  • Crash safety: head file is written last; incomplete persists leave previous snapshot intact
  • SnapshotManifest links to DAG roots for HNSW index bytes and metadata sidecar

Test plan

  • persist_and_load_round_trip — full cycle with real engine, compaction, search result comparison
  • persist_snapshot_creates_head_file — verifies head file and base file creation
  • load_snapshot_returns_none_when_no_head — fresh start behavior
  • manifest_fields_match_persist_payload — CID resolution back to original bytes
  • load_after_head_deleted_returns_none — crash simulation
  • persist_same_data_twice_produces_same_head — deterministic CAS CIDs
  • All 41 harmony-oluo tests pass
  • Workspace compiles clean

🤖 Generated with Claude Code


Note

Medium Risk
Adds new persistence/load paths that write local files and read/write CAS manifests, so failures could affect startup/restore correctness and snapshot durability. Logic is covered by new round-trip and crash-safety tests, but it touches data integrity boundaries.

Overview
Adds a new persist module to harmony-oluo that bridges OluoAction::PersistSnapshot to durable storage: persist_snapshot DAG-ingests index/metadata into CAS, writes a postcard SnapshotManifest referenced by a JSON oluo_head.json, and atomically writes oluo_base.bin for mmap.

Adds load_snapshot to restore an OluoEngine on startup by reading the head file, fetching/deserializing the manifest from CAS, reassembling index/metadata from DAG, and rebuilding the engine via OluoEngine::from_snapshot. Includes new tests covering round-trip restore, fresh-start behavior, manifest integrity, crash simulation, and deterministic head output, plus new dependencies (harmony-content, serde_json, hex, thiserror, tempfile).

Reviewed by Cursor Bugbot for commit 7bdf3ec. Bugbot is set up for automated code reviews on this repo. Configure here.


CodeAnt-AI Description

Add snapshot save and restore for Oluo engine state

What Changed

  • The app can now save Oluo snapshots to local storage and load them again on startup, preserving search state after a restart.
  • Saved snapshots include the index, metadata, and generation details, and the local snapshot file is written last so an interrupted save keeps the previous state intact.
  • If no saved snapshot exists, startup now continues normally without error.
  • Added checks for snapshot version mismatches, missing stored data, and unreadable metadata, with clearer failure messages.
  • Added coverage for full save/load round trips, empty startup, crash recovery, manifest integrity, and repeated saves producing the same result.

Impact

✅ Restart-safe search state
✅ Fewer data-loss issues after interrupted saves
✅ Clearer startup behavior on fresh installs

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

Comment thread crates/harmony-oluo/src/persist.rs Outdated
@codeant-ai
Copy link
Copy Markdown

codeant-ai Bot commented Apr 13, 2026

CodeAnt AI finished reviewing your PR.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/harmony-oluo/src/persist.rs`:
- Around line 200-205: The atomic_write function currently writes via
std::fs::write and renames without flushing, which can lose data if a crash
occurs between write and rename; change atomic_write to create the temp file
with std::fs::File::create, write the bytes with write_all, call file.sync_all()
(to flush metadata/data), drop the file handle, then perform std::fs::rename to
atomically replace the target; keep error types as OluoPersistError and
propagate errors from create/write/sync/rename.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7c945a20-5f57-49ec-bad8-aa817ae5637c

📥 Commits

Reviewing files that changed from the base of the PR and between 8ab6b74 and 7bdf3ec.

📒 Files selected for processing (4)
  • crates/harmony-oluo/Cargo.toml
  • crates/harmony-oluo/src/lib.rs
  • crates/harmony-oluo/src/persist.rs
  • docs/superpowers/specs/2026-04-13-oluo-cas-caller-persistence-design.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cursor Bugbot

Comment thread crates/harmony-oluo/src/persist.rs Outdated
…ZEB-110)

Matches the established durability pattern in harmony-node's disk_io.rs:
File::create → write_all → sync_all → rename. Without fsync, data
could be lost on power failure even after a successful rename.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/harmony-oluo/src/persist.rs`:
- Around line 201-209: atomic_write currently uses a fixed tmp name (tmp =
path.with_extension("tmp")) which allows two concurrent writers to collide;
change atomic_write to create a unique temp file in the same directory (e.g.,
use tempfile::Builder or loop with std::fs::OpenOptions::create_new and a
randomized suffix) and write->sync_all->rename from that unique temp to the
final path, retrying on EEXIST and returning errors for other failures; ensure
you still drop the file handle before rename and consider syncing the containing
directory after rename for durability.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 275fee44-b1dc-4023-a51d-43933cc23b99

📥 Commits

Reviewing files that changed from the base of the PR and between 7bdf3ec and 90d1f81.

📒 Files selected for processing (1)
  • crates/harmony-oluo/src/persist.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cursor Bugbot
🔇 Additional comments (3)
crates/harmony-oluo/src/persist.rs (3)

78-119: Solid persist flow and crash ordering.

persist_snapshot correctly DAG-ingests payloads, stores a manifest CID in CAS, writes the mmap base, and updates the head file last.


133-198: Load path is consistent and defensive.

Version checks, CID decode/lookup, DAG reassembly, metadata decode, engine restore, and base-file refresh are all wired correctly.


264-478: Test coverage is strong for first-pass persistence behavior.

The suite exercises round-trip restore, missing head handling, manifest consistency, crash-like head deletion, and deterministic head content.

Comment on lines +201 to +209
fn atomic_write(path: &Path, data: &[u8]) -> Result<(), OluoPersistError> {
use std::io::Write;
let tmp = path.with_extension("tmp");
let mut file = std::fs::File::create(&tmp)?;
file.write_all(data)?;
file.sync_all()?;
drop(file);
std::fs::rename(&tmp, path)?;
Ok(())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

atomic_write temp-name collision can corrupt concurrent writes.

Using a fixed *.tmp path is unsafe if two callers write the same destination concurrently. They can race on the same temp inode and break atomicity guarantees.

Proposed fix (unique temp path with create_new retry)
 fn atomic_write(path: &Path, data: &[u8]) -> Result<(), OluoPersistError> {
     use std::io::Write;
-    let tmp = path.with_extension("tmp");
-    let mut file = std::fs::File::create(&tmp)?;
+    use std::fs::OpenOptions;
+
+    let mut n: u32 = 0;
+    let (tmp, mut file) = loop {
+        let candidate = path.with_extension(format!("tmp.{n}"));
+        match OpenOptions::new()
+            .write(true)
+            .create_new(true)
+            .open(&candidate)
+        {
+            Ok(f) => break (candidate, f),
+            Err(e) if e.kind() == std::io::ErrorKind::AlreadyExists => {
+                n = n.wrapping_add(1);
+            }
+            Err(e) => return Err(OluoPersistError::Io(e)),
+        }
+    };
+
     file.write_all(data)?;
     file.sync_all()?;
     drop(file);
     std::fs::rename(&tmp, path)?;
     Ok(())
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/harmony-oluo/src/persist.rs` around lines 201 - 209, atomic_write
currently uses a fixed tmp name (tmp = path.with_extension("tmp")) which allows
two concurrent writers to collide; change atomic_write to create a unique temp
file in the same directory (e.g., use tempfile::Builder or loop with
std::fs::OpenOptions::create_new and a randomized suffix) and
write->sync_all->rename from that unique temp to the final path, retrying on
EEXIST and returning errors for other failures; ensure you still drop the file
handle before rename and consider syncing the containing directory after rename
for durability.

@jenglund jenglund merged commit 137346a into main Apr 13, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L This PR changes 100-499 lines, ignoring generated files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant