feat(oluo): CAS caller-side persistence — persist_snapshot + load_snapshot (ZEB-110)#224
…or types (ZEB-110) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…(ZEB-110) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…store (ZEB-110) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ests (ZEB-110) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…condition docs (ZEB-110) Addresses code review feedback: - Rename misleading error variant (used for both ser and deser) - Document that data_dir must exist for persist/load functions - Update spec to match actual error type variants Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Walkthrough
Adds a caller-side snapshot persistence module for OluoEngine using a content-addressed store (CAS).
Sequence Diagram(s)
sequenceDiagram
participant Caller as Caller
participant PersistAPI as persist_snapshot()
participant CAS as BookStore (CAS)
participant FS as File System
Caller->>PersistAPI: (data_dir, store, index_bytes, metadata_bytes, key_counter, generation)
PersistAPI->>CAS: DAG-ingest index_bytes
CAS-->>PersistAPI: index_cid
PersistAPI->>CAS: DAG-ingest metadata_bytes
CAS-->>PersistAPI: metadata_cid
PersistAPI->>PersistAPI: Build SnapshotManifest (version,key_counter,compact_generation,index_cid,metadata_cid)
PersistAPI->>CAS: Store serialized manifest as CAS book
CAS-->>PersistAPI: manifest_cid
PersistAPI->>FS: atomic_write(oluo_base.bin, index_bytes)
FS-->>PersistAPI: ok
PersistAPI->>FS: Write oluo_head.json (version + manifest_cid)
FS-->>PersistAPI: ok
PersistAPI-->>Caller: Ok((base_path, generation))
sequenceDiagram
participant Caller as Caller
participant LoadAPI as load_snapshot()
participant FS as File System
participant CAS as BookStore (CAS)
Caller->>LoadAPI: (data_dir, store, compact_threshold)
LoadAPI->>FS: Read oluo_head.json
alt Head missing
FS-->>LoadAPI: FileNotFound
LoadAPI-->>Caller: Ok(None)
else Head present
FS-->>LoadAPI: manifest_cid (JSON)
LoadAPI->>CAS: Fetch manifest by CID
CAS-->>LoadAPI: SnapshotManifest
LoadAPI->>CAS: DAG-reassemble index_bytes from index_cid
CAS-->>LoadAPI: index_bytes
LoadAPI->>CAS: DAG-reassemble metadata_bytes from metadata_cid
CAS-->>LoadAPI: metadata_bytes
LoadAPI->>LoadAPI: Decode metadata -> BTreeMap, OluoEngine::from_snapshot(...)
LoadAPI->>FS: atomic_write(oluo_base.bin, index_bytes)
FS-->>LoadAPI: ok
LoadAPI-->>Caller: Ok(Some((engine, base_path, compact_generation)))
end
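The "Head missing" branch of the load diagram can be sketched as follows. This is a minimal illustration using only the standard library, not the code in persist.rs: the helper name `read_head` is hypothetical, and it returns the raw file contents instead of a parsed manifest CID.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper (not from the PR): reads `oluo_head.json` from
/// `data_dir` and returns its raw contents, or `None` on a fresh start.
fn read_head(data_dir: &Path) -> io::Result<Option<String>> {
    match fs::read_to_string(data_dir.join("oluo_head.json")) {
        Ok(contents) => Ok(Some(contents)),
        // A missing head file is not an error: nothing has been persisted yet,
        // which corresponds to the Ok(None) branch in the diagram.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        // Any other I/O failure (permissions, invalid UTF-8, ...) is propagated.
        Err(e) => Err(e),
    }
}
```

Treating only `NotFound` as "no snapshot" keeps a corrupt or unreadable head file from being silently mistaken for a fresh start.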
Estimated code review effort: 3 (Moderate), ~25 minutes
Review Summary by Qodo
Add CAS caller-side persistence for OluoEngine snapshots
Walkthroughs
Description
• Adds persist module with persist_snapshot() and load_snapshot() functions for CAS-backed engine persistence
• Implements DAG-based snapshot storage with atomic writes and crash-safe head file updates
• Introduces SnapshotManifest linking to HNSW index and metadata DAG roots
• Includes 7 comprehensive tests covering round-trip, crash simulation, and deterministic CID generation
Diagram
flowchart LR
Engine["OluoEngine<br/>PersistSnapshot action"]
Persist["persist_snapshot()"]
DAG["DAG ingest<br/>index + metadata"]
Manifest["SnapshotManifest<br/>+ CAS book"]
HeadFile["oluo_head.json<br/>atomic write"]
Load["load_snapshot()"]
HeadRead["Read head file"]
ManifestFetch["Fetch manifest<br/>from CAS"]
DAGReassemble["DAG reassemble<br/>index + metadata"]
EngineRestore["OluoEngine::<br/>from_snapshot()"]
RestoredEngine["OluoEngine<br/>ready to use"]
Engine --> Persist
Persist --> DAG
DAG --> Manifest
Manifest --> HeadFile
HeadRead --> ManifestFetch
ManifestFetch --> DAGReassemble
DAGReassemble --> EngineRestore
EngineRestore --> RestoredEngine
File Changes
1. crates/harmony-oluo/src/persist.rs
User description
Note: Medium Risk. Reviewed by Cursor Bugbot for commit 7bdf3ec.
CodeAnt-AI Description
Add snapshot save and restore for Oluo engine state
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/harmony-oluo/src/persist.rs`:
- Around line 200-205: The atomic_write function currently writes via
std::fs::write and renames without flushing, which can lose data if a crash
occurs between write and rename; change atomic_write to create the temp file
with std::fs::File::create, write the bytes with write_all, call file.sync_all()
(to flush metadata/data), drop the file handle, then perform std::fs::rename to
atomically replace the target; keep error types as OluoPersistError and
propagate errors from create/write/sync/rename.
📒 Files selected for processing (4)
crates/harmony-oluo/Cargo.toml
crates/harmony-oluo/src/lib.rs
crates/harmony-oluo/src/persist.rs
docs/superpowers/specs/2026-04-13-oluo-cas-caller-persistence-design.md
…ZEB-110) Matches the established durability pattern in harmony-node's disk_io.rs: File::create → write_all → sync_all → rename. Without fsync, data could be lost on power failure even after a successful rename. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/harmony-oluo/src/persist.rs`:
- Around line 201-209: atomic_write currently uses a fixed tmp name (tmp =
path.with_extension("tmp")) which allows two concurrent writers to collide;
change atomic_write to create a unique temp file in the same directory (e.g.,
use tempfile::Builder or loop with std::fs::OpenOptions::create_new and a
randomized suffix) and write->sync_all->rename from that unique temp to the
final path, retrying on EEXIST and returning errors for other failures; ensure
you still drop the file handle before rename and consider syncing the containing
directory after rename for durability.
📒 Files selected for processing (1)
crates/harmony-oluo/src/persist.rs
🔇 Additional comments (3)
crates/harmony-oluo/src/persist.rs (3)
78-119: Solid persist flow and crash ordering.
`persist_snapshot` correctly DAG-ingests payloads, stores a manifest CID in CAS, writes the mmap base, and updates the head file last.
133-198: Load path is consistent and defensive.
Version checks, CID decode/lookup, DAG reassembly, metadata decode, engine restore, and base-file refresh are all wired correctly.
264-478: Test coverage is strong for first-pass persistence behavior.
The suite exercises round-trip restore, missing head handling, manifest consistency, crash-like head deletion, and deterministic head content.
fn atomic_write(path: &Path, data: &[u8]) -> Result<(), OluoPersistError> {
    use std::io::Write;
    let tmp = path.with_extension("tmp");
    let mut file = std::fs::File::create(&tmp)?;
    file.write_all(data)?;
    file.sync_all()?;
    drop(file);
    std::fs::rename(&tmp, path)?;
    Ok(())
}
atomic_write temp-name collision can corrupt concurrent writes.
Using a fixed *.tmp path is unsafe if two callers write the same destination concurrently. They can race on the same temp inode and break atomicity guarantees.
Proposed fix (unique temp path with create_new retry)
 fn atomic_write(path: &Path, data: &[u8]) -> Result<(), OluoPersistError> {
     use std::io::Write;
-    let tmp = path.with_extension("tmp");
-    let mut file = std::fs::File::create(&tmp)?;
+    use std::fs::OpenOptions;
+
+    let mut n: u32 = 0;
+    let (tmp, mut file) = loop {
+        let candidate = path.with_extension(format!("tmp.{n}"));
+        match OpenOptions::new()
+            .write(true)
+            .create_new(true)
+            .open(&candidate)
+        {
+            Ok(f) => break (candidate, f),
+            Err(e) if e.kind() == std::io::ErrorKind::AlreadyExists => {
+                n = n.wrapping_add(1);
+            }
+            Err(e) => return Err(OluoPersistError::Io(e)),
+        }
+    };
+
     file.write_all(data)?;
     file.sync_all()?;
     drop(file);
     std::fs::rename(&tmp, path)?;
     Ok(())
 }
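The review also suggests syncing the containing directory after the rename, which the proposed fix does not show. A rough sketch of how the unique temp file and the directory sync could fit together is below. It is an assumption-laden illustration, not the PR's code: the name `atomic_write_unique` and the `<name>.<pid>.<n>.tmp` naming scheme are hypothetical, it returns `std::io::Result` rather than `OluoPersistError`, and the directory fsync is only attempted on Unix.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical variant of `atomic_write`: unique temp file, fsync, rename,
/// then fsync of the parent directory (Unix only).
fn atomic_write_unique(path: &Path, data: &[u8]) -> io::Result<()> {
    // The temp file must live in the target's directory so the rename stays
    // on one filesystem.
    let dir = path
        .parent()
        .filter(|p| !p.as_os_str().is_empty())
        .unwrap_or(Path::new("."));
    let name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;

    // Assumed naming scheme: <name>.<pid>.<n>.tmp; create_new fails if the
    // candidate already exists, so two writers never share a temp file.
    let mut n: u32 = 0;
    let (tmp, mut file) = loop {
        let mut candidate_name = name.to_os_string();
        candidate_name.push(format!(".{}.{}.tmp", std::process::id(), n));
        let candidate = dir.join(candidate_name);
        match OpenOptions::new().write(true).create_new(true).open(&candidate) {
            Ok(f) => break (candidate, f),
            Err(e) if e.kind() == io::ErrorKind::AlreadyExists => n = n.wrapping_add(1),
            Err(e) => return Err(e),
        }
    };

    file.write_all(data)?;
    file.sync_all()?; // flush file data and metadata before the rename
    drop(file);

    if let Err(e) = fs::rename(&tmp, path) {
        let _ = fs::remove_file(&tmp); // best-effort cleanup of the temp file
        return Err(e);
    }

    // Persist the directory entry itself so the rename survives power loss.
    #[cfg(unix)]
    {
        File::open(dir)?.sync_all()?;
    }
    Ok(())
}
```

Including the process id in the temp name avoids collisions across processes without pulling in a random-number dependency; the `tempfile` crate mentioned in the review would be a reasonable alternative.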
Summary
- Adds `persist` module to `harmony-oluo` with two stateless free functions that bridge the sans-I/O `OluoEngine` with the `harmony-content` CAS layer
- `persist_snapshot()` handles `PersistSnapshot` actions: DAG-ingests index + metadata into CAS, builds `SnapshotManifest`, writes local index file for mmap, updates `oluo_head.json`
- `load_snapshot()` restores engine from CAS on startup: reads head file, fetches manifest, DAG-reassembles data, calls `OluoEngine::from_snapshot()`
Design
- docs/superpowers/specs/2026-04-13-oluo-cas-caller-persistence-design.md
- docs/superpowers/plans/2026-04-13-oluo-cas-caller-persistence.md
- `SnapshotManifest` links to DAG roots for HNSW index bytes and metadata sidecar
Test plan
- `persist_and_load_round_trip`: full cycle with real engine, compaction, search result comparison
- `persist_snapshot_creates_head_file`: verifies head file and base file creation
- `load_snapshot_returns_none_when_no_head`: fresh start behavior
- `manifest_fields_match_persist_payload`: CID resolution back to original bytes
- `load_after_head_deleted_returns_none`: crash simulation
- `persist_same_data_twice_produces_same_head`: deterministic CAS CIDs
Generated with Claude Code
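The deterministic-CID property checked by `persist_same_data_twice_produces_same_head` follows from content addressing: identical bytes always hash to the same key, so the manifest and therefore the head are identical too. A toy model of that idea is below. Everything in it is an assumption for illustration: `ToyCas`, `toy_cid`, and `persist_toy` are hypothetical names, the hash is 64-bit FNV-1a rather than whatever CID scheme `harmony-content` uses, and the manifest is just the two payload keys concatenated.

```rust
use std::collections::HashMap;

/// Stand-in content hash (64-bit FNV-1a). NOT the real CID scheme.
fn toy_cid(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical in-memory content-addressed store keyed by `toy_cid`.
#[derive(Default)]
struct ToyCas {
    books: HashMap<u64, Vec<u8>>,
}

impl ToyCas {
    /// Stores `bytes` under their content hash; re-storing is a no-op.
    fn put(&mut self, bytes: &[u8]) -> u64 {
        let cid = toy_cid(bytes);
        self.books.entry(cid).or_insert_with(|| bytes.to_vec());
        cid
    }

    fn get(&self, cid: u64) -> Option<&[u8]> {
        self.books.get(&cid).map(Vec::as_slice)
    }
}

/// Stores both payloads, then a manifest that references them, and returns
/// the manifest key (the value a head file would point at).
fn persist_toy(cas: &mut ToyCas, index: &[u8], metadata: &[u8]) -> u64 {
    let index_cid = cas.put(index);
    let metadata_cid = cas.put(metadata);
    let mut manifest = Vec::with_capacity(16);
    manifest.extend_from_slice(&index_cid.to_be_bytes());
    manifest.extend_from_slice(&metadata_cid.to_be_bytes());
    cas.put(&manifest)
}
```

Persisting the same index and metadata twice yields the same manifest key and adds no new entries, which is the behavior the test asserts against the real store.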
Note
Medium Risk
Introduces new persistence/restore paths that touch filesystem atomic writes and CAS/DAG serialization, so bugs could prevent recovery or load incorrect snapshots. Core search logic isn’t changed, but startup/compaction workflows are impacted.
Overview
Adds a new `persist` module in `harmony-oluo` that lets callers persist `OluoEngine` compaction snapshots to the `harmony-content` CAS and restore an engine on startup.
`persist_snapshot` DAG-ingests index/metadata bytes, writes a postcard `SnapshotManifest` to CAS, atomically writes `oluo_base.bin`, and updates `oluo_head.json` last for crash safety; `load_snapshot` reads the head, fetches/deserializes the manifest, reassembles blobs from CAS, restores via `OluoEngine::from_snapshot`, and rewrites the local base file. Includes a focused test suite covering round-trip restore, fresh-start behavior, manifest integrity, crash simulation, and deterministic head output, plus new deps (`harmony-content`, `serde_json`, `hex`, `thiserror`, `tempfile`).
Reviewed by Cursor Bugbot for commit 90d1f81.