[codex] Defer Deepwell text block S3 cleanup until commit from PR #1 #75

Closed
Rokurolize wants to merge 1 commit into develop from fix/pr1-text-block-post-commit-cleanup-20260625

Conversation

@Rokurolize

Owner

Extracted from the closed PR #1 completion-closure stack as a focused Deepwell text-block consistency slice.

Scope:

  • add request-scoped post-commit actions to ServiceContext
  • run queued post-commit actions only after the JSON-RPC database transaction commits successfully
  • queue stale text-block S3 object deletion from add_blocks and delete_blocks instead of deleting objects before the database transaction outcome is known
  • add integration tests for draining and non-draining post-commit cleanup actions
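
A minimal sketch of what a request-scoped post-commit queue could look like, using only the standard library. The names `PostCommitActions`, `PostCommitAction::DeleteTextBlockObjects`, `delete_text_block_objects`, and `pending_count` appear in this PR's diff and tests; the `drain` helper and the exact field layout are assumptions made for illustration, not the PR's actual code.

```rust
use std::sync::Mutex;

/// Work deferred until the database transaction has committed.
/// (Single variant, matching the one visible in the diff excerpt.)
#[derive(Debug, Clone, PartialEq)]
pub enum PostCommitAction {
    DeleteTextBlockObjects(Vec<String>),
}

/// Request-scoped queue. Queuing an action has no side effects on S3.
#[derive(Debug, Default)]
pub struct PostCommitActions {
    actions: Mutex<Vec<PostCommitAction>>,
}

impl PostCommitActions {
    /// Record stale object keys for later deletion; empty input is ignored.
    pub fn delete_text_block_objects<I>(&self, filenames: I)
    where
        I: IntoIterator<Item = String>,
    {
        let filenames: Vec<String> = filenames.into_iter().collect();
        if filenames.is_empty() {
            return;
        }
        self.actions
            .lock()
            .expect("post-commit actions mutex poisoned")
            .push(PostCommitAction::DeleteTextBlockObjects(filenames));
    }

    /// Number of queued actions that have not been drained yet.
    pub fn pending_count(&self) -> usize {
        self.actions
            .lock()
            .expect("post-commit actions mutex poisoned")
            .len()
    }

    /// Hypothetical helper: take every queued action, leaving the queue empty.
    pub fn drain(&self) -> Vec<PostCommitAction> {
        let mut guard = self
            .actions
            .lock()
            .expect("post-commit actions mutex poisoned");
        std::mem::take(&mut *guard)
    }
}
```

The queue only records intent. Whether and when the drained actions run is decided by the caller that knows the transaction outcome.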

Adaptation note:

Why this is separated:

  • it is limited to transaction/S3 consistency for Deepwell text blocks
  • it does not include render compatibility, XML-RPC, ListPages, Framerail, WWS/Caddy, or local verifier changes

Validation run locally:

  • git diff --check HEAD
  • source portable Rust toolchain env.sh, then cargo fmt -- --check from deepwell/
  • source portable Rust toolchain env.sh, then cargo check --manifest-path deepwell/Cargo.toml
  • source portable Rust toolchain env.sh, then cargo clippy --manifest-path deepwell/Cargo.toml --all-targets -- -D warnings --allow deprecated
  • source portable Rust toolchain env.sh, then cargo test --manifest-path deepwell/Cargo.toml --test text_block -- --no-capture was attempted; the test binary built, but local execution stopped because DATABASE_URL is not set in this WSL environment

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Walkthrough

The PR adds PostCommitActions, a request-scoped queue for deferred S3 text-block deletions, and exposes it through ServiceContext and deepwell::services. The JSON-RPC wrapper now creates a queue per request, injects it into transaction contexts, and runs queued actions after successful commit. TextBlockService::add_blocks and delete_blocks now enqueue S3 object deletions instead of deleting immediately, and new integration tests cover the queued cleanup flow.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 5df3dfde-67e7-4586-a72a-08c91565bd62

📥 Commits

Reviewing files that changed from the base of the PR and between 70bd2fb and ee892fb.

📒 Files selected for processing (5)
  • deepwell/src/api.rs
  • deepwell/src/services/context.rs
  • deepwell/src/services/mod.rs
  • deepwell/src/services/text_block/service.rs
  • deepwell/tests/text_block.rs

Comment on lines +74 to +103
    pub async fn run_after_commit(&self, state: &ServerState) {
        let actions = {
            let mut actions = self
                .actions
                .lock()
                .expect("post-commit actions mutex poisoned");
            std::mem::take(&mut *actions)
        };

        for action in actions {
            match action {
                PostCommitAction::DeleteTextBlockObjects(filenames) => {
                    delete_text_block_objects_after_commit(state, filenames).await;
                }
            }
        }
    }
}

async fn delete_text_block_objects_after_commit(
    state: &ServerState,
    filenames: Vec<String>,
) {
    let bucket = &state.s3_tblocks_bucket;
    for filename in filenames {
        if let Err(error) = bucket.delete_object(&filename).await {
            warn!("Failed to delete committed stale S3 text block {filename}: {error}");
        }
    }
}

🗄️ Data Integrity & Integration | 🔵 Trivial

Make post-commit cleanup durable if S3 reclamation must be guaranteed.

run_after_commit drains the in-memory queue before awaiting S3 deletes, and delete_text_block_objects_after_commit only logs failures. A process crash, cancellation, or persistent S3 error after DB commit drops the cleanup with no retry, leaving stale objects behind. Consider a durable outbox/cleanup table or background retry path for failed deletes.
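
To make the retry idea concrete, here is a rough, dependency-free sketch in which a failed delete stays queued instead of being dropped. Everything in it is hypothetical: `ObjectStore`, `CleanupOutbox`, and `FlakyStore` are illustrative names, and the in-memory `VecDeque` stands in for what would have to be a database table (written in the same transaction as the row changes) for the cleanup to survive a crash.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the S3 bucket client.
pub trait ObjectStore {
    fn delete_object(&mut self, key: &str) -> Result<(), String>;
}

/// Keys whose deletion is still owed. In a durable design this state
/// would live in a database table, not in process memory.
#[derive(Default)]
pub struct CleanupOutbox {
    pending: VecDeque<String>,
}

impl CleanupOutbox {
    pub fn enqueue(&mut self, key: impl Into<String>) {
        self.pending.push_back(key.into());
    }

    pub fn len(&self) -> usize {
        self.pending.len()
    }

    /// Try each pending key once. Keys that fail are pushed back so a
    /// later pass can retry them. Returns how many deletes succeeded.
    pub fn run_once<S: ObjectStore>(&mut self, store: &mut S) -> usize {
        let attempts = self.pending.len();
        let mut deleted = 0;
        for _ in 0..attempts {
            if let Some(key) = self.pending.pop_front() {
                match store.delete_object(&key) {
                    Ok(()) => deleted += 1,
                    Err(_) => self.pending.push_back(key),
                }
            }
        }
        deleted
    }
}

/// Test double that fails its first `failures_left` calls.
pub struct FlakyStore {
    pub failures_left: u32,
    pub deleted: Vec<String>,
}

impl ObjectStore for FlakyStore {
    fn delete_object(&mut self, key: &str) -> Result<(), String> {
        if self.failures_left > 0 {
            self.failures_left -= 1;
            return Err(format!("simulated failure deleting {key}"));
        }
        self.deleted.push(key.to_string());
        Ok(())
    }
}
```

A background job could call `run_once` periodically. A real version would likely also want a retry limit or backoff so a permanently failing key does not loop forever.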

Comment on lines +50 to +72
#[tokio::test]
async fn queued_text_block_cleanup_preserves_objects_when_not_drained() {
    let runner = TestRunner::setup().await;
    let bucket = runner.context().s3_tblocks_bucket();
    let filename = format!("test-rollback-text-block-{}", Uuid::new_v4());

    bucket
        .put_object_with_content_type(&filename, b"rollback text block", "text/plain")
        .await
        .expect("test text-block object should upload");
    assert_object_status(bucket, &filename, 200).await;

    let actions = PostCommitActions::default();
    actions.delete_text_block_objects([filename.clone()]);
    assert_eq!(actions.pending_count(), 1);

    assert_object_status(bucket, &filename, 200).await;

    bucket
        .delete_object(&filename)
        .await
        .expect("rollback preservation test object should clean up");
}

📐 Maintainability & Code Quality | 🔵 Trivial | 🏗️ Heavy lift

Exercise a real rollback instead of only skipping the drain.

This proves queued actions are inert until run_after_commit, but it would still pass if the JSON-RPC/transaction wrapper accidentally drained after a failed transaction. Add coverage that queues cleanup inside a transaction or RPC call that returns Err, then verifies the object remains after rollback.
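
The property being asked for can be sketched without a database: a wrapper runs a body, then runs queued cleanup only if the body returned `Ok`. The function `run_with_post_commit` and its closure-based shape are hypothetical simplifications of the JSON-RPC wrapper, with a `RefCell<Vec<String>>` standing in for the request-scoped queue.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for the transaction wrapper: `body` may queue
/// keys for cleanup, and `cleanup` runs for each key only when `body`
/// succeeds. On `Err` the queue is discarded untouched.
pub fn run_with_post_commit<T, E>(
    body: impl FnOnce(&RefCell<Vec<String>>) -> Result<T, E>,
    mut cleanup: impl FnMut(String),
) -> Result<T, E> {
    let queue = RefCell::new(Vec::new());
    let result = body(&queue);
    if result.is_ok() {
        // "Commit" succeeded: drain the queue.
        for key in queue.into_inner() {
            cleanup(key);
        }
    }
    result
}
```

A rollback test against this shape queues a key inside a body that returns `Err` and then asserts that `cleanup` was never called. The integration-test equivalent would assert that the S3 object still exists after the failed RPC.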

@Rokurolize Rokurolize force-pushed the fix/pr1-text-block-post-commit-cleanup-20260625 branch from ee892fb to ba5088a Compare June 25, 2026 07:36

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
deepwell/src/services/text_block/service.rs (1)

178-183: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Uploads still occur mid-transaction; rollback leaves stale S3 content.

This PR defers deletions to post-commit, but add_blocks still uploads new objects via put_object_with_content_type inside the transaction. Since keys are derived from page_id/index/block_type (not content), a PutObject overwrites the existing object; if the outer transaction later rolls back, the DB reverts but S3 retains the new content, so committed rows point at mismatched blobs. This is the inverse of the consistency issue the PR targets — confirm whether it's intentionally out of scope.
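
The overwrite hazard, and one possible way around it, can be illustrated with an in-memory stand-in for the bucket. The key formats below are assumptions for the sake of the example (the real key layout is not shown in this thread), and `FakeBucket`, `fixed_key`, and `versioned_key` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the text-block bucket.
#[derive(Default)]
pub struct FakeBucket {
    objects: HashMap<String, Vec<u8>>,
}

impl FakeBucket {
    /// Like PutObject: silently replaces any existing object at `key`.
    pub fn put(&mut self, key: &str, body: &[u8]) {
        self.objects.insert(key.to_string(), body.to_vec());
    }

    pub fn get(&self, key: &str) -> Option<&[u8]> {
        self.objects.get(key).map(Vec::as_slice)
    }
}

/// Fixed key: the same slot always maps to the same key, so an upload
/// made inside a transaction overwrites the committed object.
pub fn fixed_key(page_id: i64, index: usize, block_type: &str) -> String {
    format!("{page_id}/{index}/{block_type}")
}

/// Unique key: a caller-supplied revision number (an assumption here)
/// makes each write land on a fresh key, so committed objects are never
/// overwritten and a rollback only leaves an orphan to clean up.
pub fn versioned_key(page_id: i64, index: usize, block_type: &str, revision: u64) -> String {
    format!("{page_id}/{index}/{block_type}/{revision}")
}
```

With unique keys, a rolled-back upload becomes an unreferenced object rather than a corrupted one, which turns the problem into the same kind of deferred cleanup this PR already queues.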

deepwell/src/api.rs (1)

227-267: 🩺 Stability & Availability | 🟠 Major

Drain post-commit actions in non-RPC contexts too.

add_blocks() queues S3 deletions, but deepwell/src/services/job/worker.rs and deepwell/src/database/seeder/mod.rs build plain ServiceContext values and can reach the page-rendering path without any run_after_commit() call, so those deletions never execute there. PageService::delete() is covered by the JSON-RPC wrapper.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: c63898a8-664a-45e3-9d75-1a7d64079f5e

📥 Commits

Reviewing files that changed from the base of the PR and between ee892fb and ba5088a.

📒 Files selected for processing (5)
  • deepwell/src/api.rs
  • deepwell/src/services/context.rs
  • deepwell/src/services/mod.rs
  • deepwell/src/services/text_block/service.rs
  • deepwell/tests/text_block.rs

Comment thread deepwell/src/api.rs
Comment on lines +263 to +265
if result.is_ok() {
    post_commit_actions.run_after_commit(&db_state).await;
}

🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Post-commit deletes run inline before returning the response.

run_after_commit awaits each S3 delete_object sequentially on the request task, so the client response is held until cleanup finishes (and slow/failing S3 inflates latency). Consider spawning the drain on a background task (tokio::spawn) so cleanup doesn't block the response path.
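
The shape of that suggestion, sketched with a plain OS thread as a dependency-free stand-in for `tokio::spawn`: the caller hands off the drained keys and gets a handle back immediately. `spawn_cleanup` and the `delete` closure are hypothetical; in Deepwell the body would be the async S3 delete on the Tokio runtime.

```rust
use std::thread::{self, JoinHandle};

/// Run the queued deletions off the request path. Returns at once with
/// a handle; the handle's result is the number of failed deletes.
pub fn spawn_cleanup<F>(keys: Vec<String>, delete: F) -> JoinHandle<usize>
where
    F: Fn(&str) -> Result<(), String> + Send + 'static,
{
    thread::spawn(move || {
        let mut failures = 0;
        for key in &keys {
            // Failures are logged and counted, never propagated, which
            // mirrors the warn-and-continue behavior in the diff.
            if let Err(error) = delete(key.as_str()) {
                eprintln!("failed to delete stale text block {key}: {error}");
                failures += 1;
            }
        }
        failures
    })
}
```

Detaching the cleanup shortens the response path but widens the window in which a process exit loses the queued deletes, so it pairs naturally with some durable retry mechanism.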

Comment on lines +74 to +87
async fn assert_object_status(
    bucket: &s3::bucket::Bucket,
    filename: &str,
    expected: u16,
) {
    let (_head, status) = bucket
        .head_object(filename)
        .await
        .expect("test text-block object HEAD should complete");
    assert_eq!(
        status, expected,
        "unexpected S3 status for text-block object {filename}",
    );
}

🩺 Stability & Availability | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

rust-s3 0.37 Bucket head_object return value for 404 not found object error or status code

💡 Result:

In the rust-s3 crate (version 0.37 and later), the bucket.head_object method returns a Result<(HeadObjectResult, u16), S3Error> [1]. When an object is not found, the S3 service returns a 404 status code, which the crate treats as an error, typically surfacing it as an S3Error::HttpFailWithBody(404, ...) or similar error variant depending on your specific configuration [2][3].

If you specifically need to check for the existence of an object without handling raw S3 errors, it is recommended to use the bucket.object_exists(path) method instead of head_object [3]. This method is designed to return Ok(false) specifically when the server returns a 404 status code, and Ok(true) otherwise [4][3].

Summary of approaches:

  1. head_object: Returns an error when the object is missing (e.g., 404 status code) [2][1]. You must catch and inspect the error to identify the 404 status.
  2. object_exists: A higher-level helper that abstracts the 404 check, returning Result<bool, S3Error> where false indicates the object was not found [4][3].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the relevant test and the rust-s3 dependency version used by the repo.
git ls-files deepwell/tests/text_block.rs Cargo.toml Cargo.lock | sed -n '1,20p'

printf '\n== deepwell/tests/text_block.rs ==\n'
cat -n deepwell/tests/text_block.rs | sed -n '1,180p'

printf '\n== rust-s3 references ==\n'
rg -n 'rust-s3|s3 =' Cargo.toml Cargo.lock deepwell -g 'Cargo.toml' -g 'Cargo.lock' -g '*.toml' -g '*.lock' -g '*.rs'

Repository: Rokurolize/wikijump

Length of output: 4148


Handle the 404 case explicitly. rust-s3 0.37 returns Err for a missing object on head_object, so .expect(...) will panic before the 404 assertion. Match the error or use bucket.object_exists(...) for the post-delete check.
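
A sketch of the "match the error" option, written against a local stand-in so it compiles without the crate. `HeadError` is hypothetical and only modeled on the `HttpFailWithBody(status, body)` variant quoted in the web-query result above; the real error type and the exact variant returned should be checked against the rust-s3 version in use.

```rust
/// Local stand-in for the S3 client's error type (hypothetical).
#[derive(Debug, PartialEq)]
pub enum HeadError {
    /// HTTP-level failure carrying the status code and response body.
    HttpFailWithBody(u16, String),
    /// Anything else: network failure, credentials, and so on.
    Other(String),
}

/// Collapse "HEAD succeeded" and "HEAD failed with an HTTP status" into
/// a plain status code so a test can assert on 200 and 404 alike.
/// Non-HTTP errors are still propagated to the caller.
pub fn head_status(result: Result<u16, HeadError>) -> Result<u16, HeadError> {
    match result {
        Ok(status) => Ok(status),
        Err(HeadError::HttpFailWithBody(status, _body)) => Ok(status),
        Err(other) => Err(other),
    }
}
```

In the test helper, the HEAD result would pass through a function like this before the `assert_eq!`, so the post-delete check compares against 404 instead of panicking in `.expect(...)`.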

@Rokurolize

Owner Author

Closing this draft extraction. CodeRabbit correctly surfaced that deferring deletions alone is not safe on current develop: add_blocks still overwrites fixed S3 keys inside the transaction, so rollback can leave committed DB rows pointing at new blob contents. This needs to be redesigned together with the text-block s3_filename/unique-key work or a broader durable cleanup/outbox path, not landed as an independent small PR.

@Rokurolize Rokurolize closed this Jun 25, 2026
@codecov

codecov Bot commented Jun 25, 2026

Codecov Report

❌ Patch coverage is 40.00000% with 30 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| deepwell/src/services/text_block/service.rs | 12.50% | 14 Missing ⚠️ |
| deepwell/src/api.rs | 0.00% | 10 Missing ⚠️ |
| deepwell/src/services/context.rs | 75.00% | 6 Missing ⚠️ |
