fix(recent-block-cache): Reduce fetch fan out and preserve negative cache entries. #4421

Open
metalurgical wants to merge 4 commits into cowprotocol:main from metalurgical:block_cache_fix

@metalurgical
Contributor

Description

Reduce fetch fan out and preserve negative cache entries.

Changes

  • Fetch cache miss chunks through fetch_values(keys, block) directly instead of expanding them into individual key requests
  • Preserve negative cache entries instead of removing empty entries
  • Add test for negative cache entries
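
To illustrate why removing empty entries matters, here is a minimal sketch of a negative cache. It is not the actual `RecentBlockCache`: the `SketchCache` type, its key and value types, and both garbage collection methods are hypothetical and only model the behavior described above, where an empty result is itself worth remembering.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified cache (not the real RecentBlockCache).
/// Maps (block, key) to the values fetched at that block.
/// An empty Vec is a negative entry: "we looked and found nothing".
#[derive(Default)]
struct SketchCache {
    entries: HashMap<(u64, &'static str), Vec<u32>>,
}

impl SketchCache {
    fn insert(&mut self, block: u64, key: &'static str, values: Vec<u32>) {
        self.entries.insert((block, key), values);
    }

    /// Returns None on a miss and Some(empty slice) on a negative hit.
    fn get(&self, block: u64, key: &'static str) -> Option<&[u32]> {
        self.entries.get(&(block, key)).map(Vec::as_slice)
    }

    /// Evicts by age only, so negative entries survive.
    fn gc_keep_negative(&mut self, oldest_block_to_keep: u64) {
        self.entries
            .retain(|(block, _), _| *block >= oldest_block_to_keep);
    }

    /// Assumed shape of the old behavior: empty entries are evicted too,
    /// so the next lookup for that key is a miss and triggers a refetch.
    fn gc_drop_empty(&mut self, oldest_block_to_keep: u64) {
        self.entries.retain(|(block, _), values| {
            *block >= oldest_block_to_keep && !values.is_empty()
        });
    }
}
```

With `gc_keep_negative`, a key that returned nothing at a recent block still yields `Some(&[])` after garbage collection. With `gc_drop_empty`, the same lookup becomes `None` and would be treated as a cache miss.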

How to test

cargo nextest run

…che entries

Use fetch_values instead of join_all to reduce fan out. Preserve negative cache entries instead of removing empty entries.
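
The fan-out difference can be sketched as follows. This is a simplified, synchronous stand-in: the real fetcher is async, and `SketchFetcher`, `CountingFetcher`, `fetch_per_key`, and `fetch_batched` are hypothetical names used only to count how many fetch calls each strategy issues.

```rust
use std::cell::Cell;
use std::collections::HashSet;

/// Hypothetical stand-in for the fetcher; the real one is async.
trait SketchFetcher {
    fn fetch_values(&self, keys: HashSet<u32>, block: u64) -> Result<Vec<u32>, String>;
}

/// Counts calls so the two strategies can be compared.
struct CountingFetcher {
    calls: Cell<usize>,
}

impl SketchFetcher for CountingFetcher {
    fn fetch_values(&self, keys: HashSet<u32>, _block: u64) -> Result<Vec<u32>, String> {
        self.calls.set(self.calls.get() + 1);
        // Pretend each key resolves to exactly one value.
        let mut values: Vec<u32> = keys.into_iter().map(|key| key * 10).collect();
        values.sort_unstable();
        Ok(values)
    }
}

/// Fan out: one fetch call per key (the pattern being removed).
fn fetch_per_key(
    fetcher: &impl SketchFetcher,
    keys: &HashSet<u32>,
    block: u64,
) -> Result<Vec<u32>, String> {
    let mut out = Vec::new();
    for key in keys {
        out.extend(fetcher.fetch_values(HashSet::from([*key]), block)?);
    }
    out.sort_unstable();
    Ok(out)
}

/// Batched: a single fetch call for the whole chunk of missing keys.
fn fetch_batched(
    fetcher: &impl SketchFetcher,
    keys: &HashSet<u32>,
    block: u64,
) -> Result<Vec<u32>, String> {
    fetcher.fetch_values(keys.clone(), block)
}
```

For three keys, `fetch_per_key` makes three calls while `fetch_batched` makes one, and both return the same values.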
@metalurgical metalurgical requested a review from a team as a code owner May 17, 2026 11:27
@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the RecentBlockCache by removing the request_sharing dependency and simplifying the fetching logic into a single retry loop within fetch_inner_many. It also corrects a bug in the garbage collection process where negative cache entries (empty results) were being prematurely evicted, and adds a regression test to verify their persistence. Review feedback identified that the new retry implementation introduces unnecessary latency on the final failure, loses original error context by using anyhow::bail!, and violates repository guidelines regarding detailed error reporting for batch fetches.

Comment on lines +246 to +253
+        for _ in 0..=self.maximum_retries {
+            match self.fetcher.fetch_values(keys.clone(), block).await {
+                Ok(values) => return Ok(values),
+                Err(err) => tracing::warn!("retrying fetch because error: {:?}", err),
             }
-            .boxed()
-        });
-        shared.await.context("could not fetch liquidity")
+            tokio::time::sleep(self.delay_between_retries).await;
+        }
+        anyhow::bail!("could not fetch liquidity");
Contributor

high

The current retry logic in fetch_inner_many has several issues:

  1. Unnecessary Latency: It performs a tokio::time::sleep even after the final retry attempt fails, adding redundant delay to the overall request time.
  2. Error Context Loss: By using anyhow::bail!, the original RPC error (e.g., timeout, node error) is discarded. Using .context() preserves the root cause for better troubleshooting.
  3. Batch Error Detail: According to repository guidelines for batch fetches, the implementation should explicitly indicate which items failed and provide error details for each, rather than returning a single error for the entire batch. Please ensure the response structure reflects individual item statuses.
        let mut keys = keys;
        for i in 0..=self.maximum_retries {
            let current_keys = if i == self.maximum_retries {
                keys
            } else {
                keys.clone()
            };
            match self.fetcher.fetch_values(current_keys, block).await {
                Ok(values) => return Ok(values),
                Err(err) => {
                    if i == self.maximum_retries {
                        return Err(anyhow::Error::from(err).context("could not fetch liquidity"));
                    }
                    tracing::warn!(?err, "retrying fetch");
                    tokio::time::sleep(self.delay_between_retries).await;
                }
            }
        }
        unreachable!()
References
  1. When fetching a batch of items where individual fetches can fail, do not silently ignore errors. The API response should explicitly indicate which items failed and provide error details for each failure.

Contributor Author

Points 1 and 2 have been corrected.

Point 3 requires a trait redesign: Result<Vec<V>> would need to become HashMap<K, Result<Vec<V>>> in multiple places.
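
For reference, a per-key result shape along the lines of point 3 might look like the sketch below. It is only an illustration of the idea and not a proposal for the actual trait: `PerKey`, `fetch_values_per_key`, `split_results`, and the concrete `u32`/`String` types are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical per-key result: each key carries its own success or failure.
type PerKey = HashMap<u32, Result<Vec<u32>, String>>;

/// Fetches every key independently and records the outcome per key,
/// instead of collapsing the whole batch into a single Result.
fn fetch_values_per_key(
    keys: &[u32],
    fetch_one: impl Fn(u32) -> Result<Vec<u32>, String>,
) -> PerKey {
    keys.iter().map(|&key| (key, fetch_one(key))).collect()
}

/// Splits the outcomes so callers can cache the successes and
/// report exactly which keys failed and why.
fn split_results(results: PerKey) -> (HashMap<u32, Vec<u32>>, HashMap<u32, String>) {
    let mut succeeded = HashMap::new();
    let mut failed = HashMap::new();
    for (key, result) in results {
        match result {
            Ok(values) => {
                succeeded.insert(key, values);
            }
            Err(err) => {
                failed.insert(key, err);
            }
        }
    }
    (succeeded, failed)
}
```

Every caller that currently consumes a single `Result<Vec<V>>` would have to handle the split outcome, which is why this change reaches beyond a single function.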

@metalurgical metalurgical marked this pull request as draft May 17, 2026 11:46
@metalurgical metalurgical marked this pull request as ready for review May 17, 2026 12:20
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the RecentBlockCache to implement batch fetching and simplified retry logic, removing the dependency on individual request sharing. It also fixes a bug in the cache's garbage collection logic that caused negative cache entries to be prematurely evicted and adds a regression test. A high-severity recommendation was made to include an early return for empty key sets in fetch_inner_many to avoid unnecessary RPC calls.
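
The early return recommended here can be sketched as below. This is a synchronous simplification under stated assumptions: `CountingFetcher` and `fetch_inner_many_sketch` are hypothetical stand-ins for the real async fetcher and `fetch_inner_many`, and the retry loop is omitted to keep the focus on the empty-input guard.

```rust
use std::cell::Cell;
use std::collections::HashSet;

/// Hypothetical fetcher that counts how often it is called.
struct CountingFetcher {
    calls: Cell<usize>,
}

impl CountingFetcher {
    fn fetch_values(&self, keys: HashSet<u32>, _block: u64) -> Result<Vec<u32>, String> {
        self.calls.set(self.calls.get() + 1);
        let mut values: Vec<u32> = keys.into_iter().map(|key| key * 10).collect();
        values.sort_unstable();
        Ok(values)
    }
}

/// Simplified stand-in for fetch_inner_many, without retries.
fn fetch_inner_many_sketch(
    fetcher: &CountingFetcher,
    keys: HashSet<u32>,
    block: u64,
) -> Result<Vec<u32>, String> {
    // Nothing to fetch: skip the call to the fetcher entirely.
    if keys.is_empty() {
        return Ok(Vec::new());
    }
    fetcher.fetch_values(keys, block)
}
```

With an empty key set the fetcher is never invoked, so a fully cached request costs no RPC round trip.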

Comment thread crates/liquidity-sources/src/recent_block_cache.rs