Dedupes concurrent identical requests via shared_future and lets an upstream's retry-after-disconnect hit the cached result instead of recomputing a multi-minute query.
- ServerQueryCache: bounded byte-budget LRU; entries with active waiters are never evicted; FAILED-delivery entries get a 2h TTL while DELIVERED entries live until size pressure forces eviction.
- /search handler acquires a Handle, computes on miss, reuses on hit; the on_sent callback drives mark_delivered / mark_failed.
- New --cache-size flag in GB (default 1; 0 disables).
- 10 unit tests covering dedup, eviction, TTL, refcount protection.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Two-pass eviction in evict_under_pressure_locked: pass 1 sacrifices
DELIVERED and past-TTL FAILED entries (LRU oldest first); pass 2 falls
back to FAILED-within-TTL only when the cache would otherwise stay
oversize. The failed-delivery TTL is now a retention *floor*, not a
ceiling — a flood of small successful requests can't displace responses
an upstream is still about to retry.
- mark_delivered() is a sink: a duplicate request whose later delivery
fails cannot resurrect a previously-served entry's TTL. Implemented
with a CAS loop in on_delivery.
- release_waiter runs both the TTL sweep and the size-pressure sweep,
so the cache settles back under budget without waiting for the next
insert (the just-inserted entry may have been waiter-protected during
its own eviction pass).
- One cache log line per request when the cache is enabled:
    cache HIT / cache STORE with size + occupancy/entries
    delivery FAILED with the configured retry-window minutes
- Three new unit tests:
    DeliveredIsSinkStateAgainstLaterFailure
    FailedWithinTtlPreservedOverDelivered
    FailedEntriesEvictedUnderHeavyPressure
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vocabulary aligned with the actual semantics: an entry whose delivery
failed becomes "protected" with priority retention for the configured
window, and graduates to the main cache after the window expires (state
stays PROTECTED but it loses priority and ages out via normal LRU). A
successful retry flips it to DELIVERED.
- DeliveryState::FAILED → DeliveryState::PROTECTED
- Handle::mark_failed → Handle::mark_protected
- failed_ttl → protection_ttl
Sliding retention window: every cache hit on a PROTECTED entry refreshes
ready_at, since the hit itself is direct evidence the upstream is still
retrying. Removed the proactive past-TTL eviction sweep — past-window
PROTECTED entries are kept until normal LRU pressure forces them out
(matches "graduate to main cache").
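A rough sketch of the sliding window described above. The names ready_at and protection_ttl come from this commit; the Entry layout, the on_hit and has_priority helpers, and the assumption that priority lasts for protection_ttl after ready_at are illustrative only and may not match server_cache.cpp.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-ins for the cache's entry and state types.
enum class DeliveryState { PENDING, PROTECTED, DELIVERED };

struct Entry {
    DeliveryState state;
    Clock::time_point ready_at;  // start of the current retention window
};

// A cache hit on a PROTECTED entry restarts its window; other states are
// left untouched.
inline void on_hit(Entry &e, Clock::time_point now) {
    if (e.state == DeliveryState::PROTECTED)
        e.ready_at = now;
}

// True while the entry still has priority retention. Past the window the
// state stays PROTECTED, but the entry competes under plain LRU.
inline bool has_priority(const Entry &e, Clock::duration protection_ttl,
                         Clock::time_point now) {
    assert(protection_ttl > Clock::duration::zero());
    return e.state == DeliveryState::PROTECTED
        && now - e.ready_at < protection_ttl;
}
```

Under these assumptions, an entry that went unhit for longer than the window loses priority but is not dropped, and a later hit gives it a fresh window.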
Two new tests:
    ProtectedPastWindowGraduatesToMainCache
    ProtectedHitRefreshesPriorityWindow
Class doc, Handle doc, and per-method comments in server_cache.{hpp,cpp}
rewritten for clarity:
* Per-request lifecycle (acquire → compute/get → on_sent → ~Handle)
* State machine (PENDING / DELIVERED-as-sink / PROTECTED with sliding
window)
* Two-pass eviction policy (priority class + bounded fallback)
server.cpp /search handler: top-of-handler comment block describing the
4-step request lifecycle and where the cache plugs in. Inline comments
mark each phase (cache handshake, hit fast path, miss compute & publish,
on_sent delivery reporter).
server_utils.hpp: process_request on_sent doc updated to describe the
PROTECTED-on-failure semantics.
Removed unused common/logger.hpp include from server_cache.cpp.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cache used to hold a Json::Value tree, which (a) carried a lot of allocator overhead so the byte budget was systematically off, and (b) forced re-serialization on every hit. Switch the cache value to the serialized JSON body (a std::string), which makes the byte accounting exact (== response.size()) and lets hits return the body verbatim without any Json::Value round-trip.
- Cache value: shared_ptr<const std::string> (dropped the CachedResult wrapper too; the size is just response.size()).
- process_request now takes a callback returning the already-serialized body. /search serializes once at cache-publish time; the other endpoints (/align, /column_labels, /stats) wrap their existing Json::Value with an inline Json::writeString call.
- Renamed Entry::approx_size_bytes → Entry::size_bytes (it's exact now).
- Extracted make_pending_entry() to dedup the entry-creation between the disabled-cache fast path and the miss path.
- Tightened comments throughout to match the simpler shape and the current vocabulary (PROTECTED state, sliding window, etc.). Removed stale references to CachedResult and the old proactive TTL eviction.
- Renamed the test helper make_result(N) → response_of_size(N).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add /search result cache for the metagraph server
Why
Two real-world failure modes today:
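- When an upstream disconnects and retries, the server recomputes the same multi-minute query from scratch.
- When two concurrent requests carry the same /search payload, both threads run the full multi-second-to-multi-minute query.
This PR adds a result cache that solves both. A refcount-pinned design ensures an entry is never evicted while a thread is still computing or reading it.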
What
ServerQueryCache (src/cli/server_cache.{hpp,cpp}):
- Concurrent dedup via std::shared_future<ResultPtr>: the first arrival owns the promise, late arrivals attach to the same future.
- Byte-budget LRU bounded by --cache-size (GB). Entries with active waiters (currently being computed or read) are never evicted.
- DELIVERED: successfully delivered at least once. Sink state, normal LRU.
- PROTECTED: delivery failed, kept with priority over DELIVERED for a sliding 2 h window. Every cache hit refreshes the window (the hit is itself evidence the upstream is still retrying). After 2 h with no hits, the entry graduates to the main cache (state stays PROTECTED but it loses priority and ages out via LRU). A successful retry flips it to DELIVERED.
- Cache key: (query_mode, discovery_fraction, min_exact_match, max_num_nodes_per_seq_char, top_labels, align, plus graph identity). Per-request overrides are read directly from the JSON, mirroring process_search_request's resolution. Formatting flags (verbose_output, Accept-Encoding) are deliberately excluded.
/search handler: acquires a Handle, computes on miss, reuses on hit. The on_sent async-write callback drives mark_delivered / mark_protected. Both single-graph and multi-graph paths share the cache layer.
process_request: gains an optional on_sent(error_code) callback routed through the Simple-Web-Server send(callback) variant.
--cache-size <GB> (default 1; 0 disables). Stored as double so 0.5 etc. work.
Log lines:
    cache HIT 4.5 KB (occupancy 312.0/1024 MB, 47 entries)
    cache STORE 12.5 KB (occupancy 324.0/1024 MB, 48 entries)
    delivery FAILED (Connection reset by peer); cache entry protected for 120 min retry window
Mechanism and policy
Per-request flow
Entry state machine
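One way to picture the DELIVERED-as-sink rule is a compare-and-swap loop over an atomic state. The commits name on_delivery and DeliveryState; the signature, the atomic field, and the memory orderings below are assumptions for illustration, not the code in server_cache.cpp.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the PR's DeliveryState.
enum class DeliveryState { PENDING, PROTECTED, DELIVERED };

// Records one delivery outcome and returns the resulting state.
// DELIVERED is a sink: once an entry has been served successfully, a later
// failed delivery of a duplicate request must not move it back to PROTECTED.
inline DeliveryState on_delivery(std::atomic<DeliveryState> &state, bool success) {
    const DeliveryState target =
        success ? DeliveryState::DELIVERED : DeliveryState::PROTECTED;
    DeliveryState current = state.load(std::memory_order_acquire);
    while (current != DeliveryState::DELIVERED) {
        // On failure, compare_exchange_weak reloads `current`, so the loop
        // re-checks the sink condition against the latest value.
        if (state.compare_exchange_weak(current, target,
                                        std::memory_order_acq_rel,
                                        std::memory_order_acquire)) {
            return target;
        }
    }
    assert(current == DeliveryState::DELIVERED);
    return DeliveryState::DELIVERED;
}
```

With this shape, PENDING can move to either PROTECTED or DELIVERED, PROTECTED can move to DELIVERED on a successful retry, and nothing leaves DELIVERED.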
Eviction policy (two-pass size-pressure sweep)
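The two-pass sweep from the commit notes can be sketched roughly as follows. Everything here (the Entry fields, the list-based LRU, evict_under_pressure and its return value) is a simplified assumption made for illustration; the real evict_under_pressure_locked may be organized differently.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;
enum class DeliveryState { PENDING, PROTECTED, DELIVERED };

// Hypothetical, simplified cache entry.
struct Entry {
    std::string key;
    std::size_t size_bytes;
    DeliveryState state;
    int waiters;                        // > 0 means pinned, never evicted
    Clock::time_point protected_until;  // end of the priority window
};

inline bool in_priority_window(const Entry &e, Clock::time_point now) {
    return e.state == DeliveryState::PROTECTED && now < e.protected_until;
}

// `lru` is ordered oldest first. Returns the evicted keys in eviction order.
// Pass 1 spares PROTECTED entries that are still inside their window;
// pass 2 sacrifices them too, but only if the cache is still over budget.
inline std::vector<std::string> evict_under_pressure(std::list<Entry> &lru,
                                                     std::size_t budget,
                                                     Clock::time_point now) {
    std::size_t total = 0;
    for (const Entry &e : lru)
        total += e.size_bytes;

    std::vector<std::string> evicted;
    for (int pass = 1; pass <= 2 && total > budget; ++pass) {
        for (auto it = lru.begin(); it != lru.end() && total > budget;) {
            const bool pinned = it->waiters > 0;
            const bool spared = pass == 1 && in_priority_window(*it, now);
            if (pinned || spared) {
                ++it;
                continue;
            }
            assert(total >= it->size_bytes);
            total -= it->size_bytes;
            evicted.push_back(it->key);
            it = lru.erase(it);
        }
    }
    return evicted;
}
```

In this sketch, entries with waiters survive both passes, so the cache can remain over budget until those waiters release.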
Three retention classes
Concurrent dedup (the shared_future hinge)
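A minimal sketch of the dedup pattern, assuming a plain map from key to shared_future. DedupTable and get_or_compute are hypothetical names, not the ServerQueryCache API, and the sketch leaves out eviction, refcounts, and delivery state entirely.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

using ResultPtr = std::shared_ptr<const std::string>;

// Hypothetical dedup table: one shared_future per key.
class DedupTable {
  public:
    // The first caller for a key becomes the producer and runs `compute`.
    // Every other caller for that key waits on the same shared_future.
    ResultPtr get_or_compute(const std::string &key,
                             const std::function<std::string()> &compute) {
        std::promise<ResultPtr> promise;
        std::shared_future<ResultPtr> future;
        bool is_producer = false;
        {
            std::lock_guard<std::mutex> lock(mu_);
            auto it = entries_.find(key);
            if (it == entries_.end()) {
                future = promise.get_future().share();
                entries_.emplace(key, future);
                is_producer = true;
            } else {
                future = it->second;
            }
        }
        if (is_producer) {
            // Compute outside the lock so other keys are not blocked.
            try {
                promise.set_value(
                    std::make_shared<const std::string>(compute()));
            } catch (...) {
                // Waiters see the same exception when they call get().
                promise.set_exception(std::current_exception());
            }
        }
        assert(future.valid());
        return future.get();  // rethrows if the producer failed
    }

  private:
    std::mutex mu_;
    std::unordered_map<std::string, std::shared_future<ResultPtr>> entries_;
};
```

Note that this sketch keeps failed entries in the map forever; a real cache would need to drop or replace them.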
The cache stores a shared_future<shared_ptr<const CachedResult>>. The first arrival owns the promise; every later arrival just .get()s the future, with no extra computation. Result is published once, observed N times.
What's in the cache key (and what isn't)
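As an illustration of the include/exclude split, here is a sketch that builds a key from the semantic fields listed under "What" and ignores a formatting flag. The SearchParams struct, the field types, the payload field, and the string encoding are all assumptions; the PR does not show how the real key is encoded.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical, already-resolved request parameters (per-request JSON
// overrides applied on top of server defaults). Types are assumed.
struct SearchParams {
    std::string graph_id;  // stands in for "graph identity"
    std::string payload;   // the query itself (assumed part of the key)
    std::string query_mode;
    double discovery_fraction;
    double min_exact_match;
    double max_num_nodes_per_seq_char;
    std::size_t top_labels;
    bool align;
    bool verbose_output;   // formatting only: deliberately NOT in the key
};

inline std::string make_cache_key(const SearchParams &p) {
    assert(!p.graph_id.empty());  // assumed invariant for this sketch
    std::ostringstream key;
    key.precision(17);  // enough digits to distinguish any two doubles
    // Length-prefix strings so "ab"+"c" and "a"+"bc" cannot collide.
    const auto put_string = [&key](const std::string &s) {
        key << s.size() << ':' << s << '|';
    };
    put_string(p.graph_id);
    put_string(p.query_mode);
    put_string(p.payload);
    key << p.discovery_fraction << '|' << p.min_exact_match << '|'
        << p.max_num_nodes_per_seq_char << '|' << p.top_labels << '|'
        << (p.align ? 1 : 0);
    return key.str();
}
```

Two requests that differ only in verbose_output map to the same key here, while a change to any semantic field produces a different one.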
Cache stores semantic Json::Value; per-request flags that affect only formatting or transport are applied on the way out.
Out of scope (follow-ups)
on_sent error_code). A future POST /ack {request_id} can drive the same mark_delivered / mark_protected hooks; the API is symmetric.
Test plan
- Unit tests (tests/cli/test_server_cache.cpp, 14 cases): miss→hit, concurrent dedup, LRU under size pressure, refcount protection, PROTECTED retention, sliding-window refresh on hit, graduation to main cache past window, DELIVERED-as-sink immunity to later mark_protected, PROTECTED-within-window held over DELIVERED, PROTECTED entries evicted under heavy pressure, disabled cache (--cache-size 0), exception propagation, abandoned-producer.
- test_api integration tests pass (verified all query_mode paths still distinguish correctly with the resolved-from-JSON cache key).
- /search twice with identical payload → second response is sub-second; kill curl mid-response on a long query → re-issue same request → cache hit.