Cache results in the metagraph server by karasikov · Pull Request #626 · ratschlab/metagraph

karasikov · 2026-05-02T23:36:00Z

Add /search result cache for the metagraph server

Why

Two real-world failure modes today:

Concurrent identical requests recompute independently. When two clients submit the same /search payload, both threads run the full multi-second-to-multi-minute query.
A connection drop during a long query forces a full retry. If the upstream API retries after a TCP failure, the server happily redoes the whole computation.

This PR adds a result cache that solves both. A refcount-pinned design ensures an entry is never evicted while a thread is still computing or reading it.

What

ServerQueryCache (src/cli/server_cache.{hpp,cpp}):
- In-flight dedup via std::shared_future<ResultPtr>: the first arrival owns the promise, late arrivals attach to the same future.
- Byte-budgeted LRU: total response size capped by --cache-size (GB). Entries with active waiters (currently being computed or read) are never evicted.
- Two-class retention with priority for entries whose delivery failed:
  - DELIVERED — successfully delivered at least once. Sink state, normal LRU.
  - PROTECTED — delivery failed, kept with priority over DELIVERED for a sliding 2 h window. Every cache hit refreshes the window (the hit is itself evidence the upstream is still retrying). After 2 h with no hits, the entry graduates to the main cache (state stays PROTECTED but it loses priority and ages out via LRU). A successful retry flips it to DELIVERED.
  - Pass 2 of the eviction sweep falls back to LRU on PROTECTED entries when the cache is otherwise full of them, so memory stays bounded.
Cache key — canonicalized request inputs (FASTA hash + query_mode, discovery_fraction, min_exact_match, max_num_nodes_per_seq_char, top_labels, align, plus graph identity). Per-request overrides are read directly from the JSON, mirroring process_search_request's resolution. Formatting flags (verbose_output, Accept-Encoding) are deliberately excluded.
/search handler — acquires a Handle, computes on miss, reuses on hit. The on_sent async-write callback drives mark_delivered / mark_protected. Both single-graph and multi-graph paths share the cache layer.
process_request — gains an optional on_sent(error_code) callback routed through the Simple-Web-Server send(callback) variant.
CLI — --cache-size <GB> (default 1; 0 disables). Stored as double so 0.5 etc. work.
One log line per cached request (only when cache enabled):
- cache HIT 4.5 KB (occupancy 312.0/1024 MB, 47 entries)
- cache STORE 12.5 KB (occupancy 324.0/1024 MB, 48 entries)
- delivery FAILED (Connection reset by peer); cache entry protected for 120 min retry window

Mechanism and policy

Per-request flow

  ┌────────────────────────────────────────────────────────────────────┐
  │ POST /search                                                       │
  └─────────────────────────────┬──────────────────────────────────────┘
                                │
                       key = hash(FASTA) + mode + fractions + graph_id
                                │
                      ┌─────────▼─────────┐
                      │ cache.acquire(key)│  ── waiters++ ──┐
                      └─────────┬─────────┘                 │
                                │                           │
              ┌─────────────────┴────────────────┐          │
              ▼                                  ▼          │
        ┌──────────┐                       ┌─────────────┐  │
        │  MISS    │                       │    HIT      │  │
        │ (first)  │                       │ (in-flight  │  │
        └────┬─────┘                       │  or ready)  │  │
             │                             └──────┬──────┘  │
             │ compute query_fasta()              │         │
             │ (single- or multi-graph)           │         │
             ▼                                    │         │
       handle.set_result(json) ──┐                │         │
       or set_exception(eptr)    │                │         │
                                 ▼                ▼         │
                          ┌──────────────────────────┐      │
                          │  shared_future<ResultPtr>│      │
                          │   .get()  →  Json::Value │      │
                          └──────────────┬───────────┘      │
                                         │                  │
                                  write response            │
                                         │                  │
                            on_sent(ec) callback fires      │
                              │                  │          │
                              ▼                  ▼          │
                      mark_delivered      mark_protected    │
                              │                  │          │
                              └────────┬─────────┘          │
                                       ▼                    │
                                ~Handle()  ── waiters-- ────┘

Entry state machine

                  acquire (miss)
                        │
                        ▼
                   ┌──────────┐
                   │ PENDING  │      computation in flight
                   └─────┬────┘      waiters ≥ 1, never evicted
                         │ set_result(...) / set_exception(...)
                         ▼
                  (response ready)
                         │
                  on_sent(ec) fires
                         │
              ┌──────────┴──────────┐
       ec != 0│                     │ec == 0
              ▼                     ▼
       ┌─────────────┐        ┌──────────┐
       │  PROTECTED  │        │ DELIVERED│   sink state — a later
       │             │        │          │   mark_protected() is
       │ within 2h   │        └──────────┘   a no-op.
       │  of last    │                       Normal LRU.
       │   hit:      │
       │ priority    │       successful retry  ┌──────────┐
       │ over        │ ────────────────────►   │ DELIVERED│
       │ DELIVERED.  │                          └──────────┘
       │             │
       │ past 2h:    │
       │ graduates   │
       │ to main     │
       │ cache       │
       │ (LRU like   │
       │ any other)  │
       └─────────────┘

   each acquire() during PROTECTED refreshes ready_at = now (sliding window)

Eviction policy (two-pass size-pressure sweep)

                  Eviction priority (most → least evictable)
                  ─────────────────────────────────────────────
                    DELIVERED              (LRU oldest first)
                    PROTECTED past 2h      (LRU oldest first)
                  ─── pass 1 boundary ────────────────────────
                    PROTECTED within 2h    (LRU oldest first)   ← held with
                                                                  priority
                                                                  unless cache
                                                                  has only
                                                                  these left
                  ─── pass 2 boundary ────────────────────────
                    waiters > 0            (NEVER evicted)


  ┌────────────────────────────────────────────────────────────────┐
  │ Pass 1: while over budget,                                     │
  │   walk LRU back→front, evict the oldest waiterless entry that  │
  │   is NOT PROTECTED-within-window.                              │
  │                                                                │
  │ Pass 2: still over budget? Walk again, evict any waiterless    │
  │   entry. Bounds memory under heavy PROTECTED pressure.         │
  │                                                                │
  │ Stops when under cap, or no candidate remains.                 │
  │                                                                │
  │ Triggered on insert (set_result) and on waiter release.        │
  │                                                                │
  │ refcount invariant:                                            │
  │   ── producer & readers ⇒ waiters ≥ 1                          │
  │   ── only ~Handle() drops the count                            │
  │   ── in-flight / being-read entries are NEVER evicted          │
  └────────────────────────────────────────────────────────────────┘

Three retention classes

State	Window	Eviction priority
PENDING	none	never (waiters always ≥ 1)
DELIVERED	none	LRU, normal priority
PROTECTED within 2 h	sliding	held over DELIVERED (last resort)
PROTECTED past 2 h	window past	LRU, normal priority

Concurrent dedup (the shared_future hinge)

   Thread A                    Thread B (50 ms later)
   ─────────                   ──────────────────────
   acquire(k)  → MISS          ⋮
   compute_search(...)         acquire(k)  → HIT (same future)
        ⋮                      future.get()  ── BLOCKS ──┐
        ⋮ (12 sec)                                       │
   set_result(json)  ──────────────────────────► both unblock
                                                         │
   future.get() returns        future.get() returns ◄────┘
   (instant)                   (instant)

The cache stores a shared_future<shared_ptr<const CachedResult>>. The first arrival owns the promise; every later arrival just .get()s the future — no extra computation. Result is published once, observed N times.

What's in the cache key (and what isn't)

  IN  the key  ──►  FASTA-body hash, query_mode (resolved from JSON
                    flags: query_coords, query_counts, abundance_sum,
                    with_signature), discovery_fraction,
                    min_exact_match, max_num_nodes_per_seq_char,
                    top_labels, align, graph identity (path or
                    sorted multi-graph list).

  OUT of key  ──►  verbose_output      (server-startup constant)
                   Accept-Encoding     (compression is post-cache)
                   request_id          (different requests, same answer)

Cache stores semantic Json::Value; per-request flags that affect only formatting or transport are applied on the way out.

Out of scope (follow-ups)

Application-level handshake. Today the protection trigger is TCP-level (the on_sent error_code). A future POST /ack {request_id} can drive the same mark_delivered / mark_protected hooks — the API is symmetric.
Cache invalidation on graph reload. Graph identity is part of the key, so a swapped graph file just gets a different namespace; explicit invalidation will be added if/when graph hot-reload becomes a real workflow.

Test plan

Unit tests (tests/cli/test_server_cache.cpp, 14 cases): miss→hit, concurrent dedup, LRU under size pressure, refcount protection, PROTECTED retention, sliding-window refresh on hit, graduation to main cache past window, DELIVERED-as-sink immunity to later mark_protected, PROTECTED-within-window held over DELIVERED, PROTECTED entries evicted under heavy pressure, disabled cache (--cache-size 0), exception propagation, abandoned-producer.
All 3,653+ unit tests pass.
All 171 test_api integration tests pass (verified all query_mode paths still distinguish correctly with the resolved-from-JSON cache key).
All 3,594 integration tests pass.
Manual integration test: hit /search twice with identical payload → second response is sub-second; kill curl mid-response on a long query → re-issue same request → cache hit.

Dedupes concurrent identical requests via shared_future and lets an upstream's retry-after-disconnect hit the cached result instead of recomputing a multi-minute query. - ServerQueryCache: bounded byte-budget LRU; entries with active waiters are never evicted; FAILED-delivery entries get a 2h TTL while DELIVERED entries live until size pressure forces eviction. - /search handler acquires a Handle, computes on miss, reuses on hit; the on_sent callback drives mark_delivered / mark_failed. - New --cache-size flag in GB (default 1; 0 disables). - 10 unit tests covering dedup, eviction, TTL, refcount protection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Two-pass eviction in evict_under_pressure_locked: pass 1 sacrifices DELIVERED and past-TTL FAILED entries (LRU oldest first); pass 2 falls back to FAILED-within-TTL only when the cache would otherwise stay oversize. The failed-delivery TTL is now a retention *floor*, not a ceiling — a flood of small successful requests can't displace responses an upstream is still about to retry. - mark_delivered() is a sink: a duplicate request whose later delivery fails cannot resurrect a previously-served entry's TTL. Implemented with a CAS loop in on_delivery. - release_waiter runs both the TTL sweep and the size-pressure sweep, so the cache settles back under budget without waiting for the next insert (the just-inserted entry may have been waiter-protected during its own eviction pass). - One cache log line per request when the cache is enabled: cache HIT / cache STORE with size + occupancy/entries delivery FAILED with the configured retry-window minutes - Three new unit tests: DeliveredIsSinkStateAgainstLaterFailure FailedWithinTtlPreservedOverDelivered FailedEntriesEvictedUnderHeavyPressure Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Vocabulary aligned with the actual semantics: an entry whose delivery failed becomes "protected" with priority retention for the configured window, and graduates to the main cache after the window expires (state stays PROTECTED but it loses priority and ages out via normal LRU). A successful retry flips it to DELIVERED. - DeliveryState::FAILED → DeliveryState::PROTECTED - Handle::mark_failed → Handle::mark_protected - failed_ttl → protection_ttl Sliding retention window: every cache hit on a PROTECTED entry refreshes ready_at, since the hit itself is direct evidence the upstream is still retrying. Removed the proactive past-TTL eviction sweep — past-window PROTECTED entries are kept until normal LRU pressure forces them out (matches "graduate to main cache"). Two new tests: ProtectedPastWindowGraduatesToMainCache ProtectedHitRefreshesPriorityWindow Class doc, Handle doc, and per-method comments in server_cache.{hpp,cpp} rewritten for clarity: * Per-request lifecycle (acquire → compute/get → on_sent → ~Handle) * State machine (PENDING / DELIVERED-as-sink / PROTECTED with sliding window) * Two-pass eviction policy (priority class + bounded fallback) server.cpp /search handler: top-of-handler comment block describing the 4-step request lifecycle and where the cache plugs in. Inline comments mark each phase (cache handshake, hit fast path, miss compute & publish, on_sent delivery reporter). server_utils.hpp: process_request on_sent doc updated to describe the PROTECTED-on-failure semantics. Removed unused common/logger.hpp include from server_cache.cpp. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The cache used to hold a Json::Value tree, which (a) carried a lot of allocator overhead so the byte budget was systematically off, and (b) forced re-serialization on every hit. Switch the cache value to the serialized JSON body (a std::string), which makes the byte accounting exact (== response.size()) and lets hits return the body verbatim without any Json::Value round-trip. - Cache value: shared_ptr<const std::string> (dropped the CachedResult wrapper too; the size is just response.size()). - process_request now takes a callback returning the already-serialized body. /search serializes once at cache-publish time; the other endpoints (/align, /column_labels, /stats) wrap their existing Json::Value with an inline Json::writeString call. - Renamed Entry::approx_size_bytes → Entry::size_bytes (it's exact now). - Extracted make_pending_entry() to dedup the entry-creation between the disabled-cache fast path and the miss path. - Tightened comments throughout to match the simpler shape and the current vocabulary (PROTECTED state, sliding window, etc.). Removed stale references to CachedResult and the old proactive TTL eviction. - Renamed the test helper make_result(N) → response_of_size(N). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

karasikov and others added 7 commits May 3, 2026 01:30

Merge remote-tracking branch 'origin/master' into mk/cache

8deeee8

Merge branch 'master' into mk/cache

34eb8bc

Merge branch 'master' into mk/cache

4f62bbb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Cache results in the metagraph server#626

Cache results in the metagraph server#626
karasikov wants to merge 7 commits into
masterfrom
mk/cache

karasikov commented May 2, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

karasikov commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Add /search result cache for the metagraph server

Why

What

Mechanism and policy

Per-request flow

Entry state machine

Eviction policy (two-pass size-pressure sweep)

Three retention classes

Concurrent dedup (the shared_future hinge)

What's in the cache key (and what isn't)

Out of scope (follow-ups)

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

karasikov commented May 2, 2026 •

edited

Loading