Cache results in the metagraph server #626

Open: karasikov wants to merge 7 commits into master from mk/cache

karasikov commented May 2, 2026

Add /search result cache for the metagraph server

Why

Two real-world failure modes today:

  1. Concurrent identical requests recompute independently. When two clients submit the same /search payload, both threads run the full multi-second-to-multi-minute query.
  2. A connection drop during a long query forces a full retry. If the upstream API retries after a TCP failure, the server happily redoes the whole computation.

This PR adds a result cache that solves both. A refcount-pinned design ensures an entry is never evicted while a thread is still computing or reading it.

What

  • ServerQueryCache (src/cli/server_cache.{hpp,cpp}):
    • In-flight dedup via std::shared_future<ResultPtr>: the first arrival owns the promise, late arrivals attach to the same future.
    • Byte-budgeted LRU: total response size capped by --cache-size (GB). Entries with active waiters (currently being computed or read) are never evicted.
    • Two-class retention with priority for entries whose delivery failed:
      • DELIVERED — successfully delivered at least once. Sink state, normal LRU.
      • PROTECTED — delivery failed, kept with priority over DELIVERED for a sliding 2 h window. Every cache hit refreshes the window (the hit is itself evidence the upstream is still retrying). After 2 h with no hits, the entry graduates to the main cache (state stays PROTECTED but it loses priority and ages out via LRU). A successful retry flips it to DELIVERED.
      • Pass 2 of the eviction sweep falls back to LRU on PROTECTED entries when the cache is otherwise full of them, so memory stays bounded.
  • Cache key — canonicalized request inputs (FASTA hash + query_mode, discovery_fraction, min_exact_match, max_num_nodes_per_seq_char, top_labels, align, plus graph identity). Per-request overrides are read directly from the JSON, mirroring process_search_request's resolution. Formatting flags (verbose_output, Accept-Encoding) are deliberately excluded.
  • /search handler — acquires a Handle, computes on miss, reuses on hit. The on_sent async-write callback drives mark_delivered / mark_protected. Both single-graph and multi-graph paths share the cache layer.
  • process_request — gains an optional on_sent(error_code) callback routed through the Simple-Web-Server send(callback) variant.
  • CLI: --cache-size <GB> (default 1; 0 disables). Stored as a double, so fractional values such as 0.5 work.
  • One log line per cached request (only when cache enabled):
    • cache HIT 4.5 KB (occupancy 312.0/1024 MB, 47 entries)
    • cache STORE 12.5 KB (occupancy 324.0/1024 MB, 48 entries)
    • delivery FAILED (Connection reset by peer); cache entry protected for 120 min retry window
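
As a rough illustration of the --cache-size handling and the log line shape above, here is a minimal sketch. The helper names gb_to_bytes and format_cache_line are hypothetical and are not taken from the PR. It assumes binary units (the default of 1 GB is reported as 1024 MB in the sample lines) and that KB means 1024 bytes.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: convert the --cache-size value (GB, fractional allowed)
// into a byte budget. Assumes binary units, since the default of 1 GB shows up
// as 1024 MB in the sample log lines.
inline std::size_t gb_to_bytes(double gb) {
    if (gb <= 0.0) return 0;  // 0 disables the cache
    return static_cast<std::size_t>(gb * 1024.0 * 1024.0 * 1024.0);
}

// Hypothetical formatter that reproduces the shape of the HIT / STORE lines.
inline std::string format_cache_line(const char *event, std::size_t response_bytes,
                                     std::size_t used_bytes, std::size_t budget_bytes,
                                     std::size_t num_entries) {
    const double mb = 1024.0 * 1024.0;
    char buf[160];
    std::snprintf(buf, sizeof(buf),
                  "cache %s %.1f KB (occupancy %.1f/%.0f MB, %zu entries)",
                  event, response_bytes / 1024.0, used_bytes / mb,
                  budget_bytes / mb, num_entries);
    return buf;
}
```

Keeping the flag as a double and converting once at startup means the rest of the cache only ever compares integer byte counts.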

Mechanism and policy

Per-request flow

  ┌────────────────────────────────────────────────────────────────────┐
  │ POST /search                                                       │
  └─────────────────────────────┬──────────────────────────────────────┘
                                │
                       key = hash(FASTA) + mode + fractions + graph_id
                                │
                      ┌─────────▼─────────┐
                      │ cache.acquire(key)│  ── waiters++ ──┐
                      └─────────┬─────────┘                 │
                                │                           │
              ┌─────────────────┴────────────────┐          │
              ▼                                  ▼          │
        ┌──────────┐                       ┌─────────────┐  │
        │  MISS    │                       │    HIT      │  │
        │ (first)  │                       │ (in-flight  │  │
        └────┬─────┘                       │  or ready)  │  │
             │                             └──────┬──────┘  │
             │ compute query_fasta()              │         │
             │ (single- or multi-graph)           │         │
             ▼                                    │         │
       handle.set_result(json) ──┐                │         │
       or set_exception(eptr)    │                │         │
                                 ▼                ▼         │
                          ┌──────────────────────────┐      │
                          │  shared_future<ResultPtr>│      │
                          │   .get()  →  Json::Value │      │
                          └──────────────┬───────────┘      │
                                         │                  │
                                  write response            │
                                         │                  │
                            on_sent(ec) callback fires      │
                              │                  │          │
                              ▼                  ▼          │
                      mark_delivered      mark_protected    │
                              │                  │          │
                              └────────┬─────────┘          │
                                       ▼                    │
                                ~Handle()  ── waiters-- ────┘
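
The waiters++ / waiters-- pairing in the flow above is a natural fit for RAII. The following is a minimal sketch under assumptions: Entry here models only the pin count, and the Handle constructor signature is hypothetical rather than the one in server_cache.hpp.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical, simplified entry: only the pin count is modeled.
struct Entry {
    std::size_t waiters = 0;
};

// Hypothetical RAII handle: constructing it pins the entry (waiters++),
// destroying it unpins the entry (waiters--). Move-only, so each pin is
// released exactly once.
class Handle {
  public:
    Handle(std::shared_ptr<Entry> entry, std::mutex &mu)
          : entry_(std::move(entry)), mu_(&mu) {
        std::lock_guard<std::mutex> lock(*mu_);
        ++entry_->waiters;
    }
    // Moving transfers the pin; the moved-from handle is left empty.
    Handle(Handle &&other) noexcept
          : entry_(std::move(other.entry_)), mu_(other.mu_) {}
    Handle(const Handle &) = delete;
    Handle &operator=(const Handle &) = delete;
    Handle &operator=(Handle &&) = delete;

    ~Handle() {
        if (!entry_) return;  // moved-from handle holds no pin
        std::lock_guard<std::mutex> lock(*mu_);
        --entry_->waiters;
    }

  private:
    std::shared_ptr<Entry> entry_;
    std::mutex *mu_;
};
```

Because only the destructor drops the count, an early return or an exception in the request handler still releases the pin.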

Entry state machine

                  acquire (miss)
                        │
                        ▼
                   ┌──────────┐
                   │ PENDING  │      computation in flight
                   └─────┬────┘      waiters ≥ 1, never evicted
                         │ set_result(...) / set_exception(...)
                         ▼
                  (response ready)
                         │
                  on_sent(ec) fires
                         │
              ┌──────────┴──────────┐
       ec != 0│                     │ec == 0
              ▼                     ▼
       ┌─────────────┐        ┌──────────┐
       │  PROTECTED  │        │ DELIVERED│   sink state — a later
       │             │        │          │   mark_protected() is
       │ within 2h   │        └──────────┘   a no-op.
       │  of last    │                       Normal LRU.
       │   hit:      │
       │ priority    │       successful retry  ┌──────────┐
       │ over        │ ────────────────────►   │ DELIVERED│
       │ DELIVERED.  │                          └──────────┘
       │             │
       │ past 2h:    │
       │ graduates   │
       │ to main     │
       │ cache       │
       │ (LRU like   │
       │ any other)  │
       └─────────────┘

   each acquire() during PROTECTED refreshes ready_at = now (sliding window)
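
The transitions above can be sketched with an atomic state and a compare-and-swap loop, which is how the commit notes describe on_delivery. Everything else here is an assumption for illustration: the EntryState struct, the function signatures, and the choice to stamp ready_at when the entry enters PROTECTED. A real implementation would also guard ready_at with the cache mutex.

```cpp
#include <atomic>
#include <chrono>

using Clock = std::chrono::steady_clock;

enum class DeliveryState { PENDING, DELIVERED, PROTECTED };

// Hypothetical per-entry state for this sketch.
struct EntryState {
    std::atomic<DeliveryState> state{DeliveryState::PENDING};
    Clock::time_point ready_at{};  // start of the current protection window
};

// Record the outcome of one delivery attempt. DELIVERED is a sink: once set,
// a later failed delivery cannot move the entry back to PROTECTED.
inline void on_delivery(EntryState &e, bool ok, Clock::time_point now) {
    const DeliveryState next = ok ? DeliveryState::DELIVERED : DeliveryState::PROTECTED;
    DeliveryState cur = e.state.load();
    for (;;) {
        if (cur == DeliveryState::DELIVERED) return;  // sink state, nothing to do
        if (e.state.compare_exchange_weak(cur, next)) break;
        // A failed CAS reloads cur; loop and re-check the sink condition.
    }
    if (!ok) e.ready_at = now;  // assumption: the window starts at the failure
}

// A hit on a PROTECTED entry restarts the window (sliding window).
inline void on_hit(EntryState &e, Clock::time_point now) {
    if (e.state.load() == DeliveryState::PROTECTED) e.ready_at = now;
}

// True while the entry still has priority over DELIVERED entries.
inline bool within_window(const EntryState &e, Clock::time_point now, Clock::duration ttl) {
    return e.state.load() == DeliveryState::PROTECTED && now - e.ready_at < ttl;
}
```

Note that an entry past its window keeps the PROTECTED state in this sketch; it only stops satisfying within_window, which matches the "graduates to main cache" wording.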

Eviction policy (two-pass size-pressure sweep)

                  Eviction priority (most → least evictable)
                  ─────────────────────────────────────────────
                    DELIVERED              (LRU oldest first)
                    PROTECTED past 2h      (LRU oldest first)
                  ─── pass 1 boundary ────────────────────────
                    PROTECTED within 2h    (LRU oldest first)   ← held with
                                                                  priority
                                                                  unless cache
                                                                  has only
                                                                  these left
                  ─── pass 2 boundary ────────────────────────
                    waiters > 0            (NEVER evicted)


  ┌────────────────────────────────────────────────────────────────┐
  │ Pass 1: while over budget,                                     │
  │   walk LRU back→front, evict the oldest waiterless entry that  │
  │   is NOT PROTECTED-within-window.                              │
  │                                                                │
  │ Pass 2: still over budget? Walk again, evict any waiterless    │
  │   entry. Bounds memory under heavy PROTECTED pressure.         │
  │                                                                │
  │ Stops when under cap, or no candidate remains.                 │
  │                                                                │
  │ Triggered on insert (set_result) and on waiter release.        │
  │                                                                │
  │ refcount invariant:                                            │
  │   ── producer & readers ⇒ waiters ≥ 1                          │
  │   ── only ~Handle() drops the count                            │
  │   ── in-flight / being-read entries are NEVER evicted          │
  └────────────────────────────────────────────────────────────────┘
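
A minimal sketch of the two-pass sweep described above, assuming a std::list ordered from most recently used (front) to least recently used (back). The Entry and Cache structs and the function names are simplified stand-ins for illustration, not the PR's evict_under_pressure_locked.

```cpp
#include <chrono>
#include <cstddef>
#include <list>
#include <string>

using Clock = std::chrono::steady_clock;

enum class DeliveryState { PENDING, DELIVERED, PROTECTED };

// Hypothetical, simplified entry.
struct Entry {
    std::string key;
    std::size_t size_bytes = 0;
    std::size_t waiters = 0;
    DeliveryState state = DeliveryState::DELIVERED;
    Clock::time_point ready_at{};
};

// Hypothetical cache state: front of the list is most recently used.
struct Cache {
    std::list<Entry> lru;
    std::size_t used_bytes = 0;
    std::size_t budget_bytes = 0;
    Clock::duration protection_ttl = std::chrono::hours(2);
};

inline bool within_window(const Entry &e, const Cache &c, Clock::time_point now) {
    return e.state == DeliveryState::PROTECTED && now - e.ready_at < c.protection_ttl;
}

// Evict the oldest waiterless entry, walking from the LRU end. When
// spare_protected is set (pass 1), PROTECTED-within-window entries are skipped.
inline bool evict_one(Cache &c, Clock::time_point now, bool spare_protected) {
    for (auto it = c.lru.end(); it != c.lru.begin();) {
        --it;
        if (it->waiters > 0) continue;  // pinned entries are never evicted
        if (spare_protected && within_window(*it, c, now)) continue;
        c.used_bytes -= it->size_bytes;
        c.lru.erase(it);
        return true;
    }
    return false;  // no candidate remains
}

inline void evict_under_pressure(Cache &c, Clock::time_point now) {
    while (c.used_bytes > c.budget_bytes && evict_one(c, now, true)) {}   // pass 1
    while (c.used_bytes > c.budget_bytes && evict_one(c, now, false)) {}  // pass 2
}
```

If every remaining entry is pinned, both loops stop and the cache stays temporarily over budget, which is the "no candidate remains" exit in the box above.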

Three retention classes

| State | Window | Eviction priority |
|---|---|---|
| PENDING | none | never (waiters always ≥ 1) |
| DELIVERED | none | LRU, normal priority |
| PROTECTED within 2 h | sliding | held over DELIVERED (evicted only as a last resort) |
| PROTECTED past 2 h | window past | LRU, normal priority |

Concurrent dedup (the shared_future hinge)

   Thread A                    Thread B (50 ms later)
   ─────────                   ──────────────────────
   acquire(k)  → MISS          ⋮
   compute_search(...)         acquire(k)  → HIT (same future)
        ⋮                      future.get()  ── BLOCKS ──┐
        ⋮ (12 sec)                                       │
   set_result(json)  ──────────────────────────► both unblock
                                                         │
   future.get() returns        future.get() returns ◄────┘
   (instant)                   (instant)

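The cache stores a shared_future of a shared_ptr to the immutable result: shared_ptr<const CachedResult> originally, shared_ptr<const std::string> after the last commit in this PR, which dropped the CachedResult wrapper. The first arrival owns the promise; every later arrival just calls .get() on the same future, with no extra computation. The result is published once and observed N times.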
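
The dedup mechanism can be sketched in a few lines. This is a minimal illustration under assumptions: DedupCache and get_or_compute are hypothetical names, and eviction, pinning, and delivery tracking are all left out.

```cpp
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

using ResultPtr = std::shared_ptr<const std::string>;

// Hypothetical minimal cache: in-flight dedup only.
class DedupCache {
  public:
    // Returns the result for key, running compute at most once per key no
    // matter how many threads ask for it concurrently.
    ResultPtr get_or_compute(const std::string &key,
                             const std::function<std::string()> &compute) {
        std::promise<ResultPtr> promise;
        std::shared_future<ResultPtr> future;
        bool owner = false;
        {
            std::lock_guard<std::mutex> lock(mu_);
            auto it = futures_.find(key);
            if (it == futures_.end()) {
                // First arrival: publish the future before computing.
                future = promise.get_future().share();
                futures_.emplace(key, future);
                owner = true;
            } else {
                future = it->second;  // late arrival: attach to the same future
            }
        }
        if (owner) {
            // Compute outside the lock so late arrivals can attach meanwhile.
            try {
                ResultPtr result = std::make_shared<const std::string>(compute());
                promise.set_value(std::move(result));
            } catch (...) {
                promise.set_exception(std::current_exception());
            }
        }
        return future.get();  // blocks until the owner publishes
    }

  private:
    std::mutex mu_;
    std::unordered_map<std::string, std::shared_future<ResultPtr>> futures_;
};
```

Publishing the future under the lock but computing outside it is the key design point: the map lookup stays cheap, and waiters block on the future rather than on the mutex.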

What's in the cache key (and what isn't)

  IN  the key  ──►  FASTA-body hash, query_mode (resolved from JSON
                    flags: query_coords, query_counts, abundance_sum,
                    with_signature), discovery_fraction,
                    min_exact_match, max_num_nodes_per_seq_char,
                    top_labels, align, graph identity (path or
                    sorted multi-graph list).

  OUT of key  ──►  verbose_output      (server-startup constant)
                   Accept-Encoding     (compression is post-cache)
                   request_id          (different requests, same answer)

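The cache stores the response independent of transport (a Json::Value originally; the serialized JSON body after the last commit in this PR); per-request flags that affect only formatting or transport are applied on the way out.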
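
To make the key composition concrete, here is a minimal sketch. The SearchParams struct, its field types, and make_cache_key are assumptions for illustration, and std::hash stands in for whatever FASTA hash the PR actually uses (it is not collision resistant). Fields outside the key (verbose_output, Accept-Encoding, request_id) are simply absent from the struct.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical flattened view of the request fields that affect the answer.
// Field types are assumed for this sketch.
struct SearchParams {
    std::string fasta;
    std::string query_mode;  // already resolved from the JSON flags
    double discovery_fraction = 0.0;
    double min_exact_match = 0.0;
    double max_num_nodes_per_seq_char = 0.0;
    std::size_t top_labels = 0;
    bool align = false;
    std::vector<std::string> graphs;  // one path, or a multi-graph list
};

// Build a canonical key: equal answers should map to equal keys.
inline std::string make_cache_key(SearchParams p) {
    // Sort the graph list so that multi-graph requests are order-insensitive.
    std::sort(p.graphs.begin(), p.graphs.end());
    std::ostringstream key;
    key.precision(17);  // enough digits to distinguish nearby doubles
    key << std::hash<std::string>{}(p.fasta)  // stand-in for a real content hash
        << '|' << p.query_mode
        << '|' << p.discovery_fraction
        << '|' << p.min_exact_match
        << '|' << p.max_num_nodes_per_seq_char
        << '|' << p.top_labels
        << '|' << p.align;
    for (const std::string &g : p.graphs) key << '|' << g;
    return key.str();
}
```

Two requests that differ only in graph order produce the same key here, while any change to a listed parameter produces a different one.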

Out of scope (follow-ups)

  • Application-level handshake. Today the protection trigger is TCP-level (the on_sent error_code). A future POST /ack {request_id} can drive the same mark_delivered / mark_protected hooks — the API is symmetric.
  • Cache invalidation on graph reload. Graph identity is part of the key, so a swapped graph file just gets a different namespace; explicit invalidation will be added if/when graph hot-reload becomes a real workflow.

Test plan

  • Unit tests (tests/cli/test_server_cache.cpp, 14 cases): miss→hit, concurrent dedup, LRU under size pressure, refcount protection, PROTECTED retention, sliding-window refresh on hit, graduation to main cache past window, DELIVERED-as-sink immunity to later mark_protected, PROTECTED-within-window held over DELIVERED, PROTECTED entries evicted under heavy pressure, disabled cache (--cache-size 0), exception propagation, abandoned-producer.
  • All 3,653+ unit tests pass.
  • All 171 test_api integration tests pass (verified all query_mode paths still distinguish correctly with the resolved-from-JSON cache key).
  • All 3,594 integration tests pass.
  • Manual integration test: hit /search twice with identical payload → second response is sub-second; kill curl mid-response on a long query → re-issue same request → cache hit.

karasikov and others added 7 commits May 3, 2026 01:30
Dedupes concurrent identical requests via shared_future and lets an
upstream's retry-after-disconnect hit the cached result instead of
recomputing a multi-minute query.

- ServerQueryCache: bounded byte-budget LRU; entries with active waiters
  are never evicted; FAILED-delivery entries get a 2h TTL while
  DELIVERED entries live until size pressure forces eviction.
- /search handler acquires a Handle, computes on miss, reuses on hit;
  the on_sent callback drives mark_delivered / mark_failed.
- New --cache-size flag in GB (default 1; 0 disables).
- 10 unit tests covering dedup, eviction, TTL, refcount protection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Two-pass eviction in evict_under_pressure_locked: pass 1 sacrifices
  DELIVERED and past-TTL FAILED entries (LRU oldest first); pass 2 falls
  back to FAILED-within-TTL only when the cache would otherwise stay
  oversize. The failed-delivery TTL is now a retention *floor*, not a
  ceiling — a flood of small successful requests can't displace responses
  an upstream is still about to retry.
- mark_delivered() is a sink: a duplicate request whose later delivery
  fails cannot resurrect a previously-served entry's TTL. Implemented
  with a CAS loop in on_delivery.
- release_waiter runs both the TTL sweep and the size-pressure sweep,
  so the cache settles back under budget without waiting for the next
  insert (the just-inserted entry may have been waiter-protected during
  its own eviction pass).
- One cache log line per request when the cache is enabled:
    cache HIT  / cache STORE  with size + occupancy/entries
    delivery FAILED  with the configured retry-window minutes
- Three new unit tests:
    DeliveredIsSinkStateAgainstLaterFailure
    FailedWithinTtlPreservedOverDelivered
    FailedEntriesEvictedUnderHeavyPressure

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Vocabulary aligned with the actual semantics: an entry whose delivery
failed becomes "protected" with priority retention for the configured
window, and graduates to the main cache after the window expires (state
stays PROTECTED but it loses priority and ages out via normal LRU). A
successful retry flips it to DELIVERED.

- DeliveryState::FAILED → DeliveryState::PROTECTED
- Handle::mark_failed   → Handle::mark_protected
- failed_ttl            → protection_ttl

Sliding retention window: every cache hit on a PROTECTED entry refreshes
ready_at, since the hit itself is direct evidence the upstream is still
retrying. Removed the proactive past-TTL eviction sweep — past-window
PROTECTED entries are kept until normal LRU pressure forces them out
(matches "graduate to main cache").

Two new tests:
  ProtectedPastWindowGraduatesToMainCache
  ProtectedHitRefreshesPriorityWindow

Class doc, Handle doc, and per-method comments in server_cache.{hpp,cpp}
rewritten for clarity:
  * Per-request lifecycle (acquire → compute/get → on_sent → ~Handle)
  * State machine (PENDING / DELIVERED-as-sink / PROTECTED with sliding
    window)
  * Two-pass eviction policy (priority class + bounded fallback)

server.cpp /search handler: top-of-handler comment block describing the
4-step request lifecycle and where the cache plugs in. Inline comments
mark each phase (cache handshake, hit fast path, miss compute & publish,
on_sent delivery reporter).

server_utils.hpp: process_request on_sent doc updated to describe the
PROTECTED-on-failure semantics.

Removed unused common/logger.hpp include from server_cache.cpp.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The cache used to hold a Json::Value tree, which (a) carried a lot of
allocator overhead so the byte budget was systematically off, and (b)
forced re-serialization on every hit. Switch the cache value to the
serialized JSON body (a std::string), which makes the byte accounting
exact (== response.size()) and lets hits return the body verbatim
without any Json::Value round-trip.

- Cache value: shared_ptr<const std::string> (dropped the CachedResult
  wrapper too; the size is just response.size()).
- process_request now takes a callback returning the already-serialized
  body. /search serializes once at cache-publish time; the other
  endpoints (/align, /column_labels, /stats) wrap their existing
  Json::Value with an inline Json::writeString call.
- Renamed Entry::approx_size_bytes → Entry::size_bytes (it's exact now).
- Extracted make_pending_entry() to dedup the entry-creation between
  the disabled-cache fast path and the miss path.
- Tightened comments throughout to match the simpler shape and the
  current vocabulary (PROTECTED state, sliding window, etc.). Removed
  stale references to CachedResult and the old proactive TTL eviction.
- Renamed the test helper make_result(N) → response_of_size(N).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>