Retriable reads: check version ref before retrying on pruned-key errors by jamesblackburn · Pull Request #3145 · man-group/ArcticDB

jamesblackburn · 2026-06-02T20:38:45Z

What does this implement or fix?

When a reader resolves a symbol from its cached version chain just before a concurrent writer
supersedes and eagerly prunes the old version's keys, the read fails with KeyNotFoundException
or NoDataFoundException. Rather than surfacing this error, the retry loop now recovers
transparently:

- Reads the VERSION_REF key to compare the storage head against the cached head.
- If unchanged: the data is genuinely missing (not a race), so the exception is propagated
  immediately without consuming a retry slot.
- If changed: invalidates the stale cache entry. The retry repopulates it via the existing
  LOAD_LATEST shortcut, reading only the VERSION_REF, not the full version chain.

Net cost per retry: 2 VERSION_REF reads and 0 VERSION reads, regardless of how many
live versions exist for the symbol.
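
The recovery steps above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: `ToyStore`, `ToyReader`, and `KeyNotFound` are hypothetical stand-ins, not ArcticDB classes, and the real logic lives in the C++ version map.

```python
from dataclasses import dataclass, field


class KeyNotFound(Exception):
    """Hypothetical stand-in for the missing-key errors a racing read can hit."""


@dataclass
class ToyStore:
    """In-memory stand-in for storage: one VERSION_REF head per symbol plus data keys."""
    ref: dict = field(default_factory=dict)   # symbol -> head version id
    data: dict = field(default_factory=dict)  # (symbol, version) -> payload
    ref_reads: int = 0

    def read_ref(self, symbol):
        self.ref_reads += 1
        return self.ref.get(symbol)

    def write(self, symbol, payload, prune_previous=True):
        old = self.ref.get(symbol)
        new = 0 if old is None else old + 1
        self.data[(symbol, new)] = payload
        self.ref[symbol] = new
        if prune_previous and old is not None:
            del self.data[(symbol, old)]  # eager prune of the superseded version


class ToyReader:
    def __init__(self, store, max_retries=1):
        self.store = store
        self.cache = {}  # symbol -> cached head version id
        self.max_retries = max_retries

    def _resolve(self, symbol):
        # Cache miss: repopulate from the ref only (the LOAD_LATEST-style shortcut).
        if symbol not in self.cache:
            self.cache[symbol] = self.store.read_ref(symbol)
        return self.cache[symbol]

    def _invalidate_if_ref_changed(self, symbol):
        if self.store.read_ref(symbol) == self.cache.get(symbol):
            return False  # head unchanged: the data is genuinely missing
        self.cache.pop(symbol, None)
        return True

    def read(self, symbol):
        retries = 0
        while True:
            version = self._resolve(symbol)
            try:
                return self.store.data[(symbol, version)]
            except KeyError as err:
                if retries >= self.max_retries or not self._invalidate_if_ref_changed(symbol):
                    raise KeyNotFound(symbol) from err
                retries += 1
```

In this model a raced read costs two ref reads (one for the changed-ref check, one to repopulate the cache), and a genuinely missing version fails after a single ref read.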

Tests added

- test_read_retry.py: five Python unit tests covering the happy path, the no-retry path
  (genuinely missing version), per-symbol cache scoping (unrelated symbols unaffected), and O(1)
  read count with N=15 live versions asserted via query_stats.
- InvalidateIfVersionRefChangedReturnsTrueWhenChangedFalseWhenUnchanged: C++ unit test
  verifying the method returns true and invalidates the cache on a ref change, and false when
  unchanged.
- test_concurrent_read_write_eager_prune: stress test using two LMDB handles with concurrent
  readers and eager-pruning writers.

jamesblackburn · 2026-06-02T20:40:35Z

-        node_futures.reserve(keys.size());
-        for (const auto& key : keys) {
-            node_futures.emplace_back(read_frame_for_version(store(), key, read_query, read_options, handler_data));
+        } catch (const storage::NoDataFoundException&) {


This is the change - the rest is whitespace.

When a read races with a concurrent eager prune (the writer deletes a version's keys immediately after superseding it), the reader catches NoDataFoundException / KeyNotFoundException and retries. Previously the retry called invalidate_cached_entry(), which forced a full version-chain reload (O(N) storage reads for N live versions).

This change replaces that with reload_from_version_ref_if_changed():
- Reads the VERSION_REF key once (outside the lock) to get the current head from storage.
- If the head is unchanged, the data is genuinely missing (not a race), so the exception is re-raised without consuming a retry slot.
- If the head changed, the cache entry is repopulated in-place from the ref data (mirroring the LOAD_LATEST shortcut in follow_version_chain), so the subsequent retry is a cache hit requiring 0 additional reads.

Net cost per retry: exactly 1 VERSION_REF read and 0 VERSION reads, regardless of how many live versions exist in storage.

New tests:
- Python unit tests in test_read_retry.py covering the happy path, the no-retry error path, non-retriable versions, per-symbol scoping, and O(1) read count with N=15 live versions.
- C++ unit test ReloadFromVersionRefIfChangedUpdatesCacheAndReturnsFalseWhenUnchanged.
- Stress test test_concurrent_read_write_eager_prune using two LMDB handles with concurrent readers and eager-pruning writers.
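
A minimal sketch of the repopulate-in-place idea from this commit message, assuming a hypothetical `CountingRefStore` and `RefCache` (neither is an ArcticDB type). The ref is read once outside the lock, and the lock is held only to compare and update the cached head.

```python
import threading


class CountingRefStore:
    """Hypothetical storage stub that counts VERSION_REF reads."""

    def __init__(self):
        self.heads = {}  # symbol -> head version id
        self.ref_reads = 0

    def read_ref(self, symbol):
        self.ref_reads += 1
        return self.heads.get(symbol)


class RefCache:
    """Hypothetical per-symbol cache of the last known head."""

    def __init__(self):
        self._lock = threading.Lock()
        self.heads = {}

    def reload_from_ref_if_changed(self, store, symbol):
        # Single ref read, done before taking the lock so storage I/O never blocks other readers.
        stored_head = store.read_ref(symbol)
        with self._lock:
            if stored_head == self.heads.get(symbol):
                return False  # head unchanged: not a race, caller should re-raise
            # Repopulate in place so the retry is a cache hit with no further ref read.
            self.heads[symbol] = stored_head
            return True
```

Compared with invalidate-then-reload, the retry that follows a `True` result needs no additional ref read in this model, which is where the "exactly 1 VERSION_REF read" figure comes from.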

claude · 2026-06-03T05:49:49Z


maxim-morozov

This has real potential to DDoS the storage. In case of a storage slowdown or a network blip, we should check which exception we are getting, to make sure we don't retry in those cases. Otherwise we will make things considerably worse in exactly those situations. The AWS SDK has retries as well, so the number of storage requests will quickly scale exponentially.
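
To illustrate the concern, here is a hedged sketch of a retry wrapper that retries only on a missing-key error and lets everything else (timeouts, throttling) propagate on the first failure. The exception classes and helper names are hypothetical. The second helper shows how nested retry layers compound: each application-level attempt can trigger every SDK-level attempt.

```python
class StorageTimeout(Exception):
    """Hypothetical stand-in for a slowdown or network error; must never be retried here."""


class KeyMissing(Exception):
    """Hypothetical stand-in for the missing-key error caused by a concurrent prune."""


def read_with_selective_retry(read_fn, max_retries=1):
    """Call read_fn, retrying only when it raises KeyMissing."""
    retries = 0
    while True:
        try:
            return read_fn()
        except KeyMissing:
            if retries >= max_retries:
                raise
            retries += 1
        # Any other exception type is not caught and propagates immediately.


def worst_case_requests(app_attempts, sdk_attempts):
    """Upper bound on storage requests for one logical read with two nested retry layers."""
    return app_attempts * sdk_attempts
```

For example, 4 application attempts on top of 3 SDK attempts gives up to 12 storage requests for one logical read, which is why the retry count and the set of retriable exceptions both matter.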

IvoDD · 2026-06-03T07:37:28Z

-        for (const auto& key : keys) {
-            node_futures.emplace_back(read_frame_for_version(store(), key, read_query, read_options, handler_data));
+        } catch (const storage::NoDataFoundException&) {
+            if (attempt >= max_attempts || !version_map()->invalidate_if_version_ref_changed(store(), stream_id))


On a retry we'll end up reading the version ref twice:

Once in invalidate_if_version_ref_changed to check whether it changed

Once in read_frame_for_version to read the version chain

Probably not a big deal since retries will be fairly infrequent.

Ideally we could read it only once and short circuit if the same, but I think the current is good enough

Yeh invalidate_if_version_ref_changed isn't the right way to do this. If the ref key has changed, we should continue LOAD_LATEST_UNDELETED and make this the new entry

IvoDD · 2026-06-03T07:42:04Z

+    // raises E_NO_SUCH_VERSION (not caught here) and still fails fast. Preloaded-index reads carry
+    // their own index segment, so re-resolution cannot help them.
+    const bool is_preloaded = std::holds_alternative<std::shared_ptr<PreloadedIndexQuery>>(version_query.content_);
+    const int64_t max_attempts = is_preloaded ? 1 : ConfigsMap::instance()->get_int("VersionStore.ReadRetries", 3) + 1;


What do you think about retrying only on latest version queries?

For a version chain v2->v1->v0, it is unlikely for the version behind a read(as_of=0) to get deleted. People usually either always prune previous versions or never do.

Agreed, reading as_of specific versions, timestamps and snapshots will not benefit from retrying
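
The gating being discussed could look roughly like this. It is a sketch only: the query classes below are hypothetical Python stand-ins for the C++ version-query variant, and `retries=1` reflects the smaller default suggested in review rather than a confirmed setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Latest:
    """No as_of given: resolve whatever the newest version is."""


@dataclass(frozen=True)
class SpecificVersion:
    version: int


@dataclass(frozen=True)
class Snapshot:
    name: str


@dataclass(frozen=True)
class Timestamp:
    ns: int


def max_attempts(query, retries=1):
    """Only latest-version reads may retry; pinned queries get exactly one attempt."""
    return 1 + retries if isinstance(query, Latest) else 1
```

A pinned query that re-resolved after a failure could silently return a different version than the one requested, so failing fast is the safer behaviour for those cases.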

IvoDD · 2026-06-03T07:43:06Z

+    // raises E_NO_SUCH_VERSION (not caught here) and still fails fast. Preloaded-index reads carry
+    // their own index segment, so re-resolution cannot help them.
+    const bool is_preloaded = std::holds_alternative<std::shared_ptr<PreloadedIndexQuery>>(version_query.content_);
+    const int64_t max_attempts = is_preloaded ? 1 : ConfigsMap::instance()->get_int("VersionStore.ReadRetries", 3) + 1;


Also I think the default retries should be fewer. Even just 1.
If someone races with their reads so frequently people are unlikely to get the result they want either way.

Also agreed, if writes are happening so fast that a single retry doesn't help, then we create a thundering herd problem by retrying more times

IvoDD · 2026-06-03T07:55:59Z

+
+        std::lock_guard lock(map_mutex_);
+        auto it = map_.find(stream_id);
+        const std::optional<AtomKey> cached_head = (it != map_.end()) ? it->second->head_ : std::nullopt;


I think this logic would be easier to follow if we short circuit the case where it == map_.end() with return true. There was no ref key to have changed.

Theoretically, invalidate_if_version_ref_changed can return false when there is no cached entry for the symbol and the version ref contains no link to a version key (which should not be possible), but that still makes this harder to reason about, imo.
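
The difference between the two formulations can be shown with a small sketch. Both functions are hypothetical Python analogues of the C++ comparison, with `None` playing the role of `std::nullopt`.

```python
from typing import Dict, Optional


def ref_changed_implicit(cache: Dict[str, int], symbol: str, stored_head: Optional[int]) -> bool:
    # A missing cache entry is treated as "cached head is None" and compared like any other head.
    return cache.get(symbol) != stored_head


def ref_changed_short_circuit(cache: Dict[str, int], symbol: str, stored_head: Optional[int]) -> bool:
    # No cached entry: nothing stale to compare against, so always let the caller reload.
    if symbol not in cache:
        return True
    return cache[symbol] != stored_head
```

The two agree whenever an entry is cached. They differ only in the corner case of no cached entry plus an empty ref, where the implicit form reports "unchanged".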

IvoDD · 2026-06-03T08:01:29Z

+        writer.terminate()
+        reader.terminate()
+
+    assert exceptions_in_reader.empty()


It feels like this test might be flaky. It doesn't seem impossible for the writer process to invalidate the reader multiple times.

Agreed, this test will not be reliable, better to use the storage failure simulator for this sort of thing

IvoDD · 2026-06-03T08:20:51Z

-        node_futures.reserve(keys.size());
-        for (const auto& key : keys) {
-            node_futures.emplace_back(read_frame_for_version(store(), key, read_query, read_options, handler_data));
+        } catch (const storage::NoDataFoundException&) {


I guess the NoDataFoundException is needed because of exception reraises like this

I agree it's the correct thing for this PR but I think we should leave the more precise KeyNotFound exception in those reraises.

Unfortunately the current API can raise either depending on exactly when the failure occurs

IvoDD · 2026-06-03T08:24:05Z

+    # One retry: one ref read for the changed-ref check, one more for storage_reload's LOAD_LATEST
+    # shortcut. Both are VERSION_REF reads; no full VERSION-chain traversal.
+    assert _version_ref_reads(raced_stats) == 2
+    assert _version_reads(raced_stats) == 0


It would be useful to show that the index key is read just once

IvoDD · 2026-06-03T08:27:24Z

+
+        qs.enable()
+        qs.reset_stats()
+        result = reader.read(sym)  # stale cache -> one retry


It would be nice to also test a similar version chain with read(as_of=0). Depending on what we decide for this comment, this could mean no retries or a different exception being raised.

alexowens90 · 2026-06-03T09:46:39Z

-                std::make_move_iterator(node_trys.end()),
-                std::back_inserter(node_results),
-                [](auto&& try_result) { return std::move(try_result).value(); }
+        ARCTICDB_DEBUG(


This macro gets compiled out of release builds. I would make the log info level - we want to know when this is happening

alexowens90 · 2026-06-03T09:58:50Z

+            return false;
+
+        if (it != map_.end())
+            map_.erase(it);


The symbol should also be erased from the lock table if it is present

alexowens90 · 2026-06-03T10:04:08Z

+from arcticdb.util.test import config_context, query_stats_operation_count
+
+# Large enough that the reader's cached version chain never expires during a test.
+STICKY_RELOAD_INTERVAL = 2_000_000_000_000


This is in nanoseconds, so isn't very long at all
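
A quick arithmetic check of the interval, assuming the value is interpreted as nanoseconds as stated. The helper name is made up for illustration.

```python
NS_PER_SECOND = 1_000_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def ns_to_seconds(ns: int) -> float:
    """Convert a nanosecond count to seconds."""
    return ns / NS_PER_SECOND


original_interval = 2_000_000_000_000
print(ns_to_seconds(original_interval))       # 2000.0 seconds
print(ns_to_seconds(original_interval) / 60)  # about 33 minutes

larger_interval = 2**62
print(ns_to_seconds(larger_interval) / SECONDS_PER_YEAR)  # roughly 146 years
```

So the original constant expires the cache after about half an hour, whereas a value such as 2**62 ns is effectively "never" for a test run.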

alexowens90 · 2026-06-03T10:04:34Z

+    with (
+        config_context("VersionMap.ReloadInterval", STICKY_RELOAD_INTERVAL),
+        config_context("VersionStore.ReadRetries", 0),
+    ):


config_context_multi

alexowens90 · 2026-06-03T10:05:38Z

@@ -0,0 +1,152 @@
+"""Deterministic tests for the read-retry behaviour in read_dataframe_version_internal.


Missing copyright

alexowens90 · 2026-06-03T10:06:07Z

This doesn't handle read_batch

- Retry only for latest-version reads (std::monostate): pinned queries (as_of=N, timestamp, snapshot, preloaded) use max_attempts=1 to avoid silently returning a different version.
- Reduce default VersionStore.ReadRetries from 3 to 1.
- Promote retry log line from ARCTICDB_DEBUG to info so it appears in release builds.
- Refactor invalidate_if_version_ref_changed: early-return true when no cached entry exists; after detecting a changed ref, proactively call follow_version_chain (LATEST/UNDELETED_ONLY) with the already-read ref_entry and stamp last_reload_time_, so the retry's check_reload is a pure cache hit with no second VERSION_REF read.
- Re-add test_concurrent_read_write_eager_prune stress test, skipped in CI (RUNS_ON_GITHUB) to avoid non-deterministic failures there.
- test_read_retry.py: copyright header, config_context_multi, larger STICKY_RELOAD_INTERVAL (2**62 ns), _index_reads helper, updated _version_ref_reads assertions 2→1, new pinned-query no-retry tests.

claude · 2026-06-09T16:39:38Z

ArcticDB Code Review Summary

Re-reviewed the latest commits (delta 83a0bb4..646da0c). This delta is a clean refactor of the read-retry mechanism and broadens it considerably — no new must-fix correctness issues found:

The inline retry loop in read_dataframe_version_internal was extracted into three reusable helpers (retry_read_on_concurrent_prune single-symbol, the multi-symbol batch variant, and retry_failed_reads_on_concurrent_prune for post-collectAll batch results). The single-symbol/batch helpers correctly gate on std::monostate (latest reads only), and the post-collectAll variant correctly only invalidates+retries entries that failed with a missing-key error on a latest-version query.
The retry primitive is now applied across the rest of the read surface: read_column_stats, get_column_stats_info, get_index_range, read_descriptor(+batch), batch_read, batch_read_and_join, read_metadata(+batch), and read_modify_write. The read_modify_write wrapping correctly scopes the retry to the source read only (the target write stays outside and runs once), and batch_read_and_join correctly snapshots and restores the clause list on each attempt with a fresh ComponentManager. The synchronous-on-caller-thread constraint for retried batch reads is documented to avoid the threadpool-reentrancy deadlock.
version_core.cpp now throws KeyNotFoundException (instead of the generic E_KEY_NOT_FOUND raise) from the column-stats read paths so the race is catchable by the retry primitive — both still map to E_KEY_NOT_FOUND for callers.
The VersionStore.ReadRetries config knob was removed; retry is now a fixed single attempt. As this knob was introduced within this same (unreleased) PR, removing it is not a backwards-compatibility concern.
Tests are comprehensive: per-API recovery, bounded O(1) ref-key reads for batch, partial-race isolation, and the read_modify_write target-written-exactly-once guarantee.

One item still needs attention:

Documentation

The transparent read-retry-on-concurrent-prune behaviour is still undocumented. Per CLAUDE.md (new features must include documentation) and section 21 of the review guidelines, docs/claude/cpp/VERSIONING.md (the versioning/read-path area this change touches) should describe the behaviour. The earlier note about documenting the VersionStore.ReadRetries knob is now moot (the knob was removed), but the feature itself has grown in scope: latest-version reads across read, batch_read, batch_read_and_join, read_metadata(+batch), get_info(+batch), read_column_stats/get_column_stats_info, and read_modify_write can now transparently re-resolve to a newer version when racing an eager prune. This user-visible behaviour change warrants a doc note (and confirm the no-release-notes label is set correctly given the broadened user-facing behaviour).

…fy retry

Extend the single-retry-on-concurrent-prune behaviour to every version-resolving read path, and address review of the initial batch implementation.

- Batch reads (read/metadata/descriptor) previously retried inside a .thenTry continuation that blocked on nested .get()s while running on an async::cpu_executor thread. This is the deadlock anti-pattern, since the nested read also needs that pool. Retries now run in a post-collectAll().get() loop on the caller thread, where blocking is safe. The iteration + gate + invalidate + log lives once in retry_failed_reads_on_concurrent_prune; each batch site passes only a retry_fn(idx).
- batch_read_and_join: replaced the bespoke for/should_retry_join/try-catch loop with a multi-symbol overload of retry_read_on_concurrent_prune, matching the single-symbol paths. Retries once if any latest-version symbol's version ref changed.
- read_column_stats_impl / get_column_stats_info_impl now throw KeyNotFoundException rather than the generic E_KEY_NOT_FOUND raise (which throws the base StorageException). The retry primitive catches KeyNotFoundException, so the column-stats wrapping was previously a no-op. Both still surface as E_KEY_NOT_FOUND to callers.
- Added deterministic race tests for read_modify_write (recovers source + writes target exactly once), read_column_stats and get_column_stats_info.
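
The post-collect retry pattern described for batch reads can be sketched in Python with `concurrent.futures`. This is an analogy under stated assumptions, not the ArcticDB implementation: `KeyMissing`, `batch_read_with_post_collect_retry`, and the `should_retry` gate are hypothetical names, and the thread pool stands in for the CPU executor.

```python
from concurrent.futures import ThreadPoolExecutor


class KeyMissing(Exception):
    """Hypothetical stand-in for the missing-key error raised by a raced read."""


def batch_read_with_post_collect_retry(pool, read_one, symbols, should_retry):
    """Run all reads on the pool, then retry failures once from the caller thread."""
    futures = [pool.submit(read_one, sym) for sym in symbols]

    # Collect every outcome first (value or exception), mirroring a collect-all step.
    results = []
    for fut in futures:
        try:
            results.append(fut.result())
        except Exception as err:
            results.append(err)

    # Retry only missing-key failures that pass the gate. Blocking here is safe because
    # this loop runs on the caller thread, not on a pool worker waiting for its own pool.
    for idx, outcome in enumerate(results):
        if isinstance(outcome, KeyMissing) and should_retry(symbols[idx]):
            try:
                results[idx] = pool.submit(read_one, symbols[idx]).result()
            except Exception as err:
                results[idx] = err
    return results
```

Blocking on a nested future from inside a pool worker can exhaust the pool when every worker is waiting on work that needs a free worker. Moving the retry after collection avoids that by construction.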

jamesblackburn added the "minor" (Feature change, should increase minor version) and "patch" (Small change, should increase patch version) labels and removed the "minor" label Jun 2, 2026

jamesblackburn force-pushed the retriable-reads branch 2 times, most recently from 4218f6d to 21eb476 Compare June 2, 2026 20:56

jamesblackburn force-pushed the retriable-reads branch from 21eb476 to d6c151e Compare June 2, 2026 21:05

jamesblackburn marked this pull request as ready for review June 3, 2026 05:45

jamesblackburn requested review from IvoDD, alexowens90 and poodlewars as code owners June 3, 2026 05:45

maxim-morozov self-requested a review June 3, 2026 06:14

maxim-morozov reviewed Jun 3, 2026

IvoDD reviewed Jun 3, 2026

alexowens90 reviewed Jun 3, 2026

jamesblackburn force-pushed the retriable-reads branch from f168015 to 83a0bb4 Compare June 9, 2026 16:34


4 participants
