Retriable reads: check version ref before retrying on pruned-key errors #3145

Open
jamesblackburn wants to merge 3 commits into master from retriable-reads

@jamesblackburn jamesblackburn commented Jun 2, 2026

Collaborator

What does this implement or fix?

When a reader resolves a symbol from its cached version chain just before a concurrent writer
supersedes and eagerly prunes the old version's keys, the read fails with KeyNotFoundException
or NoDataFoundException. Rather than surfacing this error, the retry loop now recovers
transparently:

  1. Reads the VERSION_REF key to compare the storage head against the cached head.
  2. If unchanged: the data is genuinely missing (not a race), so the exception is propagated
    immediately without consuming a retry slot.
  3. If changed: invalidates the stale cache entry. The retry repopulates it via the existing
    LOAD_LATEST shortcut, reading only the VERSION_REF — not the full version chain.

Net cost per retry: 2 VERSION_REF reads and 0 VERSION reads, regardless of how many
live versions exist for the symbol.
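
The three steps above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: `ToyStore`, `Reader` and `KeyNotFound` are hypothetical stand-ins, not ArcticDB classes, and the real logic lives in C++.

```python
class KeyNotFound(Exception):
    """Hypothetical stand-in for KeyNotFoundException / NoDataFoundException."""


class ToyStore:
    """Hypothetical storage: one VERSION_REF head per symbol plus data keys."""

    def __init__(self):
        self.ref = {}       # symbol -> head version id
        self.data = {}      # (symbol, version) -> payload
        self.ref_reads = 0  # counts VERSION_REF reads

    def write(self, symbol, payload):
        old = self.ref.get(symbol)
        new = 0 if old is None else old + 1
        self.data[(symbol, new)] = payload
        self.ref[symbol] = new
        if old is not None:
            del self.data[(symbol, old)]  # eager prune of the superseded version

    def read_ref(self, symbol):
        self.ref_reads += 1
        return self.ref[symbol]

    def read_data(self, symbol, version):
        try:
            return self.data[(symbol, version)]
        except KeyError:
            raise KeyNotFound((symbol, version)) from None


class Reader:
    def __init__(self, store, retries=1):
        self.store = store
        self.retries = retries
        self.cache = {}  # symbol -> cached head

    def _resolve(self, symbol):
        # Cache miss: read only the ref, in the spirit of the LOAD_LATEST shortcut.
        if symbol not in self.cache:
            self.cache[symbol] = self.store.read_ref(symbol)
        return self.cache[symbol]

    def read(self, symbol):
        attempt = 0
        while True:
            version = self._resolve(symbol)
            try:
                return self.store.read_data(symbol, version)
            except KeyNotFound:
                # Step 1: compare the storage head with the cached head.
                # Step 2: unchanged head (or no retries left) -> propagate.
                if attempt >= self.retries or self.store.read_ref(symbol) == version:
                    raise
                # Step 3: head moved -> drop the stale entry and retry.
                self.cache.pop(symbol, None)
                attempt += 1
```

In this model a raced read costs two ref reads: one for the changed-ref check and one to repopulate the cache on the retry.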

Tests added

  • test_read_retry.py — five Python unit tests covering the happy path, the no-retry path
    (genuinely missing version), per-symbol cache scoping (unrelated symbols unaffected), and O(1)
    read count with N=15 live versions asserted via query_stats.
  • InvalidateIfVersionRefChangedReturnsTrueWhenChangedFalseWhenUnchanged — C++ unit test
    verifying the method returns true and invalidates the cache on a ref change, and false when
    unchanged.
  • test_concurrent_read_write_eager_prune — stress test using two LMDB handles with concurrent
    readers and eager-pruning writers.

node_futures.reserve(keys.size());
for (const auto& key : keys) {
node_futures.emplace_back(read_frame_for_version(store(), key, read_query, read_options, handler_data));
} catch (const storage::NoDataFoundException&) {

Collaborator Author

This is the change - the rest is whitespace.

@jamesblackburn added the labels minor (Feature change, should increase minor version) and patch (Small change, should increase patch version), and removed the label minor (Feature change, should increase minor version), Jun 2, 2026
@jamesblackburn force-pushed the retriable-reads branch 2 times, most recently from 4218f6d to 21eb476, June 2, 2026 20:56
When a read races with a concurrent eager prune (writer deletes a version's
keys immediately after superseding it), the reader catches
NoDataFoundException / KeyNotFoundException and retries. Previously the
retry called invalidate_cached_entry(), which forced a full version-chain
reload (O(N) storage reads for N live versions).

This change replaces that with reload_from_version_ref_if_changed():
- Reads the VERSION_REF key once (outside the lock) to get the current
  head from storage.
- If the head is unchanged the data is genuinely missing (not a race),
  so the exception is re-raised without consuming a retry slot.
- If the head changed, the cache entry is repopulated in-place from the
  ref data (mirroring the LOAD_LATEST shortcut in follow_version_chain),
  so the subsequent retry is a cache hit requiring 0 additional reads.

Net cost per retry: exactly 1 VERSION_REF read and 0 VERSION reads,
regardless of how many live versions exist in storage.

New tests:
- Python unit tests in test_read_retry.py covering the happy path,
  the no-retry error path, non-retriable versions, per-symbol scoping,
  and O(1) read count with N=15 live versions.
- C++ unit test ReloadFromVersionRefIfChangedUpdatesCacheAndReturnsFalseWhenUnchanged.
- Stress test test_concurrent_read_write_eager_prune using two LMDB
  handles with concurrent readers and eager-pruning writers.
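
A minimal sketch of the in-place reload described in this commit message, assuming a plain dict as the cache and a caller-supplied `read_ref` function. Both are hypothetical simplifications; the real method operates on the version map under a lock.

```python
def reload_from_version_ref_if_changed(cache, read_ref, symbol):
    """Return True and repopulate cache[symbol] in place if the storage head moved."""
    current_head = read_ref(symbol)  # the single VERSION_REF read
    if cache.get(symbol) == current_head:
        return False                 # unchanged: the data is genuinely missing
    cache[symbol] = current_head     # the retry is now a cache hit
    return True


# Hypothetical demo: storage head is 7, the reader's cache still says 6.
stats = {"ref_reads": 0}
storage_heads = {"sym": 7}


def read_ref(symbol):
    stats["ref_reads"] += 1
    return storage_heads[symbol]


cache = {"sym": 6}
changed = reload_from_version_ref_if_changed(cache, read_ref, "sym")
```

Because the cache is updated from the ref data that was just read, the retry needs no further ref read, which is where the count of exactly one comes from.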
@jamesblackburn jamesblackburn marked this pull request as ready for review June 3, 2026 05:45

claude Bot commented Jun 3, 2026

Contributor

@maxim-morozov maxim-morozov self-requested a review June 3, 2026 06:14

@maxim-morozov maxim-morozov left a comment

Collaborator

This has real potential to DDoS the storage. During a storage slowdown or a network blip, we should check which exception we are getting, to make sure we don't retry in those cases. Otherwise we will make a bad situation worse. The AWS SDK has its own retries as well, so the number of storage requests will multiply quickly.
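
To make the concern concrete, here is a rough sketch of retrying only on a missing-key error while letting timeouts propagate, plus the worst-case request count when two retry layers stack. The exception classes and the SDK attempt count are assumptions for illustration, not the actual ArcticDB or AWS SDK types or defaults.

```python
class KeyNotFound(Exception):
    """Hypothetical: the key was pruned or never existed."""


class StorageTimeout(Exception):
    """Hypothetical: storage slowdown or network blip."""


RETRIABLE = (KeyNotFound,)  # timeouts are deliberately excluded


def read_with_retry(read_once, retries=1):
    """Call read_once up to retries + 1 times, retrying only on RETRIABLE errors."""
    for attempt in range(retries + 1):
        try:
            return read_once()
        except RETRIABLE:
            if attempt == retries:
                raise
    raise AssertionError("unreachable")


def worst_case_requests(app_attempts, sdk_attempts):
    """Each application-level attempt can trigger every SDK-level attempt."""
    return app_attempts * sdk_attempts


# With ReadRetries=3 (4 attempts) and an assumed 3 SDK attempts per request:
total = worst_case_requests(app_attempts=4, sdk_attempts=3)
```

The product grows with every retry layer added, which is why the exception filter matters: a `StorageTimeout` here is raised on the first attempt and never re-issued by the outer loop.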

for (const auto& key : keys) {
node_futures.emplace_back(read_frame_for_version(store(), key, read_query, read_options, handler_data));
} catch (const storage::NoDataFoundException&) {
if (attempt >= max_attempts || !version_map()->invalidate_if_version_ref_changed(store(), stream_id))

Collaborator

On a retry we'll end up reading the version ref twice:

  • Once in invalidate_if_version_ref_changed to check whether it changed
  • Once in read_frame_for_version to read the version chain

Probably not a big deal since retries will be fairly infrequent.

Ideally we could read it only once and short-circuit if it is the same, but I think the current approach is good enough.

Collaborator

Yeah, invalidate_if_version_ref_changed isn't the right way to do this. If the ref key has changed, we should continue with LOAD_LATEST_UNDELETED and make the result the new cache entry.

// raises E_NO_SUCH_VERSION (not caught here) and still fails fast. Preloaded-index reads carry
// their own index segment, so re-resolution cannot help them.
const bool is_preloaded = std::holds_alternative<std::shared_ptr<PreloadedIndexQuery>>(version_query.content_);
const int64_t max_attempts = is_preloaded ? 1 : ConfigsMap::instance()->get_int("VersionStore.ReadRetries", 3) + 1;

Collaborator

What do you think about retrying only on latest version queries?

For a version chain v2->v1->v0, it is unlikely that a read(as_of=0) fails because v0 was deleted concurrently. People usually either always prune previous versions or never do.

Collaborator

Agreed. Reads pinned with as_of to specific versions, timestamps or snapshots will not benefit from retrying.
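
One way to picture the gating being discussed, as a sketch only: the query classes below are hypothetical Python stand-ins for the C++ version-query variant, and `read_retries=1` reflects the lower default proposed in this review rather than a confirmed setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Latest:
    """No as_of given: resolve whatever the head currently is."""


@dataclass(frozen=True)
class SpecificVersion:
    version: int


@dataclass(frozen=True)
class AtTimestamp:
    ns: int


@dataclass(frozen=True)
class Snapshot:
    name: str


def max_attempts(query, read_retries=1):
    """Only latest-version reads can be helped by re-resolving the head."""
    if isinstance(query, Latest):
        return read_retries + 1
    return 1  # pinned reads fail fast instead of returning a different version
```

Pinned queries get a single attempt, so a read(as_of=0) can never silently come back with a newer version.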

// raises E_NO_SUCH_VERSION (not caught here) and still fails fast. Preloaded-index reads carry
// their own index segment, so re-resolution cannot help them.
const bool is_preloaded = std::holds_alternative<std::shared_ptr<PreloadedIndexQuery>>(version_query.content_);
const int64_t max_attempts = is_preloaded ? 1 : ConfigsMap::instance()->get_int("VersionStore.ReadRetries", 3) + 1;

Collaborator

Also, I think the default number of retries should be lower, even just 1.
If someone's reads race with writes that frequently, they are unlikely to get the result they want either way.

Collaborator

Also agreed. If writes are happening so fast that a single retry doesn't help, then retrying more times creates a thundering herd problem.

Comment thread cpp/arcticdb/version/version_map.hpp Outdated

std::lock_guard lock(map_mutex_);
auto it = map_.find(stream_id);
const std::optional<AtomKey> cached_head = (it != map_.end()) ? it->second->head_ : std::nullopt;

Collaborator

I think this logic would be easier to follow if we short-circuit the case where it == map_.end() with return true. There was no cached ref key to have changed.

Theoretically, invalidate_if_version_ref_changed can return false when there is no cached entry for the symbol and the version ref contains no link to a version key. That should not be possible, but it still makes this harder to reason about, imo.
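
The suggested control flow might look like the following, sketched in Python with a dict standing in for map_ and a hypothetical `read_ref_head` callable standing in for the VERSION_REF read.

```python
def invalidate_if_version_ref_changed(cache, read_ref_head, symbol):
    """Return True if the caller should retry, False if the head is unchanged."""
    if symbol not in cache:
        # No cached entry: nothing stale to compare against, so allow the retry.
        return True
    if read_ref_head(symbol) == cache[symbol]:
        return False      # head unchanged: the data is genuinely missing
    del cache[symbol]     # head moved: drop the stale entry
    return True
```

With the early return, the remaining comparison always has a cached head on one side, so the "no cached entry and empty ref" corner case no longer needs to be reasoned about.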

writer.terminate()
reader.terminate()

assert exceptions_in_reader.empty()

Collaborator

It feels like this test might be flaky. It doesn't seem impossible for the writer process to invalidate the reader multiple times.

Collaborator

Agreed, this test will not be reliable. It is better to use the storage failure simulator for this sort of thing.

node_futures.reserve(keys.size());
for (const auto& key : keys) {
node_futures.emplace_back(read_frame_for_version(store(), key, read_query, read_options, handler_data));
} catch (const storage::NoDataFoundException&) {

Collaborator

I guess the NoDataFoundException is needed because of exception reraises like this

I agree it's the correct thing for this PR but I think we should leave the more precise KeyNotFound exception in those reraises.

Collaborator

Unfortunately the current API can raise either depending on exactly when the failure occurs

# One retry: one ref read for the changed-ref check, one more for storage_reload's LOAD_LATEST
# shortcut. Both are VERSION_REF reads; no full VERSION-chain traversal.
assert _version_ref_reads(raced_stats) == 2
assert _version_reads(raced_stats) == 0

Collaborator

It would be useful to show that the index key is read just once


qs.enable()
qs.reset_stats()
result = reader.read(sym) # stale cache -> one retry

Collaborator

It would be nice to also test a similar version chain with read(as_of=0). Depending on what we decide for this comment, this could mean no retries or a different exception being raised.

std::make_move_iterator(node_trys.end()),
std::back_inserter(node_results),
[](auto&& try_result) { return std::move(try_result).value(); }
ARCTICDB_DEBUG(

Collaborator

This macro gets compiled out of release builds. I would make the log info level - we want to know when this is happening

Comment thread cpp/arcticdb/version/version_map.hpp Outdated
return false;

if (it != map_.end())
map_.erase(it);

Collaborator

The symbol should also be erased from the lock table if it is present

from arcticdb.util.test import config_context, query_stats_operation_count

# Large enough that the reader's cached version chain never expires during a test.
STICKY_RELOAD_INTERVAL = 2_000_000_000_000

Collaborator

This is in nanoseconds, so isn't very long at all
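
For scale, a quick conversion, taking the nanosecond unit as stated in this comment. The helper name is made up for illustration.

```python
from datetime import timedelta

NS_PER_SECOND = 1_000_000_000


def ns_to_timedelta(ns):
    """Convert whole nanoseconds to a timedelta, truncating to seconds."""
    return timedelta(seconds=ns // NS_PER_SECOND)


original = ns_to_timedelta(2_000_000_000_000)  # 2000 seconds
larger = ns_to_timedelta(2**62)                # the value adopted later in this PR

print(original)            # 0:33:20
print(larger.days // 365)  # roughly how many years the larger value covers
```

So the original constant expires the cache after about 33 minutes, while 2**62 ns is well over a century.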

with (
config_context("VersionMap.ReloadInterval", STICKY_RELOAD_INTERVAL),
config_context("VersionStore.ReadRetries", 0),
):

Collaborator

config_context_multi

@@ -0,0 +1,152 @@
"""Deterministic tests for the read-retry behaviour in read_dataframe_version_internal.

Collaborator

Missing copyright

@alexowens90

Collaborator

This doesn't handle read_batch

- Retry only for latest-version reads (std::monostate): pinned queries
  (as_of=N, timestamp, snapshot, preloaded) use max_attempts=1 to
  avoid silently returning a different version.
- Reduce default VersionStore.ReadRetries from 3 to 1.
- Promote retry log line from ARCTICDB_DEBUG to info so it appears in
  release builds.
- Refactor invalidate_if_version_ref_changed: early-return true when
  no cached entry exists; after detecting a changed ref, proactively
  call follow_version_chain (LATEST/UNDELETED_ONLY) with the already-
  read ref_entry and stamp last_reload_time_, so the retry's
  check_reload is a pure cache hit with no second VERSION_REF read.
- Re-add test_concurrent_read_write_eager_prune stress test skipped
  in CI (RUNS_ON_GITHUB) to avoid non-deterministic failures there.
- test_read_retry.py: copyright header, config_context_multi, larger
  STICKY_RELOAD_INTERVAL (2**62 ns), _index_reads helper, updated
  _version_ref_reads assertions 2→1, new pinned-query no-retry tests.

claude Bot commented Jun 9, 2026

Contributor

ArcticDB Code Review Summary

Re-reviewed the latest commits (delta 83a0bb4..646da0c). This delta is a clean refactor of the read-retry mechanism and broadens it considerably — no new must-fix correctness issues found:

  • The inline retry loop in read_dataframe_version_internal was extracted into three reusable helpers (retry_read_on_concurrent_prune single-symbol, the multi-symbol batch variant, and retry_failed_reads_on_concurrent_prune for post-collectAll batch results). The single-symbol/batch helpers correctly gate on std::monostate (latest reads only), and the post-collectAll variant correctly only invalidates+retries entries that failed with a missing-key error on a latest-version query.
  • The retry primitive is now applied across the rest of the read surface: read_column_stats, get_column_stats_info, get_index_range, read_descriptor(+batch), batch_read, batch_read_and_join, read_metadata(+batch), and read_modify_write. The read_modify_write wrapping correctly scopes the retry to the source read only (the target write stays outside and runs once), and batch_read_and_join correctly snapshots and restores the clause list on each attempt with a fresh ComponentManager. The synchronous-on-caller-thread constraint for retried batch reads is documented to avoid the threadpool-reentrancy deadlock.
  • version_core.cpp now throws KeyNotFoundException (instead of the generic E_KEY_NOT_FOUND raise) from the column-stats read paths so the race is catchable by the retry primitive — both still map to E_KEY_NOT_FOUND for callers.
  • The VersionStore.ReadRetries config knob was removed; retry is now a fixed single attempt. As this knob was introduced within this same (unreleased) PR, removing it is not a backwards-compatibility concern.
  • Tests are comprehensive: per-API recovery, bounded O(1) ref-key reads for batch, partial-race isolation, and the read_modify_write target-written-exactly-once guarantee.

One item still needs attention:

Documentation

  • The transparent read-retry-on-concurrent-prune behaviour is still undocumented. Per CLAUDE.md (new features must include documentation) and section 21 of the review guidelines, docs/claude/cpp/VERSIONING.md (the versioning/read-path area this change touches) should describe the behaviour. The earlier note about documenting the VersionStore.ReadRetries knob is now moot (the knob was removed), but the feature itself has grown in scope: latest-version reads across read, batch_read, batch_read_and_join, read_metadata(+batch), get_info(+batch), read_column_stats/get_column_stats_info, and read_modify_write can now transparently re-resolve to a newer version when racing an eager prune. This user-visible behaviour change warrants a doc note (and confirm the no-release-notes label is set correctly given the broadened user-facing behaviour).

…fy retry

Extend the single-retry-on-concurrent-prune behaviour to every version-resolving
read path, and address review of the initial batch implementation.

- Batch reads (read/metadata/descriptor) previously retried inside a .thenTry
  continuation that blocked on nested .get()s while running on an async::cpu_executor
  thread - the deadlock anti-pattern, since the nested read also needs that pool.
  Retries now run in a post-collectAll().get() loop on the caller thread, where
  blocking is safe. The iteration + gate + invalidate + log lives once in
  retry_failed_reads_on_concurrent_prune; each batch site passes only a retry_fn(idx).

- batch_read_and_join: replaced the bespoke for/should_retry_join/try-catch loop with
  a multi-symbol overload of retry_read_on_concurrent_prune, matching the single-symbol
  paths. Retries once if any latest-version symbol's version ref changed.

- read_column_stats_impl / get_column_stats_info_impl now throw KeyNotFoundException
  rather than the generic E_KEY_NOT_FOUND raise (which throws the base StorageException).
  The retry primitive catches KeyNotFoundException, so the column-stats wrapping was
  previously a no-op. Both still surface as E_KEY_NOT_FOUND to callers.

- Added deterministic race tests for read_modify_write (recovers source + writes target
  exactly once), read_column_stats and get_column_stats_info.
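
The batch retry shape described in the first bullet can be approximated in Python as below. It is a sketch under assumptions: a `ThreadPoolExecutor` stands in for the CPU executor, `KeyNotFound` for the missing-key exception, and `batch_read_with_retry` is a hypothetical name, not the C++ helper.

```python
from concurrent.futures import ThreadPoolExecutor


class KeyNotFound(Exception):
    """Hypothetical stand-in for the missing-key error."""


def batch_read_with_retry(read_one, symbols, invalidate_if_changed, pool):
    # Fan out on the pool, then gather every outcome (value or exception).
    futures = [pool.submit(read_one, sym) for sym in symbols]
    results = []
    for future in futures:
        try:
            results.append(future.result())
        except Exception as exc:
            results.append(exc)

    # Retry on the caller thread, after the gather, so no pool thread ever
    # blocks waiting for work that needs the same pool.
    for idx, sym in enumerate(symbols):
        if isinstance(results[idx], KeyNotFound) and invalidate_if_changed(sym):
            try:
                results[idx] = read_one(sym)  # single retry per failed entry
            except Exception as exc:
                results[idx] = exc
    return results


# Hypothetical demo: "b" races once, "gone" is genuinely missing.
raced = {"b"}


def read_one(sym):
    if sym == "gone":
        raise KeyNotFound(sym)
    if sym in raced:
        raced.discard(sym)
        raise KeyNotFound(sym)
    return sym.upper()


with ThreadPoolExecutor(max_workers=2) as pool:
    out = batch_read_with_retry(read_one, ["a", "b", "gone"], lambda s: s != "gone", pool)
```

Only the entry that failed with a missing-key error and whose ref changed is re-read; the genuinely missing symbol keeps its original exception.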
Labels

patch Small change, should increase patch version


4 participants