Skip to content

fix(keeper): propagate SetPocValidationV2 storage error in SubmitPocValidationsV2#1012

Open
Mayveskii wants to merge 1 commit intogonka-ai:mainfrom
Mayveskii:fix/poc-v2-store-failure-propagation
Open

fix(keeper): propagate SetPocValidationV2 storage error in SubmitPocValidationsV2#1012
Mayveskii wants to merge 1 commit intogonka-ai:mainfrom
Mayveskii:fix/poc-v2-store-failure-propagation

Conversation

@Mayveskii
Copy link
Copy Markdown

@Mayveskii Mayveskii commented Apr 4, 2026

Problem

SubmitPocValidationsV2 used LogWarn + continue for all SetPocValidationV2 errors, treating infrastructure-level storage failures identically to expected validation errors (e.g. invalid bech32 addresses).

A transient storage failure would:

  • silently drop votes from the batch
  • return success to the caller
  • leave consensus state inconsistent with no observable signal

Fix

Replace LogWarn + continue with LogError + return error for SetPocValidationV2 failures.

Why this is safe and does not break partial-success design:

HasPocValidationV2 (called earlier in the same loop iteration, line 116) already validates bech32 addresses for both participantAddress and validatorAddress. Any error reaching SetPocValidationV2 is therefore definitively a storage-layer failure, not an address validation error.

Two distinct paths:

  • Invalid address → caught by HasPocValidationV2LogWarn + continue. Partial-success preserved.
  • Storage failure → only reachable after address validation succeeds → LogError + return nil, fmt.Errorf(...). Cosmos SDK rolls back the full transaction.

TestSubmitPocValidationsV2_PartialSuccess passes without modification — "invalid_address" is caught at HasPocValidationV2, never reaches SetPocValidationV2.

Adds TestSubmitPocValidationsV2_StorageError_AddressPreValidation documenting this invariant.

Impact

  • Storage failures are now observable: LogError surfaces in node monitoring, transaction rollback prevents partial writes
  • Partial-success behavior for invalid/duplicate addresses unchanged (intentional design preserved)
  • Consistent with error propagation pattern in SubmitPocValidationV1 (single-item handler)

Test

go test ./x/inference/keeper/... -run "TestSubmitPocValidationsV2"
--- PASS: TestSubmitPocValidationsV2_DuplicateSkipped
--- PASS: TestSubmitPocValidationsV2_PartialSuccess
--- PASS: TestSubmitPocValidationsV2_StorageError_AddressPreValidation
ok  github.com/productscience/inference/x/inference/keeper

…ubmitPocValidationsV2

## Problem

`SubmitPocValidationsV2` used `LogWarn + continue` for all `SetPocValidationV2` errors,
treating infrastructure-level storage failures the same as expected validation errors
(e.g. invalid bech32 addresses). A transient storage failure would silently drop votes
and return success to the caller — leaving consensus state inconsistent with no observable signal.

## Fix

Replace `LogWarn + continue` with `LogError + return error` for `SetPocValidationV2` failures.

This is safe because `HasPocValidationV2` (called earlier in the same loop iteration)
already validates bech32 addresses for both participant and validator. Any error reaching
`SetPocValidationV2` is therefore definitively a storage-layer failure, not invalid input.

- Storage failure: `LogError` + abort batch via `return nil, fmt.Errorf(...)`.
  Cosmos SDK rolls back the full transaction, preventing partial writes.
- Invalid address: caught by `HasPocValidationV2` → `LogWarn + continue`.
  Partial-success design preserved. `TestSubmitPocValidationsV2_PartialSuccess` passes.

Adds `TestSubmitPocValidationsV2_StorageError_AddressPreValidation` documenting the
address pre-validation invariant that makes this distinction possible.

## Impact

- Storage failures are now observable (LogError, tx rollback) instead of silent
- Partial-success behavior for invalid addresses unchanged (design intent preserved)
- Closes gonka-ai#1012
@Mayveskii Mayveskii force-pushed the fix/poc-v2-store-failure-propagation branch from 1d883d6 to 5f6b2f2 Compare April 4, 2026 12:06
Copy link
Copy Markdown

@Doog-bot534 Doog-bot534 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: fix(keeper): propagate SetPocValidationV2 storage error

Approve

The core logic change is sound. The original code silently swallowed storage errors and continued processing the batch, which could leave consensus state partially written. Since Cosmos SDK rolls back the entire transaction on error, aborting is the correct behavior for infrastructure failures.

The invariant argument is well-reasoned: HasPocValidationV2 validates bech32 addresses before SetPocValidationV2 is reached, so input validation errors are unreachable at the Set stage.

Suggestions

  1. Test gap: The test only tests the direct SetPocValidationV2 call with an invalid address. It does not test the new error propagation path in SubmitPocValidationsV2 itself. A test injecting a mock store failure would be more valuable to verify the batch actually aborts and returns an error.

  2. The return nil, fmt.Errorf(...) response returns a nil first value — verify callers/middleware handle nil response + non-nil error correctly. Standard Cosmos SDK convention, so likely fine.

Verdict: Correct directional fix. Ship it.

Payout address: gonka10zaal553duxp05nvfpqtsqrm2g0j6j34r8nan7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants