Skip to content

[#976] Distribute devshard rewards at the end of epoch#1000

Open
dcastro wants to merge 3 commits intoupgrade-v0.2.12from
diogo/#976-subnet-rewards-end-of-epoch
Open

[#976] Distribute devshard rewards at the end of epoch#1000
dcastro wants to merge 3 commits intoupgrade-v0.2.12from
diogo/#976-subnet-rewards-end-of-epoch

Conversation

@dcastro
Copy link
Copy Markdown
Collaborator

@dcastro dcastro commented Apr 2, 2026

Distribute WorkCoins at the end of the epoch, instead of upon settlement.

@dcastro dcastro requested a review from Copilot April 2, 2026 12:15
@dcastro dcastro moved this from Todo to In Progress in [Serokell] Scalability Apr 2, 2026
@dcastro dcastro moved this from In Progress to In review in [Serokell] Scalability Apr 2, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates subnet escrow settlement so host rewards are accrued to participant ledgers and only become liquid when claimed at epoch end, aligning subnet payouts with the chain’s epoch-based reward flow.

Changes:

  • Adds end-to-end test assertions that host bank balances do not change at settlement time, and validates rewards can be claimed after epoch end.
  • Refactors payout bookkeeping into a shared processParticipantPayment helper with overflow checks and subaccount logging.
  • Changes subnet escrow settlement to credit participant ledgers (instead of immediate bank sends) and updates unit tests accordingly.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
testermint/src/test/kotlin/SubnetTests.kt Extends e2e subnet settlement test to assert no immediate host payout and successful epoch-end reward claiming.
inference-chain/x/inference/keeper/msg_server_start_inference.go Introduces processParticipantPayment helper and uses it for executor payment ledger updates.
inference-chain/x/inference/keeper/msg_server_settle_subnet_escrow.go Switches validator payouts from bank transfers to participant-ledger credits during escrow settlement.
inference-chain/x/inference/keeper/msg_server_settle_subnet_escrow_test.go Updates tests to seed participants and assert ledger balances/stats instead of bank send mocks.
inference-chain/x/inference/keeper/msg_server_create_subnet_escrow_test.go Loosens bookkeeping log mock expectations to avoid test failures due to new logging.

@dcastro dcastro force-pushed the diogo/#976-subnet-rewards-end-of-epoch branch from 3ae0f1c to 233c554 Compare April 2, 2026 13:11
@dcastro dcastro linked an issue Apr 2, 2026 that may be closed by this pull request
@dcastro dcastro requested a review from Brgndy25 April 2, 2026 13:17
@dcastro dcastro assigned dcastro and unassigned Brgndy25 Apr 2, 2026
@dcastro dcastro added this to the v0.2.12 milestone Apr 2, 2026
@dcastro dcastro requested a review from gmorgachev April 2, 2026 13:20
Base automatically changed from diogo/#935-subnet-stats to upgrade-v0.2.12 April 2, 2026 20:31
@dcastro dcastro force-pushed the diogo/#976-subnet-rewards-end-of-epoch branch from 233c554 to 15e5d46 Compare April 2, 2026 20:33
@dcastro dcastro requested a review from akup April 2, 2026 20:34
participant, found := k.GetParticipant(ctx, addr)
// If the subnet is settled on a different epoch, it's possible a subnet's host is no longer a participant on this epoch.
// In this case, we skip payment to this host
if found {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to consider that epoch can change, and anyway participant should payed for inferences that he runned.
There is another thing, that we don't would like to pay fees to this participant for epochs where he wasn't active and there could be lot of issues that should be handled in dynamic shardchain development branch where we need to support adding/removing participants to subnet.

But anyway, It is better to handle the inference payout in this PR, and look for epochs where participant was active.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All other funds (non-inference related) should be either explicitly burned, refunded to the escrow creator, or sent to a community pool

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to consider that epoch can change, and anyway participant should payed for inferences that he runned.

I guess if a subnet starts in epoch X and is settled on epoch Y, and a host H was active during X but not during Y, then we could fallback to making the payment to H immediately (i.e. using SendCoinsFromModuleToAccount) instead of being delayed (i.e. with processParticipantPayment). What do you think?

There is another thing, that we don't would like to pay fees to this participant for epochs where he wasn't active

Not sure what you meant here. Do you mean e.g:

  • imagine a subnet that lasts over 2 epochs, X and Y
  • some host H from that subnet ran some inference I1 during X and another inference I2 during Y
  • meanwhile, in the main chain, that host H was active during X , but not during Y
  • therefore H should be paid for inference I1 but not for I2

Is that what you meant?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seams we need something to start from.

At first phase we can't handle multi-epoch devshards. What was discussed with @gmorgachev we should force finalization of a devshard when epoch ends. This finalization should be scheduled to next tasks.

But anyway it is possible that finalization is performed at epoch X+1, so we need to check if participant was active at epoch X, not current epoch but epoch when subnet escrow started.

So it means that we shouldn't handle for now any complex logic, but just look for participant from lists of participants of epoch when escrow started.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just look for participant from lists of participants of epoch when escrow started.

Can you please elaborate on this?

We need to look up the participant for 2 reasons:

  1. to update their stats (inference count, validated, invalidated, missed) so that they're taken into account when calculating the punishments for the rewards at the end of the current epoch
  2. to accumulate EarnedCoins, so they can be claimed at the end of the current epoch

Regarding point 1: There's no point in updating the stats for older epochs, since those rewards have already been distributed. In my opinion, if the subnet was started on a previous epoch, we have 2 options:
* Option 1: don't update any stats
* Option 2: if the participant is still active in the current epoch, update the stats for the current epoch. Otherwise, don't update the stats.


Regarding point 2: Correct me if I'm wrong, but I don't think we can accumulate EarnedCoins for previous epochs. EarnedCoins are settled at the end of the epoch and then claimed; but I don't think we can increment them after that stage.

So, in my opinion, if the subnet was started on a previous epoch, I think we should:
* Option 1: send the coins directly to the host using SendCoinsFromModuleToAccount, and don't increment EarnedCoins.
* Option 1: if the participant is still active in the current epoch, increment their EarnedCoins in the current epoch. Otherwise, send the coins directly using SendCoinsFromModuleToAccount

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right on both points.

We anyway cannot punish for previous epoch and so we don't need these stats.
There is a big thing that should be redesigned in network... and devshards. The missed inferences if they over limit should trigger event that is sent to inference-chain before finalization and settlement. But it is definetely out the scope of the PR.

But we need stats also for reputation calculations, and there EpochPerformanceSummary is used. I think here we need to update previous EpochPerformanceSummary.

Option 1: if the participant is still active in the current epoch, increment their EarnedCoins in the current epoch. Otherwise, send the coins directly using SendCoinsFromModuleToAccount

It looks as the best option in current design.
But we shouldn't just send the coins directly to account. We first need to check if there was no missed confirmation PoCs, and also punishment for direct missed inference (inferences in old-fashion design).

We just need to check EpochPerformanceSummary to see if reward for epoch is > 0. If participant was punished his reward is 0, if not, it will be always > 0.

If reward wasn't claimed yet (also available at EpochPerformanceSummary) we can append revenue to SettleAmount and EpochPerformanceSummary, if it was already claimed we update only EpochPerformanceSummary and send coins directly to participant account

Copy link
Copy Markdown
Collaborator Author

@dcastro dcastro Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we need stats also for reputation calculations, and there EpochPerformanceSummary is used. I think here we need to update previous EpochPerformanceSummary.

Good catch, I wasn't aware of EpochPerformanceSummary, thank you!

We just need to check EpochPerformanceSummary to see if reward for epoch is > 0. If participant was punished his reward is 0, if not, it will be always > 0.

Good idea as well, agreed.


I implemented a simple approach, to minimize edge cases, in these 2 commits, to the branch for PR #1001:

  • [#976] Handle same-epoch and different-epoch settlements
  • [#976] Skip payout for previous epochs if the host was punished

To sum up:

  • if it's a same-epoch settlement, treat it was normal onchain inferences:
    • increment EarnedCoins
    • aggregate into CurrentEpochStats
  • if it's a cross-epoch settlement:
    • if the host was NOT punished in the previous epoch, send a direct bank transfer
    • aggregate into EpochPerformanceSummary

@akup
Copy link
Copy Markdown
Collaborator

akup commented Apr 5, 2026

This PR defers subnet host payouts from immediate BankKeeper transfers to internal participant ledger fields (CoinBalance / EarnedCoins) until epoch claim. The review flags two severe correctness issues: an early return in processParticipantPayment that can skip epoch stats initialization and risk nil-pointer panics on consensus paths, and settlement accounting where totalPayout (and skipped !found hosts) can mis-state refunds and trap value in the module account versus internal obligations. Secondary items are observability for money-moving paths and test / structure cleanups in Go and Kotlin.

Must fix

  • ensureParticipantEpochStats before zero payout — If processParticipantPayment returns early when payout == 0 without calling ensureParticipantEpochStats, CurrentEpochStats can stay nil and later logic may panic (msg_server_start_inference.go ~371–385). Initialize epoch stats before the zero-payout early return.
  • totalPayout only after successful participant updates — Accumulating totalPayout before confirming the participant exists / payment applies shrinks the creator refund for amounts that were never credited, leaving value stuck (msg_server_settle_subnet_escrow.go ~92–113). Move accumulation inside the successful found / apply path (or equivalent invariant).
  • Defined outcome for !found payouts — When a host is no longer a participant at settlement, skipped payouts must be explicitly routed (refund to escrow creator, community pool, documented burn, or tracked “orphan” ledger)—not silently dropped—so module bank balance and sum of internal ledger obligations cannot diverge permanently (same settlement file; aligns with skeptic review).

Should fix

  • Logging on financial paths — Log when payouts are skipped (!found) and when processParticipantPayment completes (balances / epoch stats updated), so operators and auditors can reconcile settlement (msg_server_settle_subnet_escrow.go, msg_server_start_inference.go).
  • Cosmos layout — Consider moving processParticipantPayment from msgServer to Keeper for reuse and idiomatic state helpers (msg_server_start_inference.go).

Nits

  • Settlement unit tests — Reduce duplication (setupActiveParticipant helper), prefer table-driven balance checks, tighten bank mock expectations from .AnyTimes() to .Times(n), avoid brittle fee math (msg_server_settle_subnet_escrow_test.go).
  • Kotlin e2e — Replace !! with checkNotNull (or similar) and use clearer map assertions (e.g. AssertJ containsExactlyEntriesOf) in SubnetTests.kt.
  • Document helper constraints — Note that processParticipantPayment is written for non-negative payouts only (not generic debits), if that remains true.

@@ -3,6 +3,7 @@ package keeper
import (
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting a thread to reply to this feedback: #1000 (comment)

  • ensureParticipantEpochStats before zero payout — If processParticipantPayment returns early when payout == 0 without calling ensureParticipantEpochStats, CurrentEpochStats can stay nil and later logic may panic (msg_server_start_inference.go ~371–385). Initialize epoch stats before the zero-payout early return.

This is fine, it's not ensureParticipantEpochStats's responsibility to make sure that CurrentEpochStats is initialized. No codepaths should assume that CurrentEpochStats is not nil anyway.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • totalPayout only after successful participant updates — Accumulating totalPayout before confirming the participant exists / payment applies shrinks the creator refund for amounts that were never credited, leaving value stuck (msg_server_settle_subnet_escrow.go ~92–113). Move accumulation inside the successful found / apply path (or equivalent invariant).
    ->
  • Defined outcome for !found payouts — When a host is no longer a participant at settlement, skipped payouts must be explicitly routed (refund to escrow creator, community pool, documented burn, or tracked “orphan” ledger)—not silently dropped—so module bank balance and sum of internal ledger obligations cannot diverge permanently (same settlement file; aligns with skeptic review).

These 2 points are being discussed in this other thread: #1000 (comment)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Cosmos layout — Consider moving processParticipantPayment from msgServer to Keeper for reuse and idiomatic state helpers (msg_server_start_inference.go).

Done

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, it's not ensureParticipantEpochStats's responsibility to make sure that CurrentEpochStats is initialized. No codepaths should assume that CurrentEpochStats is not nil anyway.

Yes it is not used. Noisy comment, sorry

@dcastro dcastro force-pushed the diogo/#976-subnet-rewards-end-of-epoch branch from 14122ea to 26b7ffe Compare April 7, 2026 14:03
@tcharchian tcharchian moved this from Todo to In review in Upgrade v0.2.12 Apr 7, 2026
@tcharchian tcharchian changed the title [#976] Distribute subnet rewards at the end of epoch [#976] Distribute devshard rewards at the end of epoch Apr 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: In review
Status: In review

Development

Successfully merging this pull request may close these issues.

[P0] devshards: Distribute WorkCoins at the end of the epoch

5 participants