
[#976] Aggregate devshards stats with CurrentEpochStats#1001

Open
dcastro wants to merge 7 commits into diogo/#976-subnet-rewards-end-of-epoch from diogo/#976-subnet-aggregate-stats

Conversation

@dcastro
Collaborator

@dcastro dcastro commented Apr 2, 2026

In this PR: when a subnet is settled, we take the subnet's HostStats and merge them with the participant.CurrentEpochStats.

NOTE: participant.CurrentEpochStats keeps track of the total number of inferences performed (InferenceCount) and the number of successfully validated inferences (Validated). Subnets were not keeping track of these 2 stats, so we had to alter the subnet to start doing that and include those stats in the settlement.


Since the subnet's stats are now merged into participant.CurrentEpochStats, they will automatically be taken into account when:

  • calculating punishments and WorkCoins/RewardCoins (see bitcoin_rewards.go)
  • determining the participant's inactivity status (see status.go -> ComputeStatus)
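
To make the merge step described above concrete, here is a minimal sketch. The type names, field widths, and the `mergeHostStats` helper are assumptions made for illustration; they are not the PR's actual definitions and only cover the two counters discussed here.

```go
package main

import "fmt"

// HostStats is a hypothetical, trimmed-down view of the per-host stats
// that a subnet includes in its settlement.
type HostStats struct {
	InferenceCount uint32
	Validated      uint32
}

// EpochStats is a hypothetical stand-in for participant.CurrentEpochStats.
type EpochStats struct {
	InferenceCount uint64
	Validated      uint64
}

// mergeHostStats adds the subnet counters to the participant's epoch
// counters and returns the updated value.
func mergeHostStats(cur EpochStats, hs HostStats) EpochStats {
	cur.InferenceCount += uint64(hs.InferenceCount)
	cur.Validated += uint64(hs.Validated)
	return cur
}

func main() {
	cur := EpochStats{InferenceCount: 10, Validated: 8}
	merged := mergeHostStats(cur, HostStats{InferenceCount: 5, Validated: 4})
	fmt.Println(merged.InferenceCount, merged.Validated)
}
```

Because the merged counters live in the same structure that the existing reward and status logic reads, no change to those consumers is needed in this sketch.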

Contributor

Copilot AI left a comment

Pull request overview

This PR extends subnet host performance statistics with inference_count and validated, propagates those fields through protobuf/JSON boundaries (subnetctl + Kotlin client + chain types), and aggregates the resulting settlement stats into chain-side per-epoch storage and participant CurrentEpochStats.

Changes:

  • Add inference_count and validated to subnet host stats data models and protobuf schemas; update hashing/serialization to include them.
  • Track InferenceCount and Validated in the subnet state machine, and add/adjust tests around Validated counter transitions.
  • Aggregate settlement host stats into chain SubnetHostEpochStatsMap and into Participant.CurrentEpochStats, with keeper/msg-server test coverage.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| testermint/src/main/kotlin/data/subnet.kt | Adds inference_count and validated fields to Kotlin settlement host stats DTO. |
| subnet/types/state.pb.go | Regenerated protobuf bindings to include new host stats fields. |
| subnet/types/domain.go | Extends HostStats domain model with InferenceCount and Validated (+ doc comments). |
| subnet/state/machine.go | Updates state transitions to increment/decrement the new counters during inference/validation lifecycle. |
| subnet/state/machine_test.go | Adds tests validating HostStats.Validated transition behavior for validations. |
| subnet/state/hash.go | Includes new fields in host stats hash preimage (state root consistency). |
| subnet/proto/subnet/v1/state.proto | Adds new host stats fields to subnet state protobuf schema. |
| subnet/cmd/subnetctl/marshal_test.go | Updates settlement marshal/unmarshal tests to cover new fields. |
| subnet/cmd/subnetctl/main.go | Extends JSON marshaling output with the new host stats fields. |
| inference-chain/x/inference/types/subnet_settlement.pb.go | Regenerated chain settlement protobuf bindings to include new fields. |
| inference-chain/x/inference/types/subnet_escrow.pb.go | Regenerated chain escrow/epoch stats protobuf bindings to include new fields. |
| inference-chain/x/inference/module/commands.go | Extends CLI settlement JSON parsing/mapping to include new fields. |
| inference-chain/x/inference/keeper/subnet_settlement.go | Includes new fields in settlement host stats hash computation. |
| inference-chain/x/inference/keeper/subnet_settlement_test.go | Updates settlement hashing test fixtures for new fields. |
| inference-chain/x/inference/keeper/subnet_host_stats.go | Aggregates new fields into SubnetHostEpochStatsMap and introduces aggregation into Participant.CurrentEpochStats. |
| inference-chain/x/inference/keeper/msg_server_settle_subnet_escrow.go | Applies new aggregation into participant CurrentEpochStats during settlement processing. |
| inference-chain/x/inference/keeper/msg_server_settle_subnet_escrow_test.go | Adds tests ensuring settlement updates CurrentEpochStats and subnet epoch stats map. |
| inference-chain/proto/inference/inference/subnet_settlement.proto | Adds new fields to settlement protobuf schema. |
| inference-chain/proto/inference/inference/subnet_escrow.proto | Adds new fields to escrow/epoch stats protobuf schema. |
| inference-chain/api/inference/inference/subnet_settlement.pulsar.go | Regenerated Pulsar fast-reflection code for new settlement fields. |
| inference-chain/api/inference/inference/subnet_escrow.pulsar.go | Regenerated Pulsar fast-reflection code for new escrow/epoch stats fields. |

@dcastro dcastro force-pushed the diogo/#976-subnet-aggregate-stats branch from ea7b921 to ed38664 Compare April 2, 2026 13:10
@dcastro dcastro force-pushed the diogo/#976-subnet-rewards-end-of-epoch branch from 3ae0f1c to 233c554 Compare April 2, 2026 13:11
@dcastro dcastro linked an issue Apr 2, 2026 that may be closed by this pull request
@dcastro dcastro requested a review from heitor-lassarote April 2, 2026 13:18
@dcastro dcastro added this to the v0.2.12 milestone Apr 2, 2026
@dcastro dcastro requested a review from gmorgachev April 2, 2026 13:20
Collaborator

@heitor-lassarote heitor-lassarote left a comment

LGTM, the only thing that I think would be nice in addition would be a Testermint test, but since validations are currently not working in E2E as expected, I think this may be skipped. In addition, testing some things such as missed requests will probably require some extra infra, which sounds out of scope for this. So I think, for now all is fine.

@dcastro dcastro force-pushed the diogo/#976-subnet-rewards-end-of-epoch branch from 233c554 to 15e5d46 Compare April 2, 2026 20:33
@dcastro dcastro force-pushed the diogo/#976-subnet-aggregate-stats branch from ed38664 to 7507b2c Compare April 2, 2026 20:33
@dcastro dcastro requested a review from akup April 2, 2026 20:34
@akup
Collaborator

akup commented Apr 5, 2026

This PR adds new state fields; rolling it out will lead to immediate consensus splits and data corruption on live networks.

@gmorgachev How is adding fields supposed to be handled in subnets? It is not bound to the block height of the mainnet.

@akup
Collaborator

akup commented Apr 5, 2026

This PR threads subnet InferenceCount / Validated into L2 hashing and L1 settlement so that subnet work feeds CurrentEpochStats. It carries serious consensus and economic risks: L2 hash changes without a coordinated rollout can split hosts, and incremental Validated accounting in machine.go is easy to get wrong (ordering, double application, underflow), which then flows into L1 rewards. L1 settlement may also mis-attribute stats across epochs, and storage migrations / tests need tightening before this is safe to ship.

Must fix

  • Version or rollout gate for L2 hash changes — Including InferenceCount and Validated in subnet/state/hash.go without a protocol/version boundary risks an immediate split between upgraded and non-upgraded hosts; coordinate upgrade or feature gating.

This one point needs to be handled in some generic upgrade tooling (proposed here: #1005 (comment))
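
One possible shape for such a gate is a versioned hash preimage, sketched below. Everything here is hypothetical: the `Legacy` field stands in for whatever host stats were hashed before this PR, the version constants are invented, and the real encoding in subnet/state/hash.go may differ. The point is only that the old version keeps producing the old hash until hosts agree to switch.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// Hypothetical preimage versions: v1 is the layout before the new
// counters existed, v2 appends them.
const (
	hashV1 uint32 = 1
	hashV2 uint32 = 2
)

// HostStats is a trimmed-down, hypothetical model. Legacy stands in for
// all fields that were already part of the hash preimage.
type HostStats struct {
	Legacy         uint64
	InferenceCount uint32
	Validated      uint32
}

// hashHostStats builds the preimage for the requested version and hashes
// it. The new counters are only included from v2 onwards.
func hashHostStats(version uint32, hs HostStats) [32]byte {
	buf := binary.BigEndian.AppendUint64(nil, hs.Legacy)
	if version >= hashV2 {
		buf = binary.BigEndian.AppendUint32(buf, hs.InferenceCount)
		buf = binary.BigEndian.AppendUint32(buf, hs.Validated)
	}
	return sha256.Sum256(buf)
}

func main() {
	old := HostStats{Legacy: 7}
	upgraded := HostStats{Legacy: 7, InferenceCount: 3, Validated: 2}
	// Under v1 both hosts agree; under v2 the new counters affect the hash.
	fmt.Println(hashHostStats(hashV1, old) == hashHostStats(hashV1, upgraded))
	fmt.Println(hashHostStats(hashV2, old) == hashHostStats(hashV2, upgraded))
}
```

How the active version is chosen (for example, per subnet session or via an upgrade message) is exactly the open question raised above and is not addressed by this sketch.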

  • Idempotent validation outcome handling — When vote thresholds fire, ensure Validated is not incremented again on late or duplicate votes after the record is already validated (subnet/state/machine.go).
  • Order-dependent running tallies — The logic updates sm.state.HostStats[rec.ExecutorSlot].Validated via scattered increments and decrements based on the real-time stream of MsgValidation messages.
    If a malicious sequencer orders an invalid vote before a valid vote, the rec.VotesInvalid == 0 condition fails, and the counter is not incremented. If the inference subsequently crosses the validation threshold, it appears another increment might occur in applyValidationVote (line 828). This spread of ++ and -- logic across different handlers based on transient state (rec.VotesValid == 0) guarantees that any future changes to voting rules will introduce accounting bugs.
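
As an illustration of the alternative these two points suggest, the sketch below derives the counter from a single reconciliation step instead of scattered increments. The record layout, the threshold rule, and all names are assumptions for the example; they do not reproduce machine.go. Duplicate votes are dropped per voter, and a per-record flag ensures each record contributes to Validated at most once, so the result depends only on the final tallies.

```go
package main

import "fmt"

// vote is a hypothetical validation vote from one validator slot.
type vote struct {
	Voter int
	Valid bool
}

// record is a hypothetical inference record with vote tallies.
type record struct {
	ExecutorSlot int
	VotesValid   int
	VotesInvalid int
	voters       map[int]bool // validator slots that have already voted
	counted      bool         // true while this record contributes to Validated
}

type machine struct {
	threshold int
	validated map[int]uint32 // Validated counter per executor slot
}

// applyVote records a vote once per voter, then reconciles the counter.
func (m *machine) applyVote(rec *record, v vote) {
	if rec.voters[v.Voter] {
		return // duplicate vote: no effect
	}
	rec.voters[v.Voter] = true
	if v.Valid {
		rec.VotesValid++
	} else {
		rec.VotesInvalid++
	}
	m.reconcile(rec)
}

// reconcile is the only place that touches the counter. The rule used
// here (threshold reached and valid votes in the majority) is an assumption.
func (m *machine) reconcile(rec *record) {
	should := rec.VotesValid >= m.threshold && rec.VotesValid > rec.VotesInvalid
	switch {
	case should && !rec.counted:
		m.validated[rec.ExecutorSlot]++
		rec.counted = true
	case !should && rec.counted:
		// Only reachable after a matching increment, so no underflow.
		m.validated[rec.ExecutorSlot]--
		rec.counted = false
	}
}

// tally replays votes against a fresh record on slot 0.
func tally(threshold int, votes []vote) uint32 {
	m := &machine{threshold: threshold, validated: map[int]uint32{}}
	rec := &record{voters: map[int]bool{}}
	for _, v := range votes {
		m.applyVote(rec, v)
	}
	return m.validated[0]
}

func main() {
	invalidFirst := []vote{{1, false}, {2, true}, {3, true}}
	validFirst := []vote{{2, true}, {3, true}, {1, false}}
	fmt.Println(tally(2, invalidFirst), tally(2, validFirst))
}
```

Keeping the rule in one function means a future change to the voting rules only has to update `reconcile`, rather than every handler that currently adjusts the counter.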

Should fix

  • Test isolation — Stop mutating global sdk.GetConfig() in tests (parallel runs / flakiness); use TestMain or scoped setup (msg_server_settle_subnet_escrow_test.go).
  • Aggregation safety and failure behavior — Aggregation of InferenceCount and Validated lacks overflow protection. Fix: Use bits.Add32 and return an error if an overflow occurs.
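
The overflow check suggested above could look roughly like this, assuming the counters are `uint32` (the actual field types are not shown in this conversation). `addUint32` and `errOverflow` are hypothetical names; `bits.Add32` is the standard library function from `math/bits`.

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

// errOverflow is a hypothetical sentinel error for counter overflow.
var errOverflow = errors.New("host stats counter overflow")

// addUint32 returns a+b, or an error if the sum does not fit in 32 bits.
func addUint32(a, b uint32) (uint32, error) {
	sum, carry := bits.Add32(a, b, 0)
	if carry != 0 {
		return 0, errOverflow
	}
	return sum, nil
}

func main() {
	sum, err := addUint32(1, 2)
	fmt.Println(sum, err)

	_, err = addUint32(^uint32(0), 1) // max uint32 + 1 overflows
	fmt.Println(err)
}
```

Returning an error lets the settlement handler reject the message rather than silently wrapping a counter that later feeds into rewards.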

Nits

  • Factor duplicated settlement test setup into a shared helper.
  • Tests re-deriving executor slot (inferenceID % len(hosts)) — couple to the real assignment from state/records instead of duplicating internals (machine_test.go).
  • Redundant participant loads vs data already in SubnetHostEpochStatsMap — simplify if one source of truth suffices.

Collaborator

@akup akup left a comment

Should handle votes duplication, order-dependent ++/--, overflows checks

@dcastro
Collaborator Author

dcastro commented Apr 6, 2026

Should handle votes duplication, order-dependent ++/--, overflows checks

You mean in the updates to HostStats.Validated in machine.go?
Yeah, that is already being handled: the test TestApplyDiff_Validation_ValidatedCounterTransitions ensures that duplicate calls and reordering of MsgValidation remain idempotent.

@dcastro dcastro force-pushed the diogo/#976-subnet-aggregate-stats branch from 7507b2c to 458c54e Compare April 6, 2026 15:10
dcastro added 5 commits April 7, 2026 14:57
We want to integrate subnets' stats into the mainnet's host stats (see `CurrentEpochStats`).

However, mainnet currently tracks the total inference count and validated inferences for hosts, and these stats are used to calculate the host's inactivity status and punishments at the end of the epoch.

This commit changes subnets so that they now also track those 2 stats.
…in `MsgSettleSubnetEscrow`

When the chain receives a settlement json from a subnet, it should parse these 2 new fields, and then add them to `MsgSettleSubnetEscrow.HostStats`
…entEpochStats` and `SubnetHostEpochStats`

The inference chain keeps track of host stats in `CurrentEpochStats`, and of subnet-related stats in `SubnetHostEpochStats`.

`SubnetHostEpochStats` are only for audit/debugging purposes, but the `CurrentEpochStats` stats are later used to calculate the hosts' inactivity status (see `status.go`) and punishments at the end of the epoch (see `accountsettle.go` and `bitcoin_rewards.go`)

This commit adds the subnet's `InferenceCount` and `Validated` to both stats aggregations.
@dcastro dcastro force-pushed the diogo/#976-subnet-aggregate-stats branch from 458c54e to 5e77938 Compare April 7, 2026 14:03
@dcastro dcastro force-pushed the diogo/#976-subnet-rewards-end-of-epoch branch from 14122ea to 26b7ffe Compare April 7, 2026 14:03
@tcharchian tcharchian moved this from Todo to In review in Upgrade v0.2.12 Apr 7, 2026
@tcharchian tcharchian changed the title [#976] Aggregate subnet stats with CurrentEpochStats [#976] Aggregate devshards stats with CurrentEpochStats Apr 11, 2026

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

[P0] devshards: Distribute WorkCoins at the end of the epoch

5 participants