diff --git a/src/content/aeps/aep-1/README.md b/src/content/aeps/aep-1/README.md index cd3fe9a57..76f7eb886 100644 --- a/src/content/aeps/aep-1/README.md +++ b/src/content/aeps/aep-1/README.md @@ -3,7 +3,7 @@ aep: 1 title: AEP Purpose and Guidelines status: Final type: Meta -author: Greg Osuri (@gosuri), Adam Bozanich (@boz) +author: Greg Osuri (@gosuri) Adam Bozanich (@boz) discussions-to: https://github.com/ovrclk/aep/issues/1 created: 2020-03-09 updated: 2020-03-17 @@ -266,4 +266,4 @@ This document was derived heavily from [Ethereum's EIP-1] written by Martin Becz ## Copyright -All content herein is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0). \ No newline at end of file +All content herein is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0). diff --git a/src/content/aeps/aep-11/README.md b/src/content/aeps/aep-11/README.md index bc28ec1ae..5882f315f 100644 --- a/src/content/aeps/aep-11/README.md +++ b/src/content/aeps/aep-11/README.md @@ -1,12 +1,12 @@ --- aep: 11 title: Managed Services Market -author: Greg Osuri (@gosuri), Adam Bozanich (@boz) +author: Greg Osuri (@gosuri) Adam Bozanich (@boz) status: Draft type: Standard category: Core created: 2020-03-09 -estimated-completion: 2026-01-01 +estimated-completion: 2026-12-15 roadmap: major --- diff --git a/src/content/aeps/aep-12/README.md b/src/content/aeps/aep-12/README.md index e96fa8c7d..af1de3dc6 100644 --- a/src/content/aeps/aep-12/README.md +++ b/src/content/aeps/aep-12/README.md @@ -1,14 +1,15 @@ --- aep: 12 title: Trusted Execution Environment (TEE) -author: Adam Bozanich (@boz), Greg Osuri (@gosuri) +author: Adam Bozanich (@boz) Greg Osuri (@gosuri) discussions-to: https://github.com/orgs/akash-network/discussions/614 status: Draft type: Standard category: Core created: 2020-03-17 -updated: 2024-12-01 -estimated-completion: 2025-06-30 +updated: 2025-07-30 +estimated-completion: 2025-09-30 +superseded-by: 29 roadmap: major --- diff --git 
a/src/content/aeps/aep-13/README.md b/src/content/aeps/aep-13/README.md index 0124e7c2e..e904456cb 100644 --- a/src/content/aeps/aep-13/README.md +++ b/src/content/aeps/aep-13/README.md @@ -1,7 +1,7 @@ --- aep: 13 title: "Mainnet 2: DCX Platform" -author: Kaustubh Patral (@kaustubhkapatral), Greg Osuri (@gosuri) +author: Kaustubh Patral (@kaustubhkapatral) Greg Osuri (@gosuri) status: Final type: Standard category: Core @@ -81,4 +81,4 @@ In the event of upgrade complications, a well-defined rollback procedure should ## Copyright -All content herein is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0). \ No newline at end of file +All content herein is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0). diff --git a/src/content/aeps/aep-19/README.md b/src/content/aeps/aep-19/README.md index 7431cd581..d2952cf3f 100644 --- a/src/content/aeps/aep-19/README.md +++ b/src/content/aeps/aep-19/README.md @@ -1,7 +1,7 @@ --- aep: 19 title: "Open Development Model" -author: Greg Osuri (@gosuri), Anil Murty (@anilmurty) +author: Greg Osuri (@gosuri) Anil Murty (@anilmurty) status: Final type: Meta created: 2022-12-19 @@ -49,4 +49,4 @@ Regular meetings help the Akash Network community operate more efficiently. 
If n ## Announcements -* [Public Cloud is a Public Utility](https://akash.network/blog/public-cloud-is-a-public-utility/) \ No newline at end of file +* [Public Cloud is a Public Utility](https://akash.network/blog/public-cloud-is-a-public-utility/) diff --git a/src/content/aeps/aep-29/README.md b/src/content/aeps/aep-29/README.md index 054a4f805..503d34b7b 100644 --- a/src/content/aeps/aep-29/README.md +++ b/src/content/aeps/aep-29/README.md @@ -1,197 +1,111 @@ --- aep: 29 -title: "Verifiable Hardware Provisioning" -author: Sriram Vishwanath (@sriramvish) Artur Troian (@troian) +title: "Hardware Verification using Trusted Execution" +author: Anil Murty (@anilmurty) Artur Troian (@troian) status: Final type: Standard category: Core created: 2024-06-27 -updated: 2024-12-01 -estimated-completion: 2025-06-31 +updated: 2025-10-09 +estimated-completion: 2026-05-30 roadmap: major -discussions-to: https://github.com/orgs/akash-network/discussions/614 -resolution: https://www.mintscan.io/akash/proposals/261 --- - ## Motivation -Verification of resources is critical for on-chain incentivization; hence, we propose a TEE-based verification mechanism detailed below and extend the [Trusted providers](../aep-9/README.md) proposal. - -## Summary - -Verifiable computing is an entire class of algorithms or systems where a particular portion of the compute stack is verifiable/provable in a trustless manner to participants within a decentralized network. Verifiable computing can take many forms, including: - -Verifiable provisioning of hardware: This corresponds to the case where we desire to verify the nature and extent to which a piece of hardware is provisioned for the Akash network. - -Specifically, if a 4090 GPU were to be incorporated in the Akash network, verifiable provisioning ensures that it indeed matches its hardware specifications, and it is genuinely allocated for functions on the Akash network. 
-Verifiable execution of program/software: This corresponds to the case where a program (any AI program, ranging from inference to training) is correctly executed on a node/set of nodes in the Akash network. For example, that a particular piece of code was executed correctly in a cluster of 4090s on the Akash network. Verifiable execution of programs/software also comes in multiple flavors, including: -- Non-real-time: An offline verification mechanism that presents a proof in non-real-time, where the proof has no time or size constraints. -- Optimistic, real-time proofs: An optimistic proof mechanism that can be verified or contested in (near) real time. -- Zero knowledge, real-time proofs: A zero knowledge proof mechanism (that does not reveal anything about the inputs but can still be verified) in (near) real time. - -In this proposal, for the first year of this project, we focus on only the first type of verifiability: That of provisioning of hardware. After the completion of this first portion of the project, a further proposal will be submitted on non-real-time and subsequently, real-time verifiable computing within the Akash network. Please review the discussions on Github here (https://github.com/orgs/akash-network/discussions/614). - -## Benefits to Akash Network +Currently, the hardware offered by providers on Akash is verified by a decentralized network of Auditors. While this approach is practical for a limited set of providers, manual verification is proving difficult to scale, and the problem becomes even more pressing once incentives move on-chain and are distributed without a human in the loop. Hardware Verification using Trusted Execution minimizes the trust required to verify that the hardware offered by providers on Akash Network matches what they claim, and serves as a fundamental building block for enabling Confidential Computing capabilities, as detailed in [AEP-65](https://akash.network/roadmap/aep-65/).
-The need for verifiable provisioning of hardware is significant for a variety of reasons, including the elimination/reduction of Sybil attacks, and of other forms of misrepresentation and abuse in the network. +## Summary & Background -## Verifiable Hardware Provisioning +Hardware Verification is the process of verifying that a specific CPU or GPU is what the provider claims it to be. In the context of [Confidential Computing](https://akash.network/roadmap/aep-65/), this is achieved through an attestation process using a Trusted Authority. -Verifiable hardware provisioning can be achieved in a variety of ways: by using schemes uniquely associated with particular types of hardware, by using access patterns and footprints associated with a particular make and model, and other ways. However, these schemes are dependent on hardware configurations and do not necessarily generalize well. In order to develop a scalable, universal solution, we take a trusted enclave (trusted execution environment) approach as follows: +### Attestation Process -Akash providers that intend to be "hardware verifiable" are equipped with a TEE, configured by Akash (such as Trusty [1], for more information on TEE, see tutorial [2]). Such a TEE contains a physically unclonable function (a PUF, see [3]) that can securely sign transactions. To ensure uniformity, this TEE will be designed to be a USB A/C dongle that can be attached to any hardware configuration. +The attestation process with a Trusted Authority is described in the IETF's [Remote Attestation Procedures Architecture (RATS) RFC 9334](https://datatracker.ietf.org/doc/rfc9334/) and can be outlined in the following block diagram.
In this diagram, the "Attester" is the software running on the device (typically the CPU/GPU), the "Relying Party" is the client (typically the application developer), and the "Reference Value Provider" is the vendor (NVIDIA, Intel, AMD, etc.). -We will verify that the USB A/C dongle can be attached to any hardware configuration and provide a detailed set of instructions to install and use this dongle to enable each provider to become "hardware verifiable" on Akash. +![Attestation High-Level Flow](attestation-high-level.png) -This TEE will periodically perform the following two tasks, based on an internal pseudo-random timer: +At a high level, the attestation process involves three main steps: -### Identification task +#### 1. Measurement Collection -Following a pseudo-random clock, the TEE will query every GPU in the specific Akash provider on its status and device-level details. +The system gathers cryptographic measurements from the hardware platform, including the CPU, GPU, firmware, bootloader, and drivers. These measurements serve as a unique fingerprint of the environment, rooted in hardware (e.g., via Intel TDX, AMD SEV-SNP, or NVIDIA NVTrust). These may include: + - Platform identity (vendor, model, firmware version) + - Enclave or VM launch measurements + - Device-specific attestation evidence (e.g., GPU certificate chain) -### Provisioning task -Periodically and randomly, a random machine learning task will be assigned to the GPUs within this provider. These provisioning tasks are based on existing, well-known benchmarks on the performance of GPUs to certain deep learning tasks, including particular types of models [4], more general deep learning models [5] and other tasks that are well-known benchmarks on existing GPUs [6]. +#### 2. Verification -After the conclusion of each type of pseudorandomly repeated task, the TEE will securely sign the message, and will share the secure message with the Akash network.
+The collected evidence is sent to a remote verifier: either a vendor-provided service (e.g., [Intel Trust Authority](https://www.intel.com/content/www/us/en/security/trust-authority.html), [AMD Attestation Service](https://www.amd.com/content/dam/amd/en/documents/developer/lss-snp-attestation.pdf), NVIDIA [NVTrust CA](https://docs.nvidia.com/attestation/#overview)) or a custom verifier (sometimes called a “local verifier”). -The tasks are used to ensure the following properties: +The verifier performs the following functions: + - Authenticates the hardware’s cryptographic identity + - Compares measurements against a set of trusted baseline values (aka “golden measurements”) + - Validates integrity and authenticity of the platform state -### Identification task +#### 3. Policy Enforcement +Based on the result of verification, an attestation policy is evaluated to determine if the workload should proceed. The policy might check the following: + - Is the platform from an approved vendor/model? + - Are all firmware and drivers up-to-date? + - Was the workload launched in a verified TEE? -The identification task sets up the base configuration for each GPU cluster, and assigns a unique signature associated with the TEE with that cluster. As the identification is performed at the operating system level, it can potentially be spoofed, and therefore, the provisioning/benchmarking tasks are required. +The outcome is a binary verdict (e.g., Attestation OK or Rejected) which can be used to: +- Gate access to secrets or encrypted data +- Approve running a sensitive workload +- Trigger alerts or block execution in untrusted environments -### Provisioning task +### Vendor SDKs -The provisioning/benchmarking tasks verify the identification while simultaneously ensuring that the associated GPUs are dedicated to the Akash network and are not prioritizing other tasks. In case they are not provisioned for the Akash network, they will fail the provisioning task.
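The three attestation steps above (measurement collection, verification against golden measurements, and policy enforcement) can be sketched in Go as follows. This is a minimal, hypothetical illustration only: the `Evidence`, `Policy`, `Verify`, and `Enforce` names, the integer firmware version, and the placeholder digests are assumptions made for this sketch and do not correspond to the NVTrust or Intel Trust Authority SDK schemas, which work with signed, vendor-specific evidence and tokens.

```go
package main

import (
	"fmt"
	"sort"
)

// Evidence is a hypothetical, simplified stand-in for what an attester
// reports in step 1. Real evidence (TDX quotes, SEV-SNP reports, NVIDIA GPU
// attestation reports) is signed and vendor-specific.
type Evidence struct {
	Vendor          string
	Model           string
	FirmwareVersion int  // simplified to an integer for comparison
	LaunchedInTEE   bool // whether the workload was launched in a verified TEE
	// Measurements maps a component name (e.g. "bootloader") to its digest.
	Measurements map[string]string
}

// Verify is a simplified step 2: it compares each golden measurement with the
// reported one and returns the sorted names of missing or mismatched components.
func Verify(ev Evidence, golden map[string]string) []string {
	var mismatched []string
	for component, want := range golden {
		if got, ok := ev.Measurements[component]; !ok || got != want {
			mismatched = append(mismatched, component)
		}
	}
	sort.Strings(mismatched)
	return mismatched
}

// Policy captures the example checks listed in step 3 (hypothetical fields).
type Policy struct {
	ApprovedModels     map[string][]string // vendor -> approved models
	MinFirmwareVersion int
}

// Enforce is a simplified step 3: it returns the binary verdict
// (true means "Attestation OK") and the reasons for any rejection.
func (p Policy) Enforce(ev Evidence, mismatched []string) (bool, []string) {
	var reasons []string
	if len(mismatched) > 0 {
		reasons = append(reasons, fmt.Sprintf("measurement mismatch: %v", mismatched))
	}
	approved := false
	for _, m := range p.ApprovedModels[ev.Vendor] {
		if m == ev.Model {
			approved = true
			break
		}
	}
	if !approved {
		reasons = append(reasons, fmt.Sprintf("%s %s is not an approved vendor/model", ev.Vendor, ev.Model))
	}
	if ev.FirmwareVersion < p.MinFirmwareVersion {
		reasons = append(reasons, "firmware is older than the required minimum")
	}
	if !ev.LaunchedInTEE {
		reasons = append(reasons, "workload was not launched in a verified TEE")
	}
	return len(reasons) == 0, reasons
}

func main() {
	// Placeholder digests; real golden measurements come from the vendor.
	golden := map[string]string{"bootloader": "aa11", "driver": "bb22"}
	policy := Policy{
		ApprovedModels:     map[string][]string{"NVIDIA": {"H100", "H200"}},
		MinFirmwareVersion: 2,
	}
	ev := Evidence{
		Vendor: "NVIDIA", Model: "H100", FirmwareVersion: 2, LaunchedInTEE: true,
		Measurements: map[string]string{"bootloader": "aa11", "driver": "bb22"},
	}
	ok, reasons := policy.Enforce(ev, Verify(ev, golden))
	if ok {
		fmt.Println("Attestation OK")
	} else {
		fmt.Println("Rejected:", reasons)
	}
}
```

Keeping verification and policy enforcement as separate functions mirrors the split in the text: the verifier only reports which measurements deviate from the baseline, while the relying party decides, through its own policy, whether that result is acceptable.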
+#### NVTrust SDK -A key point is that both the entire system (user, operating system) cannot differentiate between a provisioning/benchmarking task and a regular AI workload provided by the Akash network, and therefore cannot selectively serve a particular type of workload/task. This ensures that the GPUs are both correctly identified and are made available to Akash network-centric tasks at all times. +NVIDIA provides the [NVTrust SDK](https://github.com/NVIDIA/nvtrust), which abstracts much of the complexity involved in attesting NVIDIA GPUs (primarily H100s and NVSwitches) for trusted execution. This SDK provides abstractions for gathering evidence (aka measurements) as well as a verifier (NRAS) that plugs into NVIDIA’s internal build pipeline (to obtain “golden measurements” through the RIM service). For reference, see the NRAS [documentation](https://nras.attestation.nvidia.com/) and [API](https://docs.nvidia.com/attestation/api-docs-nras/latest/nras_api.html). -## Implementation +At a high level, attestation with the NVIDIA SDK looks like this: -With provider service installaiton comes `Feature Discovery Service` (FDS) which hands inventory information to the provider engine.
-The FDS functionality will be extended to snapshot information below which further will be signed by the provider and placed to the **DA**: -- CPU - - cpu id - - architecture - - model - - vendor - - micro-architecture [levels](https://github.com/HenrikBengtsson/x86-64-level) - - features -- GPU - - gpu id - - vendor (already implemented) - - model (already implemented) - - memory size (already implemented) - - interface (already implemented) -- Memory - - vendor - - negotiated speed - - timings - - serial number -- Storage +![NVTrust Attestation](nvtrust-attestation.png) -### Workflow +#### Intel Trust Authority SDK -For provider to be verified it must: -- commit first snapshot of resources upon commissioning to the network -- allow Auditor to inspect hardware -- commit snapshots: - - whenever there is change to the hardware due to expansion, maintenance - - when challenged by the Auditor (workflow TBD) +GPUs do not operate standalone: they are typically part of a server that also includes a CPU (along with memory, storage, and other components), and the application typically executes on that CPU, with the AI model loaded into GPU memory for inference, training, or fine-tuning. Attestation must therefore encompass the CPU, the GPU, and the interface between them. To make this easier for customers, Intel provides its own SDK that plugs into the NVTrust SDK and enables attestation of the whole system, with clients available in [Python](https://github.com/intel/trustauthority-client-for-python) and [Go](https://github.com/intel/trustauthority-client-for-go). +![Intel Attestation](intel-ita-attestation.png) -### Stores extension -1.
Implement extension to the `x/provider` store - ```protobuf - syntax = "proto3"; - package akash.provider.v1beta4; +## Scope of Work - import "gogoproto/gogo.proto"; - import "cosmos_proto/cosmos.proto"; +The scope of work of this AEP is to test and document the hardware and BIOS configuration necessary to perform attestation, so that it can be used to guide Akash Providers and to support the larger [Confidential Computing](https://akash.network/roadmap/aep-65/) goal. - message ResourcesSnapshot { - string owner = 1 [ - (cosmos_proto.scalar) = "cosmos.AddressString", - (gogoproto.jsontag) = "owner", - (gogoproto.moretags) = "yaml:\"owner\"" - ]; - google.protobuf.Duration timestamp = 2 [ - (gogoproto.jsontag) = "timestamp", - (gogoproto.moretags) = "yaml:\"timestamp\"" - ]; - // location of the snapshot on the external DA - string filepath = 3; - // checksum of the timestamp, filepath and it's content - string hash = 4; - } - ``` -2. Implement extension to the `x/audit` store - ```protobuf - syntax = "proto3"; - package akash.audit.v1; +To that end, the following steps are required: - import "akash/provider/v1beta4/provider.proto"; +1. Obtain or set up a provider with a GPU node or cluster that has TEE-capable hardware, as listed in the following section +2. Apply BIOS configuration to allow access to the device nodes +3.
Verify (manually) that attestation can be performed for the whole node - message AuditedResourcesSnapshot { - string auditor = 1 [ - (gogoproto.jsontag) = "auditor", - (gogoproto.moretags) = "yaml:\"auditor\"" - ]; +#### TEE Capable CPUs - akash.provider.v1beta4.ResourcesSnapshot snapshot = 2; - } - ``` +| Vendor | Feature | Required Models | +|--------|--------------------------------------|---------------------------------------------------------------------------------| +| Intel | TDX (Trust Domain Extensions) | Intel Xeon 4th Gen (“Sapphire Rapids”) or newer CPUs (with TDX BIOS support) | +| Intel | SGX (Software Guard Extensions) | Intel Xeon E3, Xeon D, and select 10th–11th Gen Core CPUs (now deprecated by Intel) | +| AMD | SEV | AMD EPYC “Rome” (7002 series) | +| AMD | SEV-ES / SNP | AMD EPYC “Milan” (7003) and “Genoa” (9004) series | -## Team +#### TEE Capable GPUs -The team for this project is led by Prof. Sriram Vishwanath from The University of Texas, Austin. Sriram Vishwanath is a professor at The University of Texas, Austin and Shruti Raghavan is a PhD candidate in Computer Science at UT Austin. They are working together with the Harvard Medical School and MITRE on the design of new foundation/base models in healthcare, with causal learning incorporated into such a platform. +| Vendor | Feature | Required Models | +|-------------|-------------|---------------------------------------------------------------------------------| +| NVIDIA | NVTrust | NVIDIA H100 or H200 (Hopper architecture) with CC-on mode | +| AMD/Intel | _None yet_ | No current support for GPU-based TEEs (CPU-side only) | -Sriram Vishwanath received the B. Tech. degree in Electrical Engineering from the Indian Institute of Technology (IIT), Madras, India in 1998, the M.S. degree in Electrical Engineering from California Institute of Technology (Caltech, Pasadena USA) in 1999, and the Ph.D. degree in Electrical Engineering from Stanford University, Stanford, CA USA in 2003.
Currently, he is Professor in the Chandra Department of Electrical and Computer Engineering at The University of Texas at Austin, and recently, a Technical Fellow for Distributed Systems and Machine Learning at MITRE Labs. +In summary, Providers must use the following hardware: +- Intel CPUs with TDX (e.g., Xeon Sapphire Rapids) +- AMD CPUs with SEV-SNP (e.g., EPYC Milan/Genoa) +- NVIDIA H100 or H200 GPUs (for NVTrust support) -## Timeline +#### TEE Enabled Host Kernel & BIOS configuration -Open Discussions: Starting end of June 2024 -Governance Proposal: Through first half of July, 2024 -Design Phase: Through Q3 and Q4 2024 -Hacknet TEE Phase: Q1 2025 -Devnet TEE Phase: Q2 2025 -Conclusion of Hardware Provisioning testing and handover to Akash Team: End of Q2 2025 -Note: This is subject to change based on feedback - -## Deliverables - -Q3 2024 - High Level Design -Q4 2024 - Design Specification -Q1 2025 - Initial Hacknet Prototype -Q2 2025 - Devnet and Conclusion of Testing - -## Budget - -The tentative budget for this project is presented in the spreadsheet attached here (https://docs.google.com/spreadsheets/d/1asmvyi5r7QgKRjsImZInAENXptr_cwoW/edit?usp=sharing&ouid=103645797398143147236&rtpof=true&sd=true). - -The high-level breakdown for the budget is: -R\&D Costs (Student salaries + tuition + University Overhead): $146,547 -Akash Computing/Hardware Costs: $75,000 -Volatility and Liquidation Buffer (10%): $22,154.70 - -Total budget requested: $243,701.70 or 68,842.28 AKT -Wallet Address: akash1sa5quyrpmf3l2acfrwgsy9t34yxpkvwrnqdmm0 - -## Disbursement: - -Disbursement will happen in two increments, coinciding with the few weeks before the beginning of each semester - Fall 2024 (on July 22nd 2024) and Spring 2025 (December 15 2024). 
- -## References -[1] Trusty TEE: Android Open Source Project https://source.android.com/docs/security/features/trusty -[2] TEE 101 White Paper https://www.securetechalliance.org/wp-content/uploads/TEE-101-White-Paper-FINAL2-April-2018.pdf -[3] Shamsoshoara, Alireza, et al. "A survey on physical unclonable function (PUF)-based security solutions for Internet of Things." Computer Networks 183 (2020): 107593. -[4] Wang, Yu Emma, Gu-Yeon Wei, and David Brooks. "Benchmarking TPU, GPU, and CPU platforms for deep learning." arXiv preprint arXiv:1907.10701 (2019). -[5] Shi, Shaohuai, et al. "Benchmarking state-of-the-art deep learning software tools." 2016 7th International Conference on Cloud Computing and Big Data (CCBD). IEEE, 2016. -[6] Araujo, Gabriell, et al. "NAS Parallel Benchmarks with CUDA and beyond." Software: Practice and Experience 53.1 (2023): 53-80. +BIOS configuration changes are needed to enable TDX/SGX (for Intel) and SEV (for AMD). These typically also require a minimum Linux kernel version. -## Copyright +## References -All content herein is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0). +1. [Intel](https://github.com/canonical/tdx/blob/1.2/README.md): Enable memory encryption, TDX and SGX for Intel +2.
[AMD](https://github.com/AMDESE/AMDSEV/blob/master/README.md): Enable AMD SEV \ No newline at end of file diff --git a/src/content/aeps/aep-29/attestation-high-level.png b/src/content/aeps/aep-29/attestation-high-level.png new file mode 100644 index 000000000..2822bfb2c Binary files /dev/null and b/src/content/aeps/aep-29/attestation-high-level.png differ diff --git a/src/content/aeps/aep-29/intel-ita-attestation.png b/src/content/aeps/aep-29/intel-ita-attestation.png new file mode 100644 index 000000000..5ef1fdf60 Binary files /dev/null and b/src/content/aeps/aep-29/intel-ita-attestation.png differ diff --git a/src/content/aeps/aep-29/nvtrust-attestation.png b/src/content/aeps/aep-29/nvtrust-attestation.png new file mode 100644 index 000000000..84aa07609 Binary files /dev/null and b/src/content/aeps/aep-29/nvtrust-attestation.png differ diff --git a/src/content/aeps/aep-30/README.md b/src/content/aeps/aep-30/README.md index a107d8c25..56c1e9516 100644 --- a/src/content/aeps/aep-30/README.md +++ b/src/content/aeps/aep-30/README.md @@ -1,13 +1,13 @@ --- aep: 30 -title: "Cosmos SDK v0.47 Migration" +title: "Cosmos SDK v0.53 Migration" author: Cheng Wang (@lechenghiskhan) Artur Troian (@atroian) Scott Carrutthers (@chainzero) status: Final type: Standard category: Core created: 2024-08-29 -updated: 2024-02-18 -estimated-completion: 2025-03-15 +updated: 2024-04-22 +completed: 2025-10-28 roadmap: major requires: 61 discussions-to: https://github.com/orgs/akash-network/discussions/673 @@ -279,4 +279,4 @@ All funds will be liquidated and managed in a manner that ensures minimal impact ## Copyright -All content herein is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0). \ No newline at end of file +All content herein is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
diff --git a/src/content/aeps/aep-33/README.md b/src/content/aeps/aep-33/README.md index b973c459a..5e02c5dd5 100644 --- a/src/content/aeps/aep-33/README.md +++ b/src/content/aeps/aep-33/README.md @@ -3,13 +3,13 @@ aep: 33 title: "Escrow Balance Alerts in Akash Console" description: "Alerting system for low escrow balance" author: Anil Murty (@anilmurty) Maxime Beauchamp (@baktun14) -status: Draft +status: Final type: Standard category: Interface created: 2024-12-01 -updated: 2025-01-10 -estimated-completion: 2025-02-28 -roadmap: minor +updated: 2025-03-12 +completed: 2025-06-30 +roadmap: major --- ## Motivation @@ -20,3 +20,5 @@ One of the primary issues users face with Akash is the unexpected termination of Users of Akash Console will have the option of configuring a low escrow balance alert for any deployment within their account and optionally tieing the alert to a notification channel. The initial notification channel supported will be email with more notification channels added over time, based on customer/ user feedback. The alert configuration will allow the user to specify a name that will show up in tne notification email subject as well as notes in the body that will let them quickly identify which account and deployment the alert is associated with. In addition the user will have the option to specify a threshold (<, =, >) that will determine when the alert is triggered. Lastly the user will have a global view of all alerts configured in the account and will be able to perform certain actions from there like viewing the alert configuration, disabling or deleting it and viewing all the past events triggered from it. 
+Github Milestone: https://github.com/akash-network/console/milestone/15 + diff --git a/src/content/aeps/aep-34/README.md b/src/content/aeps/aep-34/README.md index 883785e71..e3202e244 100644 --- a/src/content/aeps/aep-34/README.md +++ b/src/content/aeps/aep-34/README.md @@ -7,7 +7,7 @@ type: Standard category: Interface created: 2024-12-01 updated: 2025-01-10 -estimated-completion: 2025-03-15 +estimated-completion: 2026-02-28 roadmap: minor --- diff --git a/src/content/aeps/aep-35/README.md b/src/content/aeps/aep-35/README.md index a5515cb46..bbdba4e8e 100644 --- a/src/content/aeps/aep-35/README.md +++ b/src/content/aeps/aep-35/README.md @@ -6,8 +6,8 @@ status: Draft type: Standard category: Interface created: 2024-12-01 -updated: 2024-12-06 -estimated-completion: 2025-03-30 +updated: 2024-03-19 +estimated-completion: 2026-05-15 roadmap: minor --- diff --git a/src/content/aeps/aep-36/README.md b/src/content/aeps/aep-36/README.md index 2673db3f0..1bcebd851 100644 --- a/src/content/aeps/aep-36/README.md +++ b/src/content/aeps/aep-36/README.md @@ -6,8 +6,8 @@ status: Draft type: Standard category: Interface created: 2024-12-01 -updated: 2024-12-01 -estimated-completion: 2025-06-30 +updated: 2025-07-07 +estimated-completion: 2026-11-15 roadmap: minor --- diff --git a/src/content/aeps/aep-37/README.md b/src/content/aeps/aep-37/README.md index 3e5a04280..122512c9c 100644 --- a/src/content/aeps/aep-37/README.md +++ b/src/content/aeps/aep-37/README.md @@ -6,8 +6,8 @@ status: Draft type: Standard category: Core created: 2024-12-01 -updated: 2024-12-09 -estimated-completion: 2025-04-30 +updated: 2025-07-07 +estimated-completion: 2026-02-28 roadmap: minor --- diff --git a/src/content/aeps/aep-38/README.md b/src/content/aeps/aep-38/README.md index f1fcbd804..41fe7053f 100644 --- a/src/content/aeps/aep-38/README.md +++ b/src/content/aeps/aep-38/README.md @@ -6,8 +6,8 @@ status: Draft type: Standard category: Interface created: 2024-12-01 -updated: 2024-12-07 
-estimated-completion: 2025-07-30 +updated: 2025-07-07 +estimated-completion: 2026-12-31 roadmap: minor --- diff --git a/src/content/aeps/aep-39/README.md b/src/content/aeps/aep-39/README.md index 8b0f6c8b6..12285dd05 100644 --- a/src/content/aeps/aep-39/README.md +++ b/src/content/aeps/aep-39/README.md @@ -6,8 +6,8 @@ status: Draft type: Standard category: Interface created: 2024-12-01 -updated: 2025-01-11 -estimated-completion: 2025-06-30 +updated: 2025-07-07 +estimated-completion: 2026-03-15 roadmap: minor --- diff --git a/src/content/aeps/aep-4/README.md b/src/content/aeps/aep-4/README.md index deaa6af95..4fcebe007 100644 --- a/src/content/aeps/aep-4/README.md +++ b/src/content/aeps/aep-4/README.md @@ -1,7 +1,7 @@ --- aep: 4 title: "Testnet: Alpha" -author: Greg Osuri (@gosuri), Adam Bozanich (@boz) +author: Greg Osuri (@gosuri) Adam Bozanich (@boz) status: Final type: Meta requires: 2, 3 diff --git a/src/content/aeps/aep-40/README.md b/src/content/aeps/aep-40/README.md index 42dda3e42..c8292e36d 100644 --- a/src/content/aeps/aep-40/README.md +++ b/src/content/aeps/aep-40/README.md @@ -7,7 +7,7 @@ type: Standard category: Interface created: 2024-12-01 updated: 2025-01-10 -estimated-completion: 2025-05-15 +estimated-completion: 2026-06-15 roadmap: minor --- diff --git a/src/content/aeps/aep-41/README.md b/src/content/aeps/aep-41/README.md index e19804333..ee28af070 100644 --- a/src/content/aeps/aep-41/README.md +++ b/src/content/aeps/aep-41/README.md @@ -6,8 +6,8 @@ status: Draft type: Standard category: Interface created: 2024-12-01 -updated: 2024-12-01 -estimated-completion: 2025-04-30 +updated: 2024-07-30 +estimated-completion: 2026-07-31 roadmap: minor --- diff --git a/src/content/aeps/aep-42/README.md b/src/content/aeps/aep-42/README.md index 820dce853..64c951100 100644 --- a/src/content/aeps/aep-42/README.md +++ b/src/content/aeps/aep-42/README.md @@ -6,8 +6,8 @@ status: Draft type: Standard category: Interface created: 2024-12-01 -updated: 2024-12-01 
-estimated-completion: 2025-12-30 +updated: 2025-07-07 +estimated-completion: 2026-12-31 roadmap: minor --- diff --git a/src/content/aeps/aep-43/README.md b/src/content/aeps/aep-43/README.md index 89bc697c2..8c2843367 100644 --- a/src/content/aeps/aep-43/README.md +++ b/src/content/aeps/aep-43/README.md @@ -6,8 +6,8 @@ status: Draft type: Standard category: Interface created: 2024-12-01 -updated: 2024-01-11 -estimated-completion: 2025-06-30 +updated: 2025-07-30 +estimated-completion: 2026-07-15 roadmap: minor --- diff --git a/src/content/aeps/aep-44/README.md b/src/content/aeps/aep-44/README.md index aea1fa43a..c5182d724 100644 --- a/src/content/aeps/aep-44/README.md +++ b/src/content/aeps/aep-44/README.md @@ -7,7 +7,7 @@ type: Standard category: Core created: 2024-12-01 updated: 2025-01-11 -estimated-completion: 2026-08-30 +estimated-completion: 2026-10-31 roadmap: minor --- diff --git a/src/content/aeps/aep-45/README.md b/src/content/aeps/aep-45/README.md index 200e50555..027ec83a5 100644 --- a/src/content/aeps/aep-45/README.md +++ b/src/content/aeps/aep-45/README.md @@ -7,7 +7,7 @@ type: Standard category: Interface created: 2024-12-01 updated: 2024-12-01 -estimated-completion: 2025-11-30 +estimated-completion: 2026-08-31 roadmap: minor --- diff --git a/src/content/aeps/aep-47/README.md b/src/content/aeps/aep-47/README.md index f30dfc6c7..18da5c278 100644 --- a/src/content/aeps/aep-47/README.md +++ b/src/content/aeps/aep-47/README.md @@ -7,7 +7,7 @@ type: Standard category: Core created: 2024-12-01 updated: 2024-12-10 -estimated-completion: 2025-12-31 +estimated-completion: 2026-12-31 roadmap: minor --- diff --git a/src/content/aeps/aep-48/README.md b/src/content/aeps/aep-48/README.md index 1ec721c1a..0eebb1abf 100644 --- a/src/content/aeps/aep-48/README.md +++ b/src/content/aeps/aep-48/README.md @@ -7,7 +7,7 @@ type: Standard category: Core created: 2024-12-01 updated: 2025-01-11 -estimated-completion: 2026-05-30 +estimated-completion: 2026-07-30 roadmap: major 
--- diff --git a/src/content/aeps/aep-49/README.md b/src/content/aeps/aep-49/README.md index b83a097c1..97faccc82 100644 --- a/src/content/aeps/aep-49/README.md +++ b/src/content/aeps/aep-49/README.md @@ -6,8 +6,13 @@ status: Draft type: Standard category: Core created: 2024-12-01 +<<<<<<< Updated upstream updated: 2025-01-11 -estimated-completion: 2026-02-20 +estimated-completion: 2026-03-31 +======= +updated: 2026-05-01 +estimated-completion: 2026-08-30 +>>>>>>> Stashed changes roadmap: major --- diff --git a/src/content/aeps/aep-5/README.md b/src/content/aeps/aep-5/README.md index bdd100b81..de07f5aaf 100644 --- a/src/content/aeps/aep-5/README.md +++ b/src/content/aeps/aep-5/README.md @@ -1,7 +1,7 @@ --- aep: 5 title: "AKT: Akash Network Token & Mining Economics" -author: Greg Osuri (@gosuri), Adam Bozanich (@boz) +author: Greg Osuri (@gosuri) Adam Bozanich (@boz) status: Final type: Standard category: Economics diff --git
a/src/content/aeps/aep-50/README.md b/src/content/aeps/aep-50/README.md index 058f84d12..25c27a8c5 100644 --- a/src/content/aeps/aep-50/README.md +++ b/src/content/aeps/aep-50/README.md @@ -7,7 +7,7 @@ type: Standard category: Core created: 2024-12-01 updated: 2025-01-10 -estimated-completion: 2025-11-30 +estimated-completion: 2026-06-30 roadmap: minor --- diff --git a/src/content/aeps/aep-51/README.md b/src/content/aeps/aep-51/README.md index 216fe519c..f5fff9e4c 100644 --- a/src/content/aeps/aep-51/README.md +++ b/src/content/aeps/aep-51/README.md @@ -7,7 +7,7 @@ type: Standard category: Core created: 2024-12-01 updated: 2025-01-10 -estimated-completion: 2026-01-30 +estimated-completion: 2026-07-31 roadmap: minor --- diff --git a/src/content/aeps/aep-52/README.md b/src/content/aeps/aep-52/README.md index a5d1764ab..0541ce97e 100644 --- a/src/content/aeps/aep-52/README.md +++ b/src/content/aeps/aep-52/README.md @@ -7,7 +7,7 @@ type: Core category: Interface created: 2024-12-01 updated: 2024-12-01 -estimated-completion: 2025-06-30 +estimated-completion: 2026-08-30 roadmap: minor --- diff --git a/src/content/aeps/aep-53/README.md b/src/content/aeps/aep-53/README.md index 8c0f89e78..de7b5d974 100644 --- a/src/content/aeps/aep-53/README.md +++ b/src/content/aeps/aep-53/README.md @@ -6,8 +6,8 @@ status: Draft type: Standard category: Interface created: 2024-12-01 -updated: 2025-01-10 -estimated-completion: 2025-05-30 +updated: 2025-07-30 +estimated-completion: 2026-12-31 roadmap: major --- diff --git a/src/content/aeps/aep-54/README.md b/src/content/aeps/aep-54/README.md index 25341f3c1..837317a08 100644 --- a/src/content/aeps/aep-54/README.md +++ b/src/content/aeps/aep-54/README.md @@ -7,7 +7,7 @@ type: Standard category: Core created: 2024-01-05 updated: 2025-01-10 -estimated-completion: 2025-05-15 +estimated-completion: 2026-12-15 roadmap: minor --- diff --git a/src/content/aeps/aep-55/README.md b/src/content/aeps/aep-55/README.md index 954b90462..e0d7ad493 100644 
--- a/src/content/aeps/aep-55/README.md +++ b/src/content/aeps/aep-55/README.md @@ -6,9 +6,8 @@ status: Draft type: Standard category: Economics created: 2024-12-07 -updated: 2024-12-07 -estimated-completion: 2025-06-30 -roadmap: minor +updated: 2025-07-07 +superseded-by: 76 --- ## Motivation diff --git a/src/content/aeps/aep-56/README.md b/src/content/aeps/aep-56/README.md index 1f30ad879..e49a063ea 100644 --- a/src/content/aeps/aep-56/README.md +++ b/src/content/aeps/aep-56/README.md @@ -1,30 +1,89 @@ --- aep: 56 -title: "Unified Akash Integration API" -author: Anil Murty (@anilmurty) Artur Troian (@troian) Iaroslav Gryshaiev (@ygrishajev) Maxime Beauchamp (@baktun14) -status: Draft +title: "Chain SDK" +author: Anil Murty (@anilmurty) Artur Troian (@troian) Serhii Stotsky (@baktun14) Maxime Beauchamp (@baktun14) +status: Final type: Standard category: Interface created: 2025-01-10 -updated: 2025-01-10 -estimated-completion: 2025-03-15 +updated: 2025-07-30 +completed: 2025-10-30 roadmap: major --- - ## Motivation -Integrations are a key part of Akash's growth strategy. In order for integrations to happen quicker Akash needs first class API support, coupled with easy to follow documentation and support for multiple programming languages. +Integrations are a key part of Akash's ecosystem growth strategy. For integrations to happen faster, Akash needs a feature-rich, easy-to-use library for both blockchain nodes and provider nodes. + +## Background + +Right now, interacting with the blockchain and the provider is arduous for anyone not deeply involved with the core team, primarily because different parts of the stack use a mix of implementations: + +- Blockchain nodes are built using the Cosmos SDK. +- Queries are done via a pure gRPC service on top of protobuf.
+- Transactions are submitted via RPC servers, but the wire encoding also uses protobuf. +- Provider nodes currently use a mix of a gRPC server (protobuf) and REST handlers; the REST handlers for some mutations are going to be rewritten as gRPC methods as well (see akash-network/support#191). + +## Scope of Work + +Investigate and implement a chain SDK that supports: +* the blockchain node API +* the provider node API +* Cosmos SDK built-in functions (e.g., getting the latest block) + +Additionally, this SDK should include: +* a certificate manager and corresponding utils (https://github.com/akash-network/akashjs/blob/main/src/certificates/certificate-manager/CertificateManager.ts) +* certificate validation logic for provider nodes (https://github.com/akash-network/console/blob/main/apps/provider-proxy/src/services/CertificateValidator.ts) +* SDL-related logic + - move from https://github.com/akash-network/akashjs/blob/main/src/sdl/SDL/SDL.ts + - move from https://github.com/akash-network/console/tree/main/apps/deploy-web/src/utils/sdl + - we implemented SDL import from YAML and a generator from object to YAML. The generator will have to be redesigned because it currently receives an object shaped like the SDL builder form, which does not map 1:1 to the SDL spec. + +All changes need to be made in https://github.com/akash-network/akash-api/tree/sdk-47 (sdk-47 branch). The library should have the best possible TypeScript support so that it is easy to use from an IDE. It should also be usable in the browser, so its bundle size must be kept small.
+ +### Additional notes + +#### api/grpc +```ts +import { ChainSDK } from "@akashnetwork/chain-sdk/chain-sdk"; +import axios from "axios"; +// Create the SDK instance +const chainSdk = new ChainSDK({ + rest: "https://api.akashnet.net", + rpc: "https://rpc.akashnet.net" +}); + +// Querying data via REST or gRPC with typed parameters +// e.g. https://api.akashnet.net/akash/deployment/v1beta3/deployments/info?id.owner=akash1234&id.dseq=1234 +const response = await axios.get(chainSdk.rest.deployments.info({ owner: "akash1234", dseq: "1234" })); +const deployment = response.data.deployment; // Typed response + +// Potential grpc usage +const grpcResponse = await chainSdk.grpc.deployments.info({ owner: "akash1234", dseq: "1234" }); +``` + +#### protobuf + +```ts +// Importing the protobuf types +import { MsgCreateBid } from "@akashnetwork/chain-sdk/akash/market/v1beta4"; +``` + +#### certificates + +```ts +// Certificate utils +import { certificateManager } from "@akashnetwork/chain-sdk/certificate"; -## Summary +const { cert: crtpem, publicKey: pubpem, privateKey: encryptedKey } = certificateManager.generatePEM(address); +``` -While Akash has a Javascript API (AKashJS), it really is more of an SDK. Further, based on going through integrations with over a dozen partners, it is clear that folks cannot use it without handholding from the core team. The issues that users of AkashJS run into include, challenges with using the documented examples as well as not having enough examples. Akash needs a better JS API that abstracts away a lot of the underlying complexity of the blockchain and Akash specific things and is accompanies by easy to follow documentation. The story is similar for the GoLang API. Further, most API driven products offer suport for a wide range of programming languages. +#### sdl -The goal is this AEP is to build a first class API that can be used by partners and customers in a self-serve manner.
If done correctly, this would be comparable if not better that the API offered by API-first companies like stripe (https://docs.stripe.com/api). +```ts +import { SDL, v2Sdl, NetworkId } from "@akashnetwork/chain-sdk/sdl"; -Specificaly the scope of this AEP will include -- Designing and implementing new interfaces that abstract a lot of the blockchain and Akash specific things as far as possible -- Implementing better error handling and reporting (HTTP response codes) -- Implementing version management -- Evaulating options for documentation (JSDocs, type-doc, Swagger, Docusaurus, Slateor others) and choosing one. -- Deciding on how to publicly display the API reference (where to put it, link it from etc) +export function getSdl(yamlJson: string | v2Sdl, networkType: NetworkType, networkId: NetworkId) { + return typeof yamlJson === "string" ? SDL.fromString(yamlJson, networkType, networkId) : new SDL(yamlJson, networkType, networkId); +} +``` diff --git a/src/content/aeps/aep-57/README.md b/src/content/aeps/aep-57/README.md index 113005b36..6e014f73f 100644 --- a/src/content/aeps/aep-57/README.md +++ b/src/content/aeps/aep-57/README.md @@ -1,7 +1,7 @@ --- aep: 57 title: "Automatic Escrow Top Up" -author: Iaroslav Gryshaiev (@ygryshajev) Maxime Beauchamp (@baktun14) Anil Murty (@anilmurty) +author: Iaroslav Gryshaiev (@ygrishajev) Maxime Beauchamp (@baktun14) Anil Murty (@anilmurty) status: Draft type: Standard category: Interface @@ -22,4 +22,4 @@ Implement a new UI setting in the existing settings page that allows users to en https://github.com/akash-network/console/issues/412 Implement a worker CLI handler that automatically adds funds (top-ups) to Akash Network deployments when they are low on balance. This ensures deployments continue to run without requiring users to manually monitor and replenish funds, improving the user experience.
-https://github.com/akash-network/console/issues/395 +https://github.com/akash-network/console/issues/395 diff --git a/src/content/aeps/aep-58/README.md b/src/content/aeps/aep-58/README.md index 5be269554..b11b4321f 100644 --- a/src/content/aeps/aep-58/README.md +++ b/src/content/aeps/aep-58/README.md @@ -6,8 +6,8 @@ status: Draft type: Standard category: Interface created: 2024-01-05 -updated: 2024-01-05 -estimated-completion: 2025-05-15 +updated: 2025-07-07 +estimated-completion: 2026-07-31 roadmap: minor --- diff --git a/src/content/aeps/aep-59/README.md b/src/content/aeps/aep-59/README.md index a717b8983..2d1273e15 100644 --- a/src/content/aeps/aep-59/README.md +++ b/src/content/aeps/aep-59/README.md @@ -6,9 +6,9 @@ status: Draft type: Standard category: Interface created: 2024-01-05 -updated: 2025-01-10 -estimated-completion: 2025-03-15 -roadmap: minor +updated: 2025-07-30 +estimated-completion: 2026-05-31 +roadmap: major --- diff --git a/src/content/aeps/aep-60/README.md b/src/content/aeps/aep-60/README.md index 9fad17cc4..cc085db27 100644 --- a/src/content/aeps/aep-60/README.md +++ b/src/content/aeps/aep-60/README.md @@ -1,303 +1,100 @@ --- aep: 60 -title: "Akash at Home" -author: Greg Osuri (@gosuri) +title: "Akash HomeNode - MVP" +author: Anil Murty (@anilmurty) Damir Simpovic (@shimpa1) Jigar Patel (@jigar-arc10) Deval Patel (@devalpatel67) Greg Osuri (@gosuri) +<<<<<<< Updated upstream status: Draft type: Meta created: 2024-12-01 -updated: 2024-12-06 -estimated-completion: 2026-03-30 +updated: 2025-07-24 +estimated-completion: 2026-03-31 +======= +status: Final +type: Meta +created: 2024-12-01 +updated: 2026-05-01 +completed: 2026-04-30 +>>>>>>> Stashed changes roadmap: major --- ## Motivation -As AI becomes more pervasive in our daily lives, the need for secure, private home-based AI infrastructure is growing. Traditional cloud-based AI services often require sending sensitive data to remote servers, raising privacy concerns. Akash at Home addresses this by enabling users to leverage their home computing resources to host AI workloads securely within their own network. - -## Summary - -Akash at Home is an initiative to transform residential computing resources into powerful AI hosting environments. The project aims to: -- Utilize unused compute capacity in home environments -- Enable private, secure AI workload hosting -- Democratize access to AI infrastructure -- Create a decentralized network of home-based compute resources - -## Model A: Production Grade Edge Datacenter at Home - -A production-grade edge data center at home consists of high-performance computing hardware optimized for AI inference workloads.
This setup enables running sophisticated AI models locally, such as DeepSeek R1 (671B parameters), achieving speeds of 3,872 tokens per second. Key components include: +Enabling the average "home" user to participate in Akash as a provider is critical both to scaling the supply side of the network and to positioning the network to lead the shift away from large data center compute in response to the oncoming energy crisis. -- Enterprise-grade GPU infrastructure -- High-bandwidth networking -- Redundant power systems -- Advanced cooling solutions -In this scenario, we propose a topology with feasibility in Austin, Texas, where you're effectively acquiring the data center at no cost using Akash over a 5-year window. - -### Hardware Requirements - -- **High-Density GPU Servers:** The facility will host 5 × 8-GPU NVIDIA HGX H200 servers (total 40 GPUs). Each server is configured similarly to an AWS p5.48xlarge instance, with 8 H200 GPUs connected via NVLink/NVSwitch for high-bandwidth peer-to-peer communication (up to ~900 GB/s interconnect)​[^SECUREMACHINERY.COM]. Each server includes dual high-end CPUs (e.g. 3rd Gen AMD EPYC), ~2 TB of RAM, and ~30 TB of NVMe SSD storage, matching the p5.48xlarge specs​[^AWS.AMAZON.COM]. -This ensures each server can deliver performance comparable to AWS’s top GPU instances. This ensures each server can deliver performance comparable to AWS’s top GPU instances. -- **NVLink Switch Fabric**: An NVSwitch (NVLink Switch) is integrated into each HGX H200 baseboard, allowing all 8 GPUs in a server to directly communicate at full bandwidth. This provides ~3.6 TB/s bisection bandwidth within each server​[^AWS.AMAZON.COM], critical for multi-GPU training efficiency. The NVLink/NVSwitch fabric is a core component to match AWS’s architecture. -- **Rack Infrastructure:** All equipment will be mounted in a standard 42U data center rack.
The 5 GPU servers (each ~4U–6U form factor) occupy roughly 20–30U, leaving space for networking gear and cooling components. Power Distribution Units (PDUs) (likely two for redundancy) are installed in-rack to supply power to each server’s dual PSUs. The PDUs must handle high load (total ~28 kW, see power section) and provide appropriate outlets (e.g. IEC 309 or HPC connectors) on 208–240V circuits. Each server’s PSU will connect to separate A/B power feeds for redundancy. -- **Networking Hardware:** A high-bandwidth Top-of-Rack switch is required to interconnect servers and uplinks. A 10 GbE (or 25 GbE) managed switch with at least 8–16 ports will connect the GPU nodes and the uplink to the ISPs. This switch should support the full 10 Gbps Internet feed and internal traffic between servers (which may need higher throughput if servers communicate). Additionally, a capable router/firewall is needed to manage dual ISP connections and failover. For example, an enterprise router with dual 10G WAN ports can handle BGP or failover configurations for the two ISPs and Starlink backup. -- **Ancillary Components:** Miscellaneous rack components include cable management, rack-mounted KVM or remote management devices (though IPMI/BMC on servers allows remote control, minimizing on-site interaction), and environmental sensors (temperature, humidity, smoke) for monitoring. Cooling apparatus may also be integrated (e.g. a rack-mounted liquid cooling distribution unit or rear-door heat exchanger – discussed in Cooling section). All components are chosen to ensure high uptime and remote manageability, aligning with the goal of minimal on-site staff. - -### Power and Cooling Considerations - -#### Power Demand and Electrical Upgrades - -Hosting 40 high-end GPUs in a residential building requires substantial power capacity. Each H200 GPU has a TDP around 700 W​[^TRGDATACENTERS.COM]. -An 8-GPU HGX H200 server draws about 5.6 kW under load​[^TRGDATACENTERS.COM]. 
-So five servers demand roughly 28 kW of power for the IT load alone. -This is far beyond typical residential electrical capacity, so significant electrical upgrades are needed: - -- **Service Upgrade:** The building will require a new dedicated electrical service (likely 208/240V three-phase) to support ~30–40 kW continuous load. -This may involve working with the utility to install a higher-capacity transformer and service drop. -For safety and headroom, a 50–60 kW electrical capacity is advisable to account for cooling systems and margin. -- **Distribution Panel:** A new electrical sub-panel with appropriate breakers (e.g. multiple 30A or 60A circuits) will feed the data center rack PDUs. -At 28 kW IT load, multiple 208V/30A circuits (each ~5 kW usable at 80% load) or 208V/50A circuits will be needed across the PDUs. -The panel and wiring must be rated for continuous high current. -- **Power Redundancy:** Ideally dual feed lines (from separate breakers or even separate utility phases) can supply the A/B PDUs. -If the building only has one utility feed, the secondary feed could come from a UPS/generator (discussed below). -All equipment will be on UPS power to ride through short outages and ensure clean shutdown if needed. - -#### Solar Power: Primary Supply vs. Cost Mitigation - -The building offers 4,000 sq ft of rooftop area for solar panels. This area can host a sizable photovoltaic (PV) array, but using solar as the sole primary power source is challenging: - -- **Solar Capacity:** 4,000 sq ft of modern panels (≈20 W per sq ft) can generate on the order of 75–80 kW peak DC​[^US.SUNPOWER.COM]. -In peak sun, this could more than cover the ~30 kW IT load. However, that peak is only during mid-day; energy production drops in mornings, evenings, and is zero at night. -Over a full day, a 80 kW array in Austin might produce ~400–500 kWh, whereas the data center would consume ~800 kWh per day running 24/7. 
- -- **Battery Requirement for Primary Power**: To truly run off-grid on solar, a large battery bank is needed to store excess solar energy for nighttime. -For example, supplying ~28 kW overnight (12 hours) requires >300 kWh of storage. This is equivalent to dozens of Tesla Powerwall units or industrial batteries, adding hundreds of thousands of dollars in cost. -Even then, multiple consecutive cloudy days could threaten uptime without grid/generator support. So, a pure solar-plus-battery solution has very high CapEx and complexity. - -- **Solar as Cost Mitigation**: A more feasible approach is to use the solar installation to offset electricity costs and provide backup capability, rather than as the only source. -During sunny hours, the data center can draw on solar power (reducing grid consumption), and even feed surplus back to the grid if production exceeds load (via net metering or feed-in tariffs). -This lowers the electric bill significantly. At night or during high load beyond solar output, the facility would use grid power normally. -In this role, the solar array acts as a cost mitigation tool and a partial backup (able to supply some power during daytime outages). - -- **Cost Comparison**: A 50–80 kW solar PV system is a major upfront investment (rough estimate $150k–$200k installed for 4000 sq ft). -As a primary power source, you’d need to roughly double this investment to include massive batteries (e.g. adding perhaps $300k+ for storage and grid-islanding inverters), totaling near half a million dollars in CapEx. -In contrast, using solar without extensive storage keeps the cost to the PV array itself and maybe a modest battery/UPS, leveraging the grid for reliability. -The grid electricity cost in Austin ($0.08–$0.12 per kWh) is relatively low, so completely offsetting it with solar has a long payback. - -**Recommendation**: Use solar as a supplemental power source to shave peak usage and reduce energy costs, rather than sole supply. 
-This yields savings (potentially tens of thousands per year) while avoiding the impractical cost of running 24/7 on solar alone. - -### Cooling Solutions for High-Density GPUs - -Dissipating ~28 kW of heat in a small residential space is a critical challenge. Traditional comfort HVAC is insufficient, so purpose-built cooling is required. -Key considerations are efficiency, space footprint, and ease of maintenance: - -- **Air Cooling (CRAC/CRAH Units)**: One option is installing a dedicated computer room A/C (CRAC) or in-row cooling unit. -For ~30 kW heat, this might involve multiple 5-ton (60,000 BTU) air conditioning units or a single precision cooling unit. -While effective, standard air conditioners would occupy significant indoor space or require large condenser units outside. -They also need frequent maintenance (filters, coolant checks) and may struggle on extremely hot days. -In a rooftop deployment, direct-expansion HVAC units could be placed outside with ducting to the server room. - -- **Liquid Cooling (Direct-to-Chip or Rear-Door)**: Given the high heat density, liquid cooling is often more efficient. -One approach is direct-to-chip water blocks on the GPUs/CPUs, with a coolant distribution unit pumping liquid to a roof-mounted dry cooler or chiller. -Another compact approach is a rear-door heat exchanger: a radiator panel is installed as the rack’s rear door, absorbing the hot air from servers and cooling it via circulating water. -Rear-door units can handle 30+ kW per rack and require only a water loop to an external cooler. -This method has a minimal footprint (no extra floor space) and is relatively low maintenance (closed-loop water circuit with pumps). -It’s also “close-coupled” to the heat source, improving efficiency. - -- **Immersion Cooling**: Immersion in dielectric fluids is a cutting-edge solution where servers are submerged in cooling fluid. It can easily dissipate high heat loads and simplifies heat rejection (via external radiators). 
However, it requires specialized tank enclosures and complicates maintenance (servers must be lifted out for service). In a small operation with only 5 servers, immersion may add unnecessary complexity and make quick hardware swaps harder. - -**Recommended Cooling Solution**: A liquid cooling system is the most effective for this scenario. For example, equipping the HGX servers with water-cooled plates or using a rear-door liquid cooling unit would efficiently remove heat. -The heat could be expelled via a roof-mounted dry cooler or small chiller unit, taking advantage of outside air to dump heat. -This setup keeps the cooling infrastructure compact (mostly confined to the rack and a box on the roof) and maintenance is straightforward (primarily monitoring coolant and pump health). -It also keeps the room temperature moderate, which is important in a residential building to avoid hot spots. -Overall, liquid cooling provides high performance per footprint and aligns with sustainability (higher efficiency means lower cooling power draw, improving the PUE). -TRG Datacenters notes the availability of “waterless cooling” and advanced techniques for H100/H200 deployments​[^TRGDATACENTERS.COM] – a sign that traditional air cooling alone is not ideal for these 5.6 kW servers. - -### Backup Power and Power Conditioning - -For reliable operations, especially if leasing compute to customers, the data center must handle power outages or grid anomalies seamlessly: - -- **Battery UPS**: A battery Uninterruptible Power Supply is critical. This can be a large centralized UPS cabinet or distributed lithium-ion battery units. -The UPS provides instant failover power and voltage conditioning. It would carry the ~30 kW load for a short period (minutes to perhaps an hour) to ride out brief outages or to cover the gap until a generator starts. -Modern lithium-ion UPS systems or even a set of Tesla Powerwall batteries could serve this role. 
-For example, a 100 kW (several hundred kWh) battery system could keep the facility running for a few hours if needed, and also store solar energy for use in the evening peak. - -- **Diesel (or Natural Gas) Generator**: For longer power outages, a generator ensures continued operation. -A diesel generator around 50–60 kW capacity (to handle 30 kW IT load + cooling equipment and some buffer) would be installed (likely on the roof or a pad outside). -Upon a grid outage, it would auto-start (via ATS – automatic transfer switch) and take over from the UPS within 30–60 seconds. -This provides virtually indefinite backup as long as fuel is available. -Diesel is common for data centers, but a natural gas generator could be considered if a gas line is available, to avoid fuel storage. -In either case, the generator and fuel system add cost and maintenance (fuel checks, periodic test runs), but they are necessary for high uptime. - -- **Redundancy**: The combination of grid power + solar + UPS + generator creates multiple power layers. The primary source is grid (augmented by solar when available). -If grid fails, the UPS instantly holds up the load. For short outages (< 5 minutes), the battery might suffice alone. -If the outage persists, the generator starts and powers the facility until grid returns. Solar, if sun is up, can extend battery life or reduce generator fuel usage by sharing the load. -This multi-tier setup provides resilience. All critical power feeds (UPS output, generator output) tie into a transfer switch gear that feeds the main panel and PDUs, so the switchover is automatic and transparent to the servers. - -- **Power Conditioning**: The sensitive (and expensive) GPU servers benefit from clean, stable power. The UPS and power distribution system will regulate voltage and filter surges. -Additionally, surge protectors and proper grounding are implemented in the electrical design. 
These measures protect the hardware from power spikes or sags common in city grids. - -### Networking & Connectivity - -Reliable, high-bandwidth internet is essential for leasing servers on Akash. The plan includes a 10 Gbps bandwidth setup with multiple providers for redundancy: - -- **Primary ISP (Fiber)**: A business-grade fiber optic connection providing ~10 Gbps symmetrical bandwidth will be the main uplink. -Fiber offers low latency and high throughput, important for moving large datasets to/from the GPUs. In Austin, options include providers like AT&T Business Fiber or local ISPs. -A dedicated 10 Gbps enterprise line typically comes with an SLA (Service Level Agreement) for uptime, but at a high cost (often thousands of dollars per month, exact quotes vary​[^GIGAPACKETS.COM] -). If full 10 Gbps dedicated service is too costly initially, a slightly lower tier (e.g. 5 Gbps) business line could be used, but given the data-intensive nature of GPU workloads, planning for 10 Gbps is prudent. - -- **Secondary ISP (Alternative Path)**: For redundancy, a second independent internet service will be provisioned. This could be a different fiber provider or a cable/fixed wireless provider. -One cost-effective option in Austin is Google Fiber’s 8 Gbps residential service, priced around $150/month​[^FIBER.GOOGLE.COM]. -While technically a residential plan, 8 Gbps “Edge” service offers enormous bandwidth at low cost – however, it lacks the guaranteed uptime of a business line. -Another option is a 1–2 Gbps cable or fiber line from a different carrier (for example, if AT&T is primary, perhaps Spectrum or Zayo fiber as secondary). -The goal is path diversity, so ideally the secondary uses a different infrastructure. This dual ISP setup allows failover if the primary line experiences an outage or performance issues. - -- **Starlink Satellite Backup**: As a tertiary backup, Starlink Business satellite internet will be installed. 
Starlink can provide on the order of 100–200 Mbps downlink in the Austin area. -While far below 10 Gbps, it is entirely independent of local terrestrial infrastructure. -In an extreme scenario where both fiber links are down (e.g. widespread outage or fiber cut), Starlink ensures the data center is still reachable for basic management traffic and possibly to serve low-bandwidth client needs. -The latency (20–40 ms) and bandwidth of Starlink aren’t ideal for heavy data transfer, but it’s sufficient as an emergency link. -The cost for Starlink Business is a few hundred dollars per month, which is a reasonable insurance policy for continuity. - -- **Networking Equipment & Configuration**: A robust networking setup will tie these links together. A dual-WAN router or firewall (with support for load balancing/failover and BGP if using provider-independent addressing) will manage traffic. -In normal operation, the 10 Gbps primary carries the load; the secondary link can either remain idle hot-spare or be used in active-active mode (e.g. serve less critical traffic or load-balance outgoing requests). -The router will automatically fail over traffic to the secondary (or Starlink) if the primary drops. -Internally, the Top-of-Rack switch connects the servers at 10/25 Gbps and uplinks to the router at 10 Gbps. -This ensures each GPU server can fully utilize the internet pipe when needed. -All networking gear will have redundant power (connected to the UPS) to stay online during power events. - -- **Cost and Reliability Comparison**: The fiber business line offers the best reliability (uptimes >99.9% typically) but at high monthly cost. -The residential-style multi-gig fiber is much cheaper (e.g. $150/mo for 8 Gbps)​[^FIBER.GOOGLE.COM] but comes with no guaranteed uptime – repairs could take days if it fails. -By employing both, we get a balance: the cheap pipe can carry traffic most of the time, but if it fails, the expensive pipe ensures SLAs are met. 
-In effect, one could even invert the usage (use the cheap Google Fiber as primary under normal conditions to save money, and have the business fiber as the backup for SLA). -Regardless, with two wired providers and Starlink, the facility is well protected against outages, meeting enterprise connectivity standards. - -### Financial Projections and Profitability - -This section outlines the expected capital expenditures, operating costs, and revenue/profit over a 5-year period for the edge data center. All values are estimates based on current market data and historical trends. - -#### Capital Expenditures (CapEx) - -One-time setup costs (Year 0 investments) include: - -- **GPU Servers**: 5 high-end 8×H200 servers. NVIDIA H200 GPUs are estimated at ~$30k each, and complete 8-GPU systems range $275k–$500k depending on configuration​[^TRGDATACENTERS.COM]. -We assume ~$300k per server for a mid-range configuration. Total: ~$1.5 million for all servers. This covers GPUs, CPUs, memory, NVSwitch, etc. - -- **Rack & Power Infrastructure**: Rack enclosure, dual PDUs, cabling, and building electrical upgrade. The electrical work (new panel, transformer, wiring) might cost $20k–$50k, and rack hardware another ~$5k–$10k. Estimated: $30k. - -- **Cooling System**: If using liquid cooling, costs include coolant distribution units, piping, and a dry cooler/chiller. A high-capacity HVAC or liquid loop for ~30 kW could be $40k–$80k. We budget ~$50k for the cooling solution (e.g. rear-door heat exchanger and external radiator). - -- **Networking Gear**: Enterprise 10 GbE switch, dual-WAN router, and misc. networking equipment. Estimated: $15k for a quality switch and router with 10G capability. - -- **Solar Installation**: ~70–80 kW of solar panels (max that fits on 4,000 sq ft) at roughly $2/W. Estimated: $150k for the solar array (inverters, panels, mounting). This is an optional cost – included if we aim to deploy solar upfront. 
(If treating solar as a separate project, this could be deferred or scaled in phases to manage cash flow.) - -- **Battery UPS & Generator**: A battery UPS sized for the load (could be integrated in a large UPS unit or several battery packs) and a 50–60 kW diesel generator with ATS. Rough costs: $50k for a substantial UPS system, and $40k–$60k for the generator + install. Assume $100k total for backup power infrastructure. - -- **Total CapEx**: Approximately $1.8 – $1.9 million. For instance, adding the mid estimates above: $1.5M (servers) + $30k (power) + $50k (cooling) + $15k (network) + $150k (solar) + $100k (UPS+gen) ≈ $1.845M. Without solar (if one chose to initially rely on grid power), CapEx would be about $1.695M. These figures set the stage for amortization and ROI calculations. - -#### Operating Expenses (OpEx) +## Summary -Ongoing yearly costs include: +The goal of this project is to enable lighter-weight edge compute devices, including those running in average consumer homes, to participate in Akash’s provider infrastructure. The motivation for this stems from the following: -- **Power Consumption**: The GPUs and cooling will draw ~30–35 kW continuously. Yearly energy usage is 260,000 kWh (30 kW × 8760 hours, assuming some efficiency gains from liquid cooling). At an Austin commercial electricity rate ($0.10 per kWh), that’s about $26k/year in electricity costs. With the solar array offsetting perhaps ~50% of that (on sunny days the solar can supply a large portion of daytime power), the net grid electricity cost could drop to ~$12k–$15k/year. (This assumes the ~80 kW solar produces ~140 MWh/year that either directly powers load or is net-metered – saving ~$14k/year in power). We’ll use ~$15k/year net power cost assuming solar is active; ~$30k/year if not. +1.
Many people want to participate in Akash by purchasing a GPU or two today, but they aren’t technically proficient enough to run their own Kubernetes cluster. -- **Internet Connectivity**: Dual ISP fees and Starlink subscription. The primary business 10G line might be ~$1,000–$2,000/month, and backup lines (Google Fiber 8G and Starlink) another few hundred each. Budget roughly $2,500/month total for connectivity. Annual: ~$30k/year on internet service. (Using a residential primary could cut this significantly, but we’ll stay conservative to ensure quality of service.) +2. Even for people who are proficient, the infrastructure requirements of a full multi-node cluster make running one cost-prohibitive. -- **Hardware Maintenance**: With minimal on-site staff, maintenance involves remote management and occasional contractor visits. We budget a small amount for maintenance contracts/spares – e.g. replacing failed components (disks, fans) and annual preventative maintenance on generator, cooling, etc. Estimated: $10k/year. (This could be higher if we include, say, a support contract for the servers or insurance. But since on-site staff is minimal, we assume only incidental costs.) +3. NVIDIA (and presumably other vendors like Apple and AMD) have indicated plans to launch consumer desktop devices powerful enough to allow for “inference offload”. The imminent [NVIDIA DGX Spark](https://www.nvidia.com/en-us/products/workstations/dgx-spark/) is the best example: it is being marketed as “A Supercomputer on your Desktop”. -- **Miscellaneous**: Insurance for equipment, property tax on equipment (if applicable), and other overhead. This might add a few thousand. We’ll include $5k/year as a buffer.
+This AEP is scoped to building an MVP of the service; a subsequent AEP will work on scaling it to a more production-ready state. -Combining these, the annual OpEx is roughly $60k/year (with solar) or ~$75k/year (without solar), dominated by power and internet bandwidth costs. +## High Level User Experience -#### Revenue Model and Utilization +The user experience of this product should be one where an average user with a compute device (a laptop, desktop, or small server with a platform architecture we support) can install a client (software) that configures everything necessary for that device to become part of the provider network. The user should then be able to visit a web application (and potentially a mobile application) to view all of their devices on the network and the rewards (earnings) they have accrued over time, and to add, remove, or upgrade devices. -The revenue comes from leasing the GPU servers on the Akash network (or similar). Assumptions: +## High Level Architecture +The Akash Core team at Overclock Labs has performed several initial tests and is converging on an architecture that splits the provider into “centralized control nodes” and “decentralized worker nodes”. For the MVP, the control nodes for a provider will be managed by the Akash Core Team, while individual users (who can be anyone in the world) can join their worker nodes to the Kubernetes cluster. -- **Lease Rate**: The H200 GPUs are leased at $2.30 per hour per GPU (initial market rate, comparable to high-end H100 pricing). This equals $55.20 per GPU per day if fully utilized. +The networking challenge posed by worker nodes sitting behind a NAT gateway is solved by: -- **Utilization**: Assume an average 80% utilization in Year 1 – meaning each GPU is earning revenue ~19.2 hours out of 24. (Some downtime for when not leased or for maintenance).
80% is a reasonable starting point given ramp-up of clients and some idle periods. +1. Setting up one worker node (let’s call it the INGRESS WORKER NODE) as part of the centralized controller cluster; this worker node serves as the ingress point for all the worker nodes in the cluster. -- **Year 1 Revenue**: There are 40 GPUs total. At $2.3/hr with 80% usage, each GPU yields ~$16,120 per year. Annual Year-1 revenue ≈ $645,000 (40 × $16,120). Calculation: 40 × $2.3 × 0.8 × 24 × 365 ≈ $644,700. -This half-million-plus revenue in the first year assumes there is sufficient demand to keep the GPUs busy (which for cutting-edge H200s is likely, given heavy AI workload demand). +2. Using WireGuard to allow bidirectional communication with the worker nodes. -However, market lease rates for GPUs tend to decline over time. New GPU generations and increased supply drive prices down. -For example, rentals of NVIDIA H100 GPUs dropped from around $8/hour at launch to under $2/hour within a year as supply caught up​[^LATENT.SPACE]. -We must factor in that our $2.30/hr rate may not hold steady for 5 years: +Tailscale offers a production-ready, easy-to-use implementation of WireGuard and is likely the product that will be used for this. -We project the effective lease rate per H200 GPU will depreciate each year. Based on historical trends, a drop on the order of 10–15% per year is plausible if new competitors (like NVIDIA Blackwell series) emerge. -We will model a conservative scenario: Year 2 ~$2.00/hr, Year 3 ~$1.70/hr, Year 4 ~$1.50/hr, Year 5 ~$1.30/hr on average. -Utilization might increase as the service gains customers (perhaps up to 90%), but to keep estimates simple we’ll hold 80% utilization and focus on rate decline. -This decline in price is analogous to how the asset value depreciates – GPUs lose value as newer models appear. (In fact, high-end GPUs are among the fastest depreciating tech assets​[^BIGDATASUPPLY.COM].)
-Historically, a GPU can lose ~50% of value in the first year, and ~75% by year three​[^BIGDATASUPPLY.COM]. -Rental pricing mirrors this as older GPUs command lower rates. Our revenue model accounts for this by lowering the hourly rate over time. 5-Year Revenue Projection (with 80% utilization): +This is what a single such cluster would look like conceptually: -| Year | Avg Lease Rate (USD/GPU/hr) | Annual Revenue (approx) | -| ---- | -------------------------- | ---------------------- | -| 1 | $2.30 | $645,000 | -| 2 | $2.00 (-13%) | ~$561,000 | -| 3 | $1.70 (-15%) | ~$477,000 | -| 4 | $1.50 (-12%) | ~$420,000 | -| 5 | $1.30 (-13%) | ~$364,000 | +![aah-regional-network](aah-regional-network.png) -*(The rate percentages indicate the drop from the previous year. Utilization kept at 80% for consistency.)* +And now imagine there being one such network per region. We would initially start with 2-3 per major continent and scale up from there. -Over five years, the cumulative revenue would be roughly ~$2.46 million. This assumes that demand remains high enough to keep 80% of capacity leased even as prices drop. In practice, we might increase utilization in later years (e.g. to 85–90%) as the service matures, which could somewhat offset the lower hourly rates. +![aah-global-network](aah-global-network.png) -#### 5-Year Profitability Outlook +## Proof-of-Concept (POC) Testing +To confirm that this is a viable solution and can be scaled to a reasonable size we need to perform the following minimum testing: -To evaluate profitability, we compare the revenue against expenses and include the residual value of hardware after 5 years: +- SINGLE REMOTE WORKER NODE TEST: Set up the control infrastructure with 3 control nodes + one ingress node + 1 worker node and test deploying a large number of pods on to the remote worker node. 
Confirm that there are no networking issues, that the pods can be reached over SSH, and that a service exposing an API endpoint can be deployed and its API accessed from a public IP. -- **Yearly Operating Profit**: Subtracting the ~$60k OpEx per year from the revenues above, we get the annual net income from operations. -For example, Year 1 OpEx ~$60k, so profit ~$585k. By Year 5, revenue ~$364k minus OpEx ~$60k = $304k profit. +- MULTIPLE REMOTE WORKER NODE TEST: Set up the control infrastructure with 3 control nodes + one ingress worker node + a large number (say 100) of remote worker nodes, then confirm that there are no networking issues, test deployments (including a service that exposes an API endpoint reachable from a public IP), and verify the stability of the cluster, including etcd state, over time. -- **CapEx Recovery**: The initial CapEx ($1.85M with solar) is a sunk cost to recover over time. We can amortize it linearly ($370k per year over 5 years) as a target. -In Year 1, the $585k operating profit easily covers the $370k amortization (leaving $215k surplus). -By Year 5, the $304k profit is slightly below a $370k straight-line amortization, reflecting how revenue declines over time. +- MULTIPLE INGRESS AND REMOTE WORKERS TEST: Set up the control infrastructure with 3 control nodes + 3 ingress worker nodes + a large number (say 100) of remote worker nodes, then confirm that there are no networking issues and test deployments and the stability of the cluster, including etcd state, over time. Also confirm that load is balanced across the three ingress worker nodes and that Kubernetes continues to schedule pods evenly across all remote worker nodes. -- **Cumulative Cashflow**: Summing the annual profits: Year1 $585k, Year2 ~$501k, Year3 ~$417k, Year4 ~$360k, Year5 ~$304k yields about $2.17 million total pre-tax profit over 5 years. -This is in the same ballpark as the ~$1.85M initial investment, indicating a modest net gain by the end of year 5.
+## Productization -- **Resale Value**: After 5 years, the H200 servers will be older tech, but not worthless. If history is a guide, high-end GPUs might retain perhaps 10–20% of their value after 5 years (many will have moved to next-next-generation by then). -For instance, 5-year-old NVIDIA V100 GPUs sell for only ~5–10% of original price on secondary markets. We’ll assume ~15% residual value for our equipment. -On a $1.5M server investment, that’s about $225k salvage value by selling the used servers or GPUs in year 5. This adds to the project’s return. +To productize this solution, a number of decisions need to be made. Here is a non-exhaustive list of areas we need to flesh out further: -- **ROI and Payback**: Considering the ~$2.17M cumulative profit plus ~$225k resale, the 5-year return is ~$2.395M on a $1.845M investment. -That’s an overall ROI of ~30% over 5 years, or roughly a 6% annualized return. Payback period (time to recoup the initial outlay from net cash flow) would be around 3.5–4 years in this scenario (cumulative profit crosses the initial cost in the latter half of year 4). -After 5 years, the operation has paid for itself and earned a modest profit on top. +- REGIONAL CLUSTERING: Since there is a limit to how many worker nodes we can reasonably have per cluster and, more importantly, all network traffic to and from the pods must go through the control infrastructure, we want to keep both congestion on the control infrastructure’s network and latency to the remote worker nodes in check. To achieve this, we will likely need to pursue a regional strategy, with one control node cluster per region, where any new worker node wanting to join the network is directed to the regional cluster in its own region. There are several open questions that need to be addressed: + - How is the geolocation boundary defined (geo-IP? City? State?) and enforced?
+ - Does there need to be an API server that new nodes “check into” and that then redirects them to join a specific regional control cluster? + - Does the core team maintain all the regional control clusters initially until we can verify viability? + - Is there a minimum number of remote worker nodes needed in a region before we commit to hosting and managing a regional cluster for that specific region? + - Will nodes be able to join permissionlessly? If yes, we need to design our security framework around that. If not, how can we provide the highest degree of decentralization without sacrificing security? -- **Effect of Solar on Financials**: We included solar in CapEx and reduced annual power cost. If we exclude solar, initial CapEx drops by ~$150k (to ~$1.695M) but annual power costs rise by ~$14k (to ~$75k/year). -Over 5 years, not having solar would save $150k upfront but cost ~$70k more in OpEx, netting a $80k benefit by year 5. -This slightly improves short-term ROI (and earlier breakeven). However, solar’s 25+ year lifespan means after its ~10-year payoff period it would yield pure savings. -In a 5-year window, solar is close to break-even (especially with no subsidies assumed). -For a pure profitability standpoint, one might delay or scale the solar investment, but from a feasibility and sustainability view, using the rooftop for solar is still attractive for long-term gains and resiliency. +- HOMOGENEOUS vs HETEROGENEOUS CLUSTERS: Do all worker nodes in the same cluster have to be identical in terms of resources? Should we allow mixing? Should CPU and GPU nodes be mixed in the same cluster? -- **Price/Utilization Sensitivity**: The above outlook is sensitive to how well the GPUs are monetized. If demand is higher (90%+ utilization), revenues would increase ~12.5%, boosting profits. -Conversely, if competition forces prices down faster (say to $1/hr by year 5), total revenue would be lower.
-The good news is that even under quite conservative pricing, the operation remains profitable within 5 years, largely because the upfront purchase avoids cloud markups. -(Note: cloud GPU instances can cost 2–3× the equivalent hardware cost over a few years​[^TRGDATACENTERS.COM], so owning hardware can pay off if utilization is high.) +- APPLICATION TYPES: Should some applications be blocked (or at least discouraged) from being deployed on these clients? For example, applications that require an IP lease or large persistent storage may not be well suited to these devices. -### Feasibility Assessment +- USER CLIENTS: One of the goals of this project is to enable the average home user to participate in Akash as a “tiny provider”. To this end, setup and configuration need to be as seamless as possible. To achieve this, we need to build an easy-to-install, UI-based client for the main OS platforms on the chip architectures we support. For Linux users, this could be a command line interface (CLI) based installer. -In summary, setting up an edge GPU data center in a residential building is feasible but requires careful planning: +- END USER DEVICE MANAGER & DASHBOARD: There needs to be a way for end users to view, add, and remove devices, potentially view a map of all devices on the network (similar to Helium miners), and view earnings and transfer out funds through a wallet. We will initially offer a web portal for this and may later consider building a mobile app. -- Significant infrastructure upgrades (power and cooling) are needed to support the high-density servers in a non-datacenter environment. -- The project is capital-intensive (~$1.8M upfront), but operational costs are relatively low once running (mainly power and internet). -- **Profitability**: The 5-year projection shows a reasonable profit margin, though not astronomical.
-There is room for higher returns if the facility achieves higher utilization or if hardware costs can be sourced lower. -Conversely, rapid GPU price drops and under-utilization are risks (the GPU rental market can fluctuate – e.g. an oversupply situation drove H100 rental rates down to ~$2/hr​[^LATENT.SPACE]). -We mitigated this by using a declining price model in our estimates. -- The edge location (in Austin) can actually be a selling point: users in the region get lower latency and data sovereignty compared to using distant cloud data centers. This could help maintain higher utilization. -- **Maintenance and Operations**: With remote management, the ongoing effort is low. The main operational tasks will be monitoring systems, applying software updates, and arranging repairs for any failed hardware. -The design with redundant power and connectivity ensures a high service uptime, which is crucial for attracting customers. +- ADMIN PORTAL: The administrators of the regional clusters need to be able to view and manage devices. This will likely be achieved with the Akash Provider Console but may require additional features. -Overall, the analysis indicates that an edge data center with 5×HGX H200 servers can be run in a retrofitted residential building space and achieve profitable operations within 5 years. While it won’t rival a hyperscale cloud in scale, it leverages owned infrastructure and renewable energy to deliver competitive GPU compute at a regional edge, aligning with the growing trend of decentralized, on-premise AI computing​[^TRGDATACENTERS.COM]. +- DEPLOY CONSOLE CHANGES: The deploy console client will need to be modified to make users aware of whether a specific compute provider is part of the edge network, since performance may be lower and the likelihood of going offline is higher than for regular (datacenter) providers.
-## References +- REWARDS & EARNINGS DISTRIBUTION: Since all deployment earnings will end up in the wallet owned by the provider admin (initially OCL core team but potentially others over time) - we need to figure out what is a reasonable and fair reward distribution that: + - Covers the cost of hosting and managing the control infrastructure + - Incentivizes people to join the network + - Ensures that nodes that don’t get workloads (either because k8s scheduler didn’t pick their node or because of some other reasons) don’t get penalized. Aka fair allocation for participation. -[^TRGDATACENTERS.COM]: [NVIDIA H200 specifications and power requirements](https://www.trgdatacenters.com/resource/nvidia-h200) -[^AWS.AMAZON.COM]: [AWS p5 instance (8×H100) configuration for reference​](https://aws.amazon.com/blogs/aws/new-amazon-ec2-p5-instances-powered-by-nvidia-h100-tensor-core-gpus-for-accelerating-generative-ai-and-hpc-applications/) -[^SECUREMACHINERY.COM]: [Sizing an LLM for GPU memory](https://securemachinery.com/category/aws/) -[^US.SUNPOWER.COM]: [Solar panel power density (≈20 W/sq ft)​](https://us.sunpower.com/blog/how-much-solar-power-produced-square-foot) -[^FIBER.GOOGLE.COM]: [Google Fiber 8 Gbps service pricing in Austin​](https://fiber.google.com/austin/) -[^LATENT.SPACE]: [GPU lease market trends (H100 price drop)​](https://www.latent.space/p/gpu-bubble) -[^BIGDATASUPPLY.COM]: [Typical GPU depreciation over time​](https://bigdatasupply.com/sell-your-i-t-equipment/sell-gpu/) -[^GIGAPACKETS.COM]: [Gigapackets 10 Gbps business line pricing in Austin​](https://www.gigapackets.com/10Gigabit/texas/austin.php) \ No newline at end of file +- BOOTSTRAPPING AND SCALING: We will likely initially start with a small set of regional nodes and scale from there. There are several open questions on this: + - What regions should we start with? 
Most providers on the network are in North America and the EU, so one school of thought is to double down on those regions (since that is where customer demand has been). Another school of thought is to go for more dots on the map and therefore prioritize regions of the world where there are no providers. The final state will likely be a combination of the two. + - What happens if a user wants to join the network but there isn’t a regional control cluster available in their region? There are two potential ways we can handle this: One is to collect all such users into waitlists and have a preset threshold (minimum number of users) that triggers us to set up a regional control node for that specific region. Another way (more flexible but also more complex) is to allow users to potentially become the control cluster admin if they happen to be the first user in a region (thereby bootstrapping that region for the network). \ No newline at end of file diff --git a/src/content/aeps/aep-60/aah-global-network.png b/src/content/aeps/aep-60/aah-global-network.png new file mode 100644 index 000000000..fc691e15e Binary files /dev/null and b/src/content/aeps/aep-60/aah-global-network.png differ diff --git a/src/content/aeps/aep-60/aah-regional-network.png b/src/content/aeps/aep-60/aah-regional-network.png new file mode 100644 index 000000000..b104707f5 Binary files /dev/null and b/src/content/aeps/aep-60/aah-regional-network.png differ diff --git a/src/content/aeps/aep-61/README.md b/src/content/aeps/aep-61/README.md index 25a1f4550..57ffed0d2 100644 --- a/src/content/aeps/aep-61/README.md +++ b/src/content/aeps/aep-61/README.md @@ -7,7 +7,7 @@ type: Standard category: Core created: 2025-01-30 updated: 2025-02-18 -estimated-completion: 2025-02-28 +completed: 2025-03-12 roadmap: major --- @@ -78,7 +78,7 @@ This approach has the following pros and cons: ### `x/authz` -Store will update with following prefixes: +Store will be updated with the following prefixes: - `{0x01}` - 
grantor prefix (remains unchanged) - `{0x03}` - grantee prefix. Format of the key is `0x03: grants count` @@ -86,4 +86,4 @@ Store will update with following prefixes: ## Copyright -All content herein is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0). \ No newline at end of file +All content herein is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0). diff --git a/src/content/aeps/aep-62/README.md b/src/content/aeps/aep-62/README.md new file mode 100644 index 000000000..82e332449 --- /dev/null +++ b/src/content/aeps/aep-62/README.md @@ -0,0 +1,38 @@ +--- +aep: 62 +title: "Provider Console - Node Manager" +author: Anil Murty (@anilmurty) Jigar Patel (@jigar-arc10) Deval Patel (@devalpatel67) +status: Final +type: Standard +category: Interface +created: 2024-03-15 +updated: 2025-04-28 +completed: 2025-04-17 +resolution: https://github.com/akash-network/console/milestone/17?closed=1 +roadmap: minor +--- + + +## Motivation + +Akash providers frequently start out with a small cluster (with 1 or 2 nodes) and then expand over time. + +## Summary + +With Akash Provider Console now generally available and new and existing providers onboarding onto it, we need to add support for a key feature: the ability to easily add new nodes to the cluster or remove one or more existing nodes from it. In addition, a dedicated page (and side menu item) that lets providers visualize things at the node level will enable them to better manage the clusters they operate. + +## Feature Requirements + +The scope of the work under this AEP will include 2 things: +- A new "Node Management" page and side menu item that lets the provider user visualize key details of individual nodes of the specific provider they are connected to, including actions to add or remove nodes.
+- A set of onboarding steps in the UI to add (onboard) more nodes onto the provider, starting from the "Node Management" page. + +## Design + +This is the tentative design; it may change by the time it is released. + +![image](https://github.com/user-attachments/assets/f6e537ad-0813-4185-aa2d-4ce4b222ae90) +![image (14)](https://github.com/user-attachments/assets/1c5fdc14-c325-4ac1-867a-2e7e489d0f27) +![image (15)](https://github.com/user-attachments/assets/f78f1598-60dd-46b4-8ba7-dfc5470f2cd5) +![image (16)](https://github.com/user-attachments/assets/acac1956-000e-4ea9-bbf2-39ca7b8c10cf) +![image (17)](https://github.com/user-attachments/assets/a101a644-7a11-4acf-b7e9-b5ba625dacad) diff --git a/src/content/aeps/aep-63/README.md b/src/content/aeps/aep-63/README.md new file mode 100644 index 000000000..2c2a62100 --- /dev/null +++ b/src/content/aeps/aep-63/README.md @@ -0,0 +1,124 @@ +--- +aep: 63 +title: "Console API for Managed Wallet Users - v1" +author: Anil Murty (@anilmurty) Maxime Beauchamp (@baktun14) +status: Final +type: Standard +category: Interface +created: 2024-03-14 +updated: 2025-05-28 +completed: 2025-05-28 +roadmap: major +--- + + +## Motivation + +The number of Managed Wallet (Credit Card) users in Akash Console has grown significantly since launch. As these users and customers look to scale their applications, they need a programmatic way to deploy and manage the lifecycle of their workloads on Akash so that they can scale up/down in response to demand for their applications. + +## Summary + +Akash provides a programmatic way for users to deploy workloads via the [AkashJS SDK](https://github.com/akash-network/akashjs), but AkashJS requires that the user of the API be familiar not just with crypto (the API requires specifying wallets and mnemonics) but also with the Cosmos SDK (it requires importing Cosmos-specific libraries). This makes it hard, if not impossible, for non-crypto users to consume this API.
Based on conversations with such non-crypto users it has become clear that they need a clean API with simple endpoints to call, for deployment and lease lifecycle management. + +The goal of this roadmap milestone is to implement clean API endpoints for all parts of managing the lifecycle of applications and workloads programmatically for users who pay with a credit card. + +## High Level Specification + +The specification is split into the following functional areas: +- API Endpoints +- UI for managing API keys +- Documentation of the APIs + +Progress is tracked in the [Managed Wallets API Milestone](https://github.com/akash-network/console/milestone/10) + +### API Endpoints +The following API endpoints have been identified at the time of writing this spec. As we work on onboarding customers to use the API we will either update this spec or create new roadmap items to add additional API endpoints. + +Note that while there are a lot of API endpoints listed here - the managed wallets user will only need to exercise a subset of them. The rest are listed here in the interest of full specification. + +#### API Authentication & Management + +While these endpoints will be made available to the user, we anticipate that most users will use the UI to manage their keys + +`POST /v1/users/api-keys` (create) +`GET /v1/users/api-keys` (read/ list all keys) +`GET /v1/users/api-keys/{id}` (read/ list specific API key details) +`PATCH /v1/users/api-keys/{id}` (update) +`DELETE /v1/users/api-keys/{id}` (delete) + +In addition there needs to be validation of the API keys at the middleware level so that it applies to any user specific endpoint + +Implementation/ status tracked in [Console issue #768](https://github.com/akash-network/console/issues/768) + +#### Certificate Management + +While there will be a documented API for this - it will likely be handled automagically under the hood for the managed wallet user. 
If it is handled for the user, we may not implement the list and revoke endpoints. + +`POST /v1/certificates` (create cert) +`GET /v1/certificates` (list all certs) +`DELETE /v1/certificates/{id}` (revoke cert) + +#### Deployment Creation + +`POST /v1/deployments` (create deployment based on `SDL/ YAML` and escrow `deposit`) + +Implementation tracked in [issue #767](https://github.com/akash-network/console/issues/767) + +#### Bid Selection + +`GET /v1/bids/{dseq}` (get bids for a `dseq`) + +Implementation tracked in [issue #767](https://github.com/akash-network/console/issues/767) + +#### Lease Creation + +`POST /v1/leases` (create leases from a payload containing an array of SDL, dseq, gseq, and provider combinations) + +Note that Akash supports creating multiple leases from a single SDL, where each lease can be for a different service (with a different container image) and deployed to a different provider. + +Implementation tracked in [issue #767](https://github.com/akash-network/console/issues/767) + +#### Listing Deployments + +`GET /v1/deployments` (list all deployments) +`GET /v1/deployments/{dseq}` (get details for a specific deployment) + +Implementation tracked in [issue #1042](https://github.com/akash-network/console/issues/1042) +and [issue #767](https://github.com/akash-network/console/issues/767) + +#### Deployment Closure + +`DELETE /v1/deployments/{dseq}` (close a specific deployment) + +Implementation tracked in [issue #767](https://github.com/akash-network/console/issues/767) + +#### Funding Deployments + +For viewing or adding funds to the escrow of a specific deployment: + +`GET /v1/deployments/{dseq}` (retrieve escrow balance details) +`POST /v1/deployments/deposit/{dseq}` (accepts `deposit` amount in the body) + +Implementation tracked in [issue #989](https://github.com/akash-network/console/issues/989) and [issue #990](https://github.com/akash-network/console/issues/990) + +#### Funding Account + +This is for the customer to purchase more
credits and fund their account. We will decide whether to offer this based on customer requests, but it will likely go directly to the [Stripe Checkout API](https://docs.stripe.com/api/checkout/sessions/object). + +### UI for Managing API Keys + +The user will be able to reach an API key management page from their user profile drop-down menu. + +![api-keys dropdown menu](api-keys-menu.png) + +The API key management page will let the user view all keys, create a new one, or delete an existing one. + +![api-keys management page](api-key-management.png) + +### Documentation + +Documentation will be added in three places: + +- In the [Swagger docs](https://console-api.akash.network/v1/swagger) for Console API (which is linked from the Console side nav bar) +- In [docs.akash.network](https://akash.network/docs/) +- In the GitHub [Wiki page for Console](https://github.com/akash-network/console/wiki) (and linked from the GitHub [Readme for Console](https://github.com/akash-network/console/blob/main/README.md)) diff --git a/src/content/aeps/aep-63/api-key-management.png b/src/content/aeps/aep-63/api-key-management.png new file mode 100644 index 000000000..cfdab12fe Binary files /dev/null and b/src/content/aeps/aep-63/api-key-management.png differ diff --git a/src/content/aeps/aep-63/api-keys-menu.png b/src/content/aeps/aep-63/api-keys-menu.png new file mode 100644 index 000000000..70058f380 Binary files /dev/null and b/src/content/aeps/aep-63/api-keys-menu.png differ diff --git a/src/content/aeps/aep-64/README.md b/src/content/aeps/aep-64/README.md new file mode 100644 index 000000000..4b4f29aee --- /dev/null +++ b/src/content/aeps/aep-64/README.md @@ -0,0 +1,714 @@ +--- +aep: 64 +title: "JWT Authentication for Provider API" +author: Artur Troian (@troian) +status: Final +type: Standard +category: Core +created: 2025-04-03 +updated: 2025-07-30 +completed: 2025-10-28 +roadmap: major +--- + +## Abstract + +This AEP proposes implementing [JWT (JSON Web
Token)](https://datatracker.ietf.org/doc/html/rfc7519) authentication for the Akash Provider API. This enhancement aims to improve the reliability of client API communication with leases during blockchain maintenance periods and provide more granular access control capabilities. + +## Motivation + +The current mTLS authentication mechanism, while secure, presents several limitations: + +1. **Blockchain Dependency**: When the blockchain acts as the root of trust, API clients cannot maintain communication with leases during blockchain maintenance windows. +2. **Limited Access Control**: The current certificate-based system grants access to all leases and features, making it challenging to implement granular access controls. +3. **Certificate Management Complexity**: Implementing and maintaining granular access via certificates is complex for clients. + +JWT authentication offers several advantages: +- Widely adopted and well-understood authentication mechanism +- Enables granular access control +- Allows for more flexible token management +- Reduces dependency on blockchain availability + +## Technical Details + +### Key Concepts + +1. **Asymmetric Key Usage**: + - Akash uses ECDSA with secp256k1 curve for wallet operations + - This AEP focuses on the signing capabilities for JWT generation + +2. **Client-Issued JWT**: + - Unlike conventional JWT implementations where servers issue tokens, clients will issue JWTs + - This approach is necessary because: + - A single wallet may have multiple simultaneous leases across different providers + - Only the lease owner can create granular access JWTs + - The wallet's public key is available on the blockchain for provider validation + +3. **Certificate Management**: + - Supports standalone CA certificates + - Compatible with Let's Encrypt certificates + - Eliminates the need for custom TLS handshake handlers on the client side + +### Implementation Guidelines + +1. 
**Token Lifetime**: + - JWTs should be short-lived due to revocation challenges + - Recommended maximum lifetime: 15 minutes + - Implementation-specific lifetime configurations are allowed + +2. **Provider Implementation**: + - Providers must query and cache lease owner public keys from the blockchain + - Cache must be persistent across service restarts and updates + - Cache invalidation strategies should be implemented + +## JWT Specification + +### Signing methods + +Only ES256K with secp256k1 curve is supported + +### Schema Definition + +```json +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "$id": "https://raw.githubusercontent.com/akash-network/akash-api/refs/heads/main/specs/jwt-schema.json", + "title": "Akash JWT Schema", + "description": "JSON Schema for JWT used in the Akash Provider API.", + "type": "object", + "additionalProperties": false, + "required": [ + "iss", + "iat", + "exp", + "nbf", + "version", + "leases" + ], + "properties": { + "iss": { + "type": "string", + "pattern": "^akash1[a-z0-9]{38}$", + "description": "Akash address of the lease(s) owner, e.g., akash1abcd... (44 characters)" + }, + "iat": { + "type": "integer", + "minimum": 0, + "description": "Token issuance timestamp as Unix time (seconds since 1970-01-01T00:00:00Z). Should be <= exp and >= nbf." + }, + "nbf": { + "type": "integer", + "minimum": 0, + "description": "Not valid before timestamp as Unix time (seconds since 1970-01-01T00:00:00Z). Should be <= iat." + }, + "exp": { + "type": "integer", + "minimum": 0, + "description": "Expiration timestamp as Unix time (seconds since 1970-01-01T00:00:00Z). Should be >= iat." + }, + "jti": { + "type": "string", + "minLength": 1, + "description": "Unique identifier for the JWT, used to prevent token reuse." + }, + "version": { + "type": "string", + "enum": [ + "v1" + ], + "description": "Version of the JWT specification (currently fixed at v1)." 
+ }, + "leases": { + "type": "object", + "additionalProperties": false, + "required": [ + "access" + ], + "properties": { + "access": { + "type": "string", + "enum": [ + "full", + "granular" + ], + "description": "Access level for the lease: 'full' for unrestricted access to all actions, 'granular' for provider-specific permissions." + }, + "scope": { + "type": "array", + "minItems": 1, + "uniqueItems": true, + "items": { + "type": "string", + "enum": [ + "send-manifest", + "get-manifest", + "logs", + "shell", + "events", + "status", + "restart", + "hostname-migrate", + "ip-migrate" + ] + }, + "description": "Global list of permitted actions across all owned leases (no duplicates). Applies when access is 'full'." + }, + "permissions": { + "type": "array", + "description": "Required if leases.access is 'granular'; defines provider-specific permissions.", + "minItems": 1, + "items": { + "type": "object", + "additionalProperties": false, + "required": [ + "provider", + "access" + ], + "properties": { + "provider": { + "type": "string", + "pattern": "^akash1[a-z0-9]{38}$", + "description": "Provider address, e.g., akash1xyz... (44 characters)." + }, + "access": { + "type": "string", + "enum": [ + "full", + "scoped", + "granular" + ], + "description": "Provider-level access: 'full' for all actions, 'scoped' for specific actions across all provider leases, 'granular' for deployment-specific actions." + }, + "scope": { + "type": "array", + "minItems": 1, + "uniqueItems": true, + "items": { + "type": "string", + "enum": [ + "send-manifest", + "get-manifest", + "logs", + "shell", + "events", + "status", + "restart", + "hostname-migrate", + "ip-migrate" + ] + }, + "description": "Provider-level list of permitted actions for 'scoped' access (no duplicates)." 
+ }, + "deployments": { + "type": "array", + "minItems": 1, + "items": { + "type": "object", + "additionalProperties": false, + "required": [ + "dseq", + "scope" + ], + "properties": { + "dseq": { + "type": "integer", + "minimum": 1, + "description": "Deployment sequence number." + }, + "scope": { + "type": "array", + "minItems": 1, + "uniqueItems": true, + "items": { + "type": "string", + "enum": [ + "send-manifest", + "get-manifest", + "logs", + "shell", + "events", + "status", + "restart", + "hostname-migrate", + "ip-migrate" + ] + }, + "description": "Deployment-level list of permitted actions (no duplicates)." + }, + "gseq": { + "type": "integer", + "minimum": 0, + "description": "Group sequence number (requires dseq)." + }, + "oseq": { + "type": "integer", + "minimum": 0, + "description": "Order sequence number (requires dseq and gseq)." + }, + "services": { + "type": "array", + "minItems": 1, + "items": { + "type": "string", + "minLength": 1 + }, + "description": "List of service names (requires dseq)." 
+ } + }, + "dependencies": { + "gseq": [ + "dseq" + ], + "oseq": [ + "dseq", + "gseq" + ], + "services": [ + "dseq" + ] + } + } + } + }, + "allOf": [ + { + "if": { + "properties": { + "access": { + "const": "scoped" + } + } + }, + "then": { + "required": [ + "scope" + ], + "properties": { + "scope": { + "minItems": 1 + }, + "deployments": false + } + } + }, + { + "if": { + "properties": { + "access": { + "const": "granular" + } + } + }, + "then": { + "required": [ + "deployments" + ], + "properties": { + "scope": false + } + } + }, + { + "if": { + "properties": { + "access": { + "const": "full" + } + } + }, + "then": { + "properties": { + "scope": false, + "deployments": false + } + } + } + ] + } + } + }, + "allOf": [ + { + "if": { + "properties": { + "access": { + "const": "full" + } + } + }, + "then": { + "required": [ + "scope" + ], + "properties": { + "permissions": false + } + } + }, + { + "if": { + "properties": { + "access": { + "const": "granular" + } + }, + "required": [ + "access" + ] + }, + "then": { + "required": [ + "permissions" + ], + "properties": { + "scope": false + } + } + } + ] + } + }, + "allOf": [ + { + "if": { + "properties": { + "leases": { + "properties": { + "access": { + "const": "granular" + } + }, + "required": [ + "access" + ] + } + }, + "required": [ + "leases" + ] + }, + "then": { + "properties": { + "leases": { + "required": [ + "permissions" + ], + "properties": { + "scope": false + } + } + } + } + }, + { + "if": { + "properties": { + "leases": { + "properties": { + "permissions": { + "type": "array", + "minItems": 1 + } + }, + "required": [ + "permissions" + ] + } + }, + "required": [ + "leases" + ] + }, + "then": { + "properties": { + "leases": { + "properties": { + "access": { + "const": "granular" + } + }, + "required": [ + "access" + ] + } + } + } + } + ] +} +``` + +### Field Descriptions + +1. 
**Required Fields**: + - `iss`: Akash address of the lease owner + - `iat`: Token creation timestamp (NumericDate) + - `nbf`: Not-before timestamp (NumericDate); must be less than or equal to `iat` + - `exp`: Token expiration timestamp (NumericDate); must be greater than or equal to `iat` + - `version`: JWT specification version (must be "v1") + - `leases`: + - `access`: Access level ("full" or "granular") + +2. **Optional Fields for leases**: + - `scope`: Global list of permitted actions across all owned leases (required when `access` is "full") + - `permissions`: Array of provider-specific permissions (required when `access` is "granular") + - `provider`: Provider address (required) + - `access`: Access level ("full", "scoped" or "granular") + - `scope`: List of permitted actions (required when `access` is "scoped") + - `deployments`: List of deployment-level permissions (required when `access` is "granular") + - `scope`: List of permitted actions (required) + - `dseq`: Deployment sequence number (required) + - `gseq`: Group sequence number (requires dseq) + - `oseq`: Order sequence number (requires dseq and gseq) + - `services`: List of service names (requires dseq) + +### Examples + +#### Scoped access for specific deployment +```json +{ + "iss": "akash1...", + "version": "v1", + "iat": 1744029137, + "exp": 1744029139, + "nbf": 1744029137, + "jti": "f47ac10b-58cc-4372-a567-0e02b2c3d479", + "leases" : { + "access": "granular", + "permissions": [ + { + "provider": "akash1...", + "access": "granular", + "deployments": [ + { + "scope": ["logs", "shell"], + "dseq": 123456, + "gseq": 1, + "oseq": 1, + "services": ["web", "api"] + } + ] + } + ] + } +} +``` + +#### Full access to all of the tenant's workloads on a specific provider +```json +{ + "iss": "akash1...", + "version": "v1", + "iat": 1744029137, + "exp": 1744029139, + "nbf": 1744029137, + "jti": "f47ac10b-58cc-4372-a567-0e02b2c3d479", + "leases" : { + "access": "granular", + "permissions": [ + { + "provider": "akash1...", + "access": "full" + } + ] + } +} +``` + +#### Scoped access to only logs for all of the tenant's workloads on a specific provider +```json +{ + "iss": "akash1...", + "version": "v1", + "iat": 1744029137, + "exp": 1744029139, + "nbf": 1744029137, + "jti": "f47ac10b-58cc-4372-a567-0e02b2c3d479", + "leases" : { + "access": "granular", + "permissions": [ + { + "provider": "akash1...", + "access": "scoped", + "scope": [ + "logs" + ] + } + ] + } +} +``` + +## 
Implementation Resources + +### JWT Authentication with Let's Encrypt and mTLS Fallback + +#### Sequence Diagram + +```mermaid +sequenceDiagram + participant Client as Client/Tenant + participant Provider as Provider API Gateway + participant Blockchain as Blockchain + participant LE as Let's Encrypt + participant Cache as Certificate Cache + + Note over Client, Cache: JWT Authentication Flow with Certificate Management + + %% Initial Setup Phase + rect rgb(240, 248, 255) + Note over Client, Provider: 1. Initial Setup Phase + Client->>Blockchain: Publish public key + Provider->>Blockchain: Publish public key + Provider->>LE: Request certificate for gateway domain + LE-->>Provider: Let's Encrypt certificate + Provider->>Cache: Store certificates (LE + mTLS) + end + + %% Authentication Request Phase + rect rgb(255, 248, 240) + Note over Client, Provider: 2. Authentication Request Phase + Client->>Client: Generate JWT with private key + Client->>Provider: Send request with JWT token + end + + %% Certificate Selection Phase + rect rgb(248, 255, 248) + Note over Client, Provider: 3. Certificate Selection Phase + alt SNI starts with provider gateway domain + Provider->>Cache: Check Let's Encrypt certificate + alt Let's Encrypt certificate available & valid + Provider-->>Client: Serve Let's Encrypt certificate + else Let's Encrypt certificate unavailable/expired + Provider->>Cache: Get mTLS certificate + Provider-->>Client: Serve mTLS certificate (fallback) + end + else No SNI or SNI starts with mTLS prefix + Provider->>Cache: Get mTLS certificate + Provider-->>Client: Serve mTLS certificate + end + end + + %% TLS Handshake Phase + rect rgb(255, 240, 255) + Note over Client, Provider: 4. TLS Handshake Phase + Client->>Provider: TLS handshake with selected certificate + Provider-->>Client: TLS connection established + end + + %% JWT Validation Phase + rect rgb(240, 255, 240) + Note over Client, Provider: 5. 
JWT Validation Phase + Provider->>Blockchain: Fetch client's public key + Provider->>Cache: Cache public key for future use + Provider->>Provider: Validate JWT signature with public key + Provider->>Provider: Validate JWT permissions/claims + + alt JWT valid + Provider-->>Client: Authentication successful + Provider->>Provider: Process authenticated request + else JWT invalid + Provider-->>Client: Authentication failed (401/403) + end + end + + %% Certificate Renewal Phase + rect rgb(255, 255, 240) + Note over Provider, LE: 6. Certificate Renewal (Background) + loop Periodically + Provider->>LE: Check certificate expiry + alt Certificate expiring soon + Provider->>LE: Request certificate renewal + LE-->>Provider: New Let's Encrypt certificate + Provider->>Cache: Update certificate cache + end + end + end +``` + +##### Key Components + +###### 1. Certificate Management +- **Let's Encrypt Certificate**: Primary certificate for production use +- **mTLS Certificate**: Fallback certificate for testing and when LE is unavailable +- **Certificate Cache**: Stores both certificates with availability status + +###### 2. SNI-Based Certificate Selection +- **Gateway Domain SNI**: Routes to Let's Encrypt certificate +- **mTLS Prefix SNI**: Routes to mTLS certificate +- **No SNI**: Defaults to mTLS certificate + +###### 3. Fallback Strategy +- **Primary**: Let's Encrypt certificate (when available and valid) +- **Fallback**: mTLS certificate (always available) +- **Automatic**: Seamless fallback without client intervention + +###### 4. JWT Validation Process +1. Client generates JWT with private key +2. Provider fetches client's public key from blockchain +3. Provider validates JWT signature using public key +4. Provider validates JWT permissions/claims +5. 
Authentication succeeds or fails based on the validation result + +##### Benefits + +- **Backward Compatibility**: Supports both JWT and mTLS clients +- **Production Ready**: Let's Encrypt certificates for production use +- **Testing Friendly**: mTLS certificates for development/testing +- **High Availability**: Automatic fallback ensures service continuity +- **Simplified Implementation**: Clients only need to implement JWT, not a custom TLS handshake + + +### Recommended Libraries + +Any JWT implementation that supports plugging in custom signers/verifiers. + +### Security Considerations + +1. **Token Lifetime**: + - Keep tokens short-lived (max 15 minutes) + - Implement proper token validation + - Consider implementing token blacklisting for critical operations + +2. **Key Management**: + - Secure storage of private keys + - Regular key rotation + - Proper key backup procedures + +3. **Provider Implementation**: + - Implement proper public key caching + - Regular cache invalidation + - Secure storage of cached keys + +## Migration Guide + +1. **For Providers**: + - Implement JWT validation + - Set up public key caching + - Update API endpoints to support JWT authentication + - Maintain backward compatibility with mTLS + +2. **For Clients**: + - Implement JWT generation + - Update authentication logic + - Implement proper key management + - Update API client libraries + +## Future Considerations + +1. **Token Revocation**: + - Implement a token revocation mechanism + - Consider using a distributed revocation list + - Explore blockchain-based revocation + +2. **Enhanced Security**: + - Add support for additional signing algorithms + - Implement token encryption + - Add support for refresh tokens + +3. **Performance Optimization**: + - Optimize public key caching + - Implement efficient token validation + - Add support for token batching + +## Copyright + +All content herein is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).
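The claim rules in the JWT specification above (the `nbf` ≤ `iat` ≤ `exp` ordering, the recommended 15-minute lifetime cap, and the full/scoped/granular access levels) can be illustrated with a short provider-side authorization check. This is a minimal Python sketch under stated assumptions, not the Akash provider's implementation. It assumes the ES256K signature has already been verified against the issuer's on-chain public key. The `is_allowed` and `times_ok` helpers and the `akash1prov` address are hypothetical, and the optional `gseq`, `oseq` and `services` narrowing is deliberately ignored.

```python
import time

# Action names taken from the `scope` enum in the AEP's JSON schema.
ACTIONS = {
    "send-manifest", "get-manifest", "logs", "shell", "events",
    "status", "restart", "hostname-migrate", "ip-migrate",
}
# Recommended maximum lifetime from the AEP; implementations may choose differently.
MAX_LIFETIME_S = 15 * 60


def times_ok(claims, now):
    """Check nbf <= iat <= exp, the lifetime cap, and nbf <= now < exp."""
    iat, nbf, exp = claims["iat"], claims["nbf"], claims["exp"]
    if not nbf <= iat <= exp or exp - iat > MAX_LIFETIME_S:
        return False
    return nbf <= now < exp


def is_allowed(claims, provider, dseq, action, now=None):
    """Hypothetical helper: True if (already signature-verified) claims permit
    `action` on deployment `dseq` hosted by `provider`.
    gseq/oseq/services narrowing is not modelled in this sketch."""
    now = int(time.time()) if now is None else now
    if claims.get("version") != "v1" or action not in ACTIONS:
        return False
    if not times_ok(claims, now):
        return False
    leases = claims["leases"]
    if leases["access"] == "full":
        # 'full' applies the global scope list to every lease the issuer owns.
        return action in leases.get("scope", [])
    for perm in leases.get("permissions", []):
        if perm["provider"] != provider:
            continue
        if perm["access"] == "full":
            return True
        if perm["access"] == "scoped":
            return action in perm.get("scope", [])
        # 'granular': look for a matching deployment entry.
        for dep in perm.get("deployments", []):
            if dep["dseq"] == dseq and action in dep["scope"]:
                return True
    return False


# Usage: a granular token allowing logs/shell on one deployment (placeholder addresses).
claims = {
    "iss": "akash1...",
    "version": "v1",
    "iat": 1744029137,
    "nbf": 1744029137,
    "exp": 1744029437,
    "leases": {
        "access": "granular",
        "permissions": [{
            "provider": "akash1prov",
            "access": "granular",
            "deployments": [{"dseq": 123456, "scope": ["logs", "shell"]}],
        }],
    },
}
print(is_allowed(claims, "akash1prov", 123456, "logs", now=1744029200))     # True
print(is_allowed(claims, "akash1prov", 123456, "restart", now=1744029200))  # False
```

The sketch keeps the claim checks separate from signature verification, which mirrors the order in the sequence diagram: the signature is validated first, then the permissions/claims.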
diff --git a/src/content/aeps/aep-65/README.md b/src/content/aeps/aep-65/README.md new file mode 100644 index 000000000..4880fa98c --- /dev/null +++ b/src/content/aeps/aep-65/README.md @@ -0,0 +1,138 @@ +--- +aep: 65 +title: "Confidential Computing" +author: Anil Murty (@anilmurty) +status: Draft +type: Standard +category: Core +created: 2025-05-14 +updated: 2025-05-14 +estimated-completion: 2026-07-31 +roadmap: major +--- + +## Motivation +Public clouds (like AWS, Azure and GCP) support Confidential Computing because some customers request it before they agree to migrate workloads from their own DCs to public cloud infrastructure. While the vast majority of users don't ask the public clouds for this (and simply "trust" them), this is likely to become a challenge for Akash's growth, particularly because the infrastructure on Akash is owned by 10s if not 100s of independent providers. + +## Background +Confidential Computing (CC) protects sensitive data while it's being used by running computations inside a secure, isolated hardware environment (often a Trusted Execution Environment (TEE) or TEE “Enclave”), so that even cloud providers or system administrators can't access it. It is effectively the equivalent of what encryption and TLS do for code/data at rest and in transit, respectively. + +![trusted execution](trusted-execution.png) + +The need for trusted execution arose as the mainstream tech industry transitioned from running its own datacenters to running in public clouds, which are essentially datacenters owned by another company (like AWS, Azure and GCP). This shift was led by verticals that are especially sensitive to data privacy (like healthcare, financial and federal use cases). + +Two market dynamics drive the need for Akash to accelerate its confidential computing roadmap: + 1.
The transition from centralized public clouds to decentralized public clouds increases the need to secure data and code at runtime, since the infrastructure is now owned by 10s if not 100s of different providers. + + 2. The growth in AI workloads, not just for custom AI model hosting (while preserving IP) but also for privacy-sensitive use cases like healthcare. + + +At a high level, trusted execution is achieved by doing two things: + +- **Attestation** - which is essentially verifying that the environment (aka “Trusted Enclave”) where the code will be run can be trusted - this is typically achieved using “hardware verification” + +- **Sealing/ Unsealing** - which is essentially the act of storing data in and retrieving data out of the TEE Enclave + + +## Prerequisites +The following are prerequisites to allow Akash providers to support trusted execution: + +### TEE Capable Hardware (CPU & GPU) +Not all hardware is TEE capable. Here is the list of TEE capable CPUs and GPUs at the time of writing this spec: + +#### TEE Capable CPUs + +| Vendor | Feature | Required Models | +|--------|--------------------------------------|---------------------------------------------------------------------------------| +| Intel | TDX (Trust Domain Extensions) | Intel Xeon 4th Gen (“Sapphire Rapids”) and 5th Gen (“Emerald Rapids”) Scalable CPUs (with TDX BIOS support) | +| Intel | SGX (Software Guard Extensions) | Intel Xeon E3, Xeon D, and select 10th–11th Gen Core CPUs (now deprecated by Intel) | +| AMD | SEV | AMD EPYC “Rome” (7002 series) | +| AMD | SEV-ES / SNP | AMD EPYC “Milan” (7003) and “Genoa” (9004) series | + +#### TEE Capable GPUs + +| Vendor | Feature | Required Models | +|-------------|-------------|---------------------------------------------------------------------------------| +| NVIDIA | NVTrust | NVIDIA H100 or H200 (Hopper architecture) with CC-on mode | +| AMD/Intel | _None yet_ | No current support for GPU-based TEEs (CPU-side only) | + +In summary, providers must use the following hardware: +- 
- Intel CPUs with TDX (e.g., Xeon Sapphire Rapids) + - AMD CPUs with SEV-SNP (e.g., EPYC Milan/Genoa) + - NVIDIA H100 or H200 GPUs (for NVTrust support) + + #### TEE Enabled Host Kernel & BIOS configuration + + BIOS configuration changes are needed to enable TDX/ SGX (for Intel) and SEV (for AMD). These typically also require a minimum version of the Linux kernel. + + ##### Intel + + To enable memory encryption, TDX and SGX for Intel, consult [this document](https://github.com/canonical/tdx/blob/1.2/README.md). + + ##### AMD + + To enable AMD SEV, consult [this document](https://github.com/AMDESE/AMDSEV/blob/master/README.md). + + + #### Access to Device Nodes for Attestation + In order to perform attestation (i.e., fetch measurements and generate quotes), the container must access specific device nodes like: + - /dev/tdx_guest ([Intel TDX](https://docs.kernel.org/virt/coco/tdx-guest.html)) + - /dev/sev-guest ([AMD SEV-SNP](https://docs.kernel.org/virt/coco/sev-guest.html)) + - /dev/nv_attestation (NVIDIA H100 CC-on mode) + + There are three main ways to allow containers access to these device nodes: + + ##### Privileged Containers + + This involves running the container with the `--privileged` flag or `securityContext.privileged: true`, which gives it full access to all host devices. + + This would be the simplest option from an implementation standpoint, as it would provide access to /dev/* nodes without requiring major orchestration changes. + + The downside is that it poses a major security risk: giving a tenant full host access, including access to other containers’ processes, sockets or secrets, would violate Akash's tenant isolation requirements. FOR THIS REASON THIS IS NOT EVEN AN OPTION.
+ ##### Virtual Machines (Full Fledged VMs) + + Full VMs would offer the strongest tenant isolation and the most flexibility in terms of OS, runtime and workload control, and could potentially unlock new use cases for Akash. + + The downsides of this approach are: + - Requires figuring out how to orchestrate VMs with Kubernetes (possibly using KubeVirt) or adopting an entirely different orchestrator + - Has a performance overhead + - Will also require implementing tenant-side VM image management, which is harder than container packaging/ management + + For this reason THIS IS ALSO LIKELY NOT THE BEST OPTION (at least not if we’re looking to get CC/ TEE support to market sooner rather than later). + + + ##### MicroVMs + + In this case, each pod runs inside its own lightweight VM and the TEE (TDX or SEV-SNP) protects the VM’s memory and execution state. This can likely be implemented using Kata Containers (container runtime), which uses QEMU (emulator) and KVM (hypervisor) underneath. + + The benefits are: + + 1. No need for a separate orchestrator. Kata Containers support the OCI container format and the Kubernetes CRI, so they should in theory work alongside regular (runc-based) containers + 2. Tenants can continue to use the containerized workflow + 3. Maintains performance of the existing (container based) deployment for the most part + 4. Provides better isolation than the current container implementation, since each pod runs in a dedicated kernel with network, memory and IO isolation + + This would (at least in theory) achieve all the near-term objectives while keeping the implementation complexity lower than full-blown VMs. For this reason, THIS IS THE RECOMMENDED SOLUTION. + + ## Ideal User Experience + In the ideal user experience, Akash users (aka “tenants”) barely notice any difference in the deployment experience relative to regular (non-confidential) deployments.
When requesting bids, they should be able to select an option (in the UI, the CLI or the API) and receive bids only from providers that are capable of executing the tenant container within a secure enclave. + And once the deployment is done (the container is running), tenants should be able to make a set of simple, high-level API calls from within the container to perform attestation, apply a policy, and then seal and unseal subsequent requests for the duration of the container’s life. + + To achieve that, the following is needed (this assumes that the prerequisites from the previous section are satisfied): + + 1. Changes to provider attributes to allow providers to advertise that they are TEE/ CCE capable. + + 2. Changes to the SDL to allow tenants to specify that they need a TEE/ CCE capable provider. + + 3. An API or SDK that wraps the vendor-specific SDKs and provides an easy-to-use interface for attestation. + + 4. An API or SDK that wraps vendor-specific SDKs and provides an easy way to perform sealing and unsealing.
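Items 1 and 2 above imply a matching step between what a tenant requests and what providers advertise. The sketch below is a minimal illustration under stated assumptions. The attribute key `capabilities/tee`, its comma-separated values (`tdx`, `sev-snp`, `nvidia-cc`), and the `akash1...` addresses are hypothetical placeholders. They are not the final provider-attribute or SDL schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical attribute key: the real key and its allowed values would be
# defined by the provider-attribute change proposed in this AEP.
TEE_ATTRIBUTE = "capabilities/tee"


@dataclass
class Provider:
    address: str
    attributes: dict[str, str] = field(default_factory=dict)


def advertised_tee(provider: Provider) -> set[str]:
    """Parse a comma-separated TEE attribute value such as "tdx, nvidia-cc"."""
    raw = provider.attributes.get(TEE_ATTRIBUTE, "")
    return {item.strip().lower() for item in raw.split(",") if item.strip()}


def eligible_providers(providers: list[Provider], required: set[str]) -> list[str]:
    """Return addresses of providers advertising at least one required TEE technology."""
    wanted = {r.lower() for r in required}
    return [p.address for p in providers if advertised_tee(p) & wanted]


# Placeholder providers for illustration only.
providers = [
    Provider("akash1aaa", {TEE_ATTRIBUTE: "tdx, nvidia-cc"}),
    Provider("akash1bbb", {TEE_ATTRIBUTE: "sev-snp"}),
    Provider("akash1ccc"),  # no TEE attribute: a regular provider
]
print(eligible_providers(providers, {"tdx"}))             # ['akash1aaa']
print(eligible_providers(providers, {"SEV-SNP", "tdx"}))  # ['akash1aaa', 'akash1bbb']
```

This AEP leaves open where such filtering would actually run, for example in the provider's bidding logic or in Console's bid list. The sketch only shows the set-intersection logic.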
+ ## High Level Roadmap + + Based on the above, the roadmap for achieving Confidential Computing on Akash Network can be broken down into: + + - [AEP-29](https://akash.network/roadmap/aep-29/): Hardware Verification (Support for Attestation) + - [AEP-12](https://akash.network/roadmap/aep-12/): TEE Support (Support for Sealing/ Unsealing) + - AEP-xx: Confidential Computing for Users (API/ SDK + SDL changes + provider attributes) - this could potentially be pulled into this document itself diff --git a/src/content/aeps/aep-65/trusted-execution.png b/src/content/aeps/aep-65/trusted-execution.png new file mode 100644 index 000000000..7fde4bbf2 Binary files /dev/null and b/src/content/aeps/aep-65/trusted-execution.png differ diff --git a/src/content/aeps/aep-66/README.md b/src/content/aeps/aep-66/README.md new file mode 100644 index 000000000..11ab693ce --- /dev/null +++ b/src/content/aeps/aep-66/README.md @@ -0,0 +1,99 @@ +--- +aep: 66 +title: "Custom Domain Certificates" +author: Joao Luna (@cloud-j-luna) +status: Draft +type: Standard +category: Core +created: 2025-05-13 +updated: 2025-07-07 +estimated-completion: 2026-06-30 +roadmap: major +--- + + ## Abstract + + This proposal introduces a mechanism for Akash Network tenant workloads to obtain SSL/TLS certificates for their configured custom domains, enabling secure HTTPS access to deployments. + + ## Motivation + + Currently, Akash Network deployments are accessible via the default ingress subdomain (e.g., *.ingress.akash.pub). + To enhance the security and accessibility of deployments, tenants should be able to use custom domains with SSL/TLS certificates without relying on third-party solutions such as Cloudflare. + + ## Technical Details + + ### Certificate Management (cert-manager) + - `cert-manager` is a Kubernetes controller used to automate the management and issuance of TLS certificates. + - It supports Let's Encrypt and other certificate authorities.
+ - On Akash, `cert-manager` runs as part of the provider infrastructure and handles certificate issuance for ingress resources. + - It uses HTTP-01 challenges to validate domain ownership. + - Users do not directly interact with cert-manager, but it powers the automatic issuance of certs based on deployment configuration and DNS records. + + ### Ingress Controllers + - Akash uses Kubernetes Ingress controllers (e.g., NGINX Ingress) to route external HTTP(S) traffic to tenant workloads. + - Ingress resources define rules for routing and TLS termination. + - The Akash provider manages ingress creation based on the deploy.yml service definitions (expose section). + - HTTPS routing is enabled by specifying port 443 and a custom domain in the manifest. + + ### Deployment Manifest + - The manifest file allows defining services, ports and accepted domains. + - To use a custom domain: + ```yaml + expose: + - port: 80 + to: + - global: true + - port: 443 + to: + - global: true + accept: + - "www.example.com" + ``` + - This instructs the provider to create an ingress rule and attempt TLS certificate issuance for the domain. + + ### DNS Configuration + - DNS setup is critical for domain validation and traffic routing. + - Tenants must create a CNAME record pointing to their deployment’s ingress endpoint. + - Example: + - `www.example.com` -> `deployment123.ingress.provider.akash.network` + - DNS propagation must complete before certificate issuance via Let's Encrypt can succeed. + + ### Certificate Lifecycle + - Certificates are automatically requested, issued, and renewed via `cert-manager`. + - Tenants do not manually manage TLS certs. + - Failure to configure DNS correctly will prevent certificate issuance and may result in a fallback to untrusted/self-signed certs. + + + + ## Implementation + An implementation leveraging `cert-manager` would simplify the solution considerably: the ingress only needs specific annotations to trigger certificate issuance.
A TLS configuration would also need to be added to the Ingress instance created by the hostname operator, pointing to the TLS secret for the accepted domains. + + `cert-manager` watches Ingress resources across the Akash Provider cluster. If it observes an Ingress with certificate-issuing annotations, it ensures that a Certificate resource exists in the deployment namespace, named after the `tls.secretName` field and configured as described on the Ingress. An example Ingress: + ```yaml + apiVersion: networking.k8s.io/v1 + kind: Ingress + metadata: + annotations: + cert-manager.io/cluster-issuer: my-cluster-issuer + name: my-ingress + namespace: my-deployment-namespace + spec: + rules: + - host: www.example.com + http: + paths: + - pathType: Prefix + path: / + backend: + service: + name: myservice + port: + number: 80 + tls: + - hosts: + - www.example.com + secretName: myingress-cert + ``` + + With this, user workloads will be provided a valid and automatically managed certificate for their custom domains. diff --git a/src/content/aeps/aep-67/README.md b/src/content/aeps/aep-67/README.md new file mode 100644 index 000000000..0e0e0d2e3 --- /dev/null +++ b/src/content/aeps/aep-67/README.md @@ -0,0 +1,56 @@ +--- +aep: 67 +title: "Console Bid PreCheck" +author: Anil Murty (@anilmurty) +status: Final +type: Standard +category: Core +created: 2025-05-16 +updated: 2025-07-30 +estimated-completion: 2026-04-30 +roadmap: major +--- + + ## Motivation + + Users often see plenty of available GPUs on the pricing page but fail to receive any bids for their deployment. This leads users to think that the service is broken, and many give up rather than investigate further. Giving users guidance on why this may be happening will go a long way in improving adoption.
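The core problem described in the Motivation is that aggregate availability can hide the fact that no single node can satisfy a request. The following minimal Python sketch illustrates this with a hypothetical in-memory inventory. The provider names, node sizes, and the `Resources` type are illustrative assumptions, not the Console inventory API. The sketch also ignores pricing, storage, and provider attributes, all of which affect real bids.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Resources:
    gpus: int
    cpus: float        # vCPUs
    memory_gib: float


def aggregate_free_gpus(providers: dict[str, list[Resources]]) -> int:
    """What an aggregate view shows: free GPUs summed across every node of every provider."""
    return sum(node.gpus for nodes in providers.values() for node in nodes)


def fits(node: Resources, req: Resources) -> bool:
    """A single node must satisfy the whole request (GPUs, CPUs and memory)."""
    return node.gpus >= req.gpus and node.cpus >= req.cpus and node.memory_gib >= req.memory_gib


def providers_able_to_bid(providers: dict[str, list[Resources]], req: Resources) -> list[str]:
    """Providers with at least one node that can host the whole request."""
    return sorted(name for name, nodes in providers.items() if any(fits(n, req) for n in nodes))


# Hypothetical inventory: provider-a is fragmented (2 free GPUs on each of 3 nodes);
# provider-b has 4 free GPUs on one node but almost no free CPU left there.
inventory = {
    "provider-a": [Resources(2, 32, 128), Resources(2, 32, 128), Resources(2, 32, 128)],
    "provider-b": [Resources(4, 2, 256)],
}
request = Resources(gpus=4, cpus=8, memory_gib=64)
print(aggregate_free_gpus(inventory))             # 10
print(providers_able_to_bid(inventory, request))  # []
```

Ten GPUs are "available", yet no provider can bid. Lowering the CPU request to 2 would make `provider-b` eligible. That is the kind of concrete recommendation a PreCheck could surface.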
+ + ## Background + There are situations where Console users visit the GPU pricing page (on the website) or the providers page (in Console), see that there are enough "available" GPUs of the desired model, and proceed to deploy via Console, only to NOT get ANY bids for their deployment. This can happen for the following primary reasons: + + - While there may be enough "available" GPUs in aggregate (across multiple providers), there may not be enough GPUs on a single provider + - While there may be enough GPUs on a single provider, there may not be enough (to fulfill the GPU count in the user's SDL) on a single node of that provider. This can happen if past small requests (1-2 GPUs per deployment) were scheduled across different nodes of the provider, leaving the provider "fragmented" in terms of available GPUs. + - While there may be enough GPUs on a single node to satisfy the GPU count, that node may not have enough other (non-GPU) resources available to satisfy all the resource requirements outlined in the compute profile. We have sometimes seen this happen when a provider's CPUs get maxed out (~90%) by workloads while its GPUs see little usage. + + Users need guidance on whether their SDL needs to be "adjusted" before they proceed with "create deployment" and, if so, specific guidance on what to change to increase the likelihood of receiving a bid. + + ## Proposed Solution + + One possible way to address this is to implement what we call a "Bid PreCheck" feature. The PreCheck should ideally determine whether the resources and pricing requested by the user in the SDL will result in any bids, without the user actually creating an on-chain transaction. + + ### Implementation + + Ideally this would be a service on the providers that can be queried with resources as input and returns whether the provider will bid or not, but this doesn't exist today.
In the absence of that, we will need to make use of inventory APIs. + + The way this would work is: when the user clicks on deploy and lands on either the SDL builder or the YAML editor page, we would: + + 1. Parse the SDL to extract the GPU count, GPU filter (present or not), CPU count, memory size, storage size and pricing limit (uakt) + 2. Query the providers on the network to determine: + - Are there providers with those specific GPUs? If there are none, return 0 for "expected bids" and recommend that the user "remove the GPU filter" + - If there are providers with those GPUs, check whether any of them has the requested count on a single node. If none does, return 0 and recommend that the user "reduce the number of GPUs requested" + - If there are enough GPUs on a node but not enough CPUs (to meet the requested CPU count), return 0 and recommend that the user "reduce the CPU count" + - If there are enough GPUs and CPUs on a single node but not enough memory or storage, return 0 and recommend reducing those. + - If a non-zero number of providers/ nodes have the requested resources, show that number and recommend the full list of things the user can do to increase it + + Also, we should limit this function to deployments created by non-trial users, because trial users generally have access to a limited set of providers that are likely to be fully used. + + 1. For the YAML editor page, we split the frame into 2 halves: display the YAML editor in the left half and the results of the pre-check in the right half + + 2.
For the SDL builder page, we do essentially the same as 1 above, except that we would first: + - Move the form fields that exist today on the right half of the page into a single column + - Left-justify that whole column + + + This is a rough, initial and tentative design that will likely be changed/ improved when we implement the feature. + + ![Bid-Precheck](bid-precheck-screen.png) diff --git a/src/content/aeps/aep-67/bid-precheck-screen.png b/src/content/aeps/aep-67/bid-precheck-screen.png new file mode 100644 index 000000000..2732e778c Binary files /dev/null and b/src/content/aeps/aep-67/bid-precheck-screen.png differ diff --git a/src/content/aeps/aep-68/README.md b/src/content/aeps/aep-68/README.md new file mode 100644 index 000000000..be77fa7b0 --- /dev/null +++ b/src/content/aeps/aep-68/README.md @@ -0,0 +1,54 @@ +--- +aep: 68 +title: "Console - Billing & Usage" +author: Anil Murty (@anilmurty) +status: Draft +type: Standard +category: Interface +created: 2025-05-20 +updated: 2025-07-31 +completed: 2025-07-31 +resolution: https://github.com/akash-network/console/milestone/20 +roadmap: major +--- + + ## Motivation + + As the number of credit card users in Akash Console grows, a common request we hear is the ability to view usage and billing info. + + ## Background + + Akash Console supports two payment options: crypto wallet based and credit card based. Crypto wallet users can easily see when they spent tokens on deployments with a blockchain explorer like Mintscan. For credit card users, no such tracking is available. We do receive invoice data from Stripe, which we should be able to pass back to the user. + + Separately, users may also want to know how much of their funds are being used and for what, similar to the billing pages of AWS (or other clouds), which break spend down by service. Since we do not have managed services, showing spend by provider or GPU model might be nice to have.
+ +## Proposed Solution + +A new page and submenu in the user Account Settings page called "Billing & Usage" that shows users information about billing and usage related things. The below elements are placeholder to be refined in eng and design conversations. + +- Tabular Data for + - Stripe Transactions + - Date + - Transaction Type (purchase/ refund) + - Payment Method + - Amount + - Status (suceeded or failed) + - Link to download receipt + - Daily Usage + - Date + - Resources Leased + - Amount Spent + +- Charts that show: + - Cumulative Credit purchase over timne + - Account balance over time + - Spend (in terms of compute costs for deployment) over time + +#### Tenative Design Mocks + +These are just placeholders for now to provide general direction and the final version will be different (see resolution link or console.akash.network for final UI/ UX) + +![Stripe Transactions](stripe-transactions.png) +![Daily Usage](daily-usage.png) +![Usage Analytics](usage-analytics.png) +![Monthly Spend](monthly-spend.png) \ No newline at end of file diff --git a/src/content/aeps/aep-68/daily-usage.png b/src/content/aeps/aep-68/daily-usage.png new file mode 100644 index 000000000..8b8daa4f0 Binary files /dev/null and b/src/content/aeps/aep-68/daily-usage.png differ diff --git a/src/content/aeps/aep-68/monthly-spend.png b/src/content/aeps/aep-68/monthly-spend.png new file mode 100644 index 000000000..179931b12 Binary files /dev/null and b/src/content/aeps/aep-68/monthly-spend.png differ diff --git a/src/content/aeps/aep-68/stripe-transactions.png b/src/content/aeps/aep-68/stripe-transactions.png new file mode 100644 index 000000000..2c1a279f9 Binary files /dev/null and b/src/content/aeps/aep-68/stripe-transactions.png differ diff --git a/src/content/aeps/aep-68/usage-analytics.png b/src/content/aeps/aep-68/usage-analytics.png new file mode 100644 index 000000000..2ecac9cad Binary files /dev/null and b/src/content/aeps/aep-68/usage-analytics.png differ diff --git 
a/src/content/aeps/aep-69/README.md b/src/content/aeps/aep-69/README.md new file mode 100644 index 000000000..8eb3049cb --- /dev/null +++ b/src/content/aeps/aep-69/README.md @@ -0,0 +1,52 @@ +--- +aep: 69 +title: "Provider Console API - v1" +author: Anil Murty (@anilmurty) Jigar Patel (@jigar-arc10) Deval Patel (@devalpatel67) +status: Draft +type: Standard +category: Interface +created: 2025-05-22 +updated: 2025-07-30 +completed: 2025-07-25 +resolution: https://github.com/akash-network/console/milestone/22 +roadmap: minor +--- + + ## Motivation + + GPU providers using Provider Console need to pull data into their own dashboards for financial and other reporting. + + ## Summary + + Provider Console currently displays the following stats in the dashboard: + + - Total (cumulative) revenue + - Daily earnings (most recent 24 hours) + + Providers require more granular, structured access to: + + - Daily, weekly, and monthly revenue/utilization metrics + - Net revenue/earnings after Akash fees (take rate) + + While showing these additional metrics in the UI would also be great, the providers we have spoken with typically want to pull this data via an API in order to: + + - Integrate with their internal dashboards + - Generate financial reports for their stakeholders + - Automate revenue tracking and forecasting + + ## Proposed Solution + + Offer a set of APIs that provide revenue and GPU/resource utilization metrics to providers through the Provider Console backend.
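The daily/weekly/monthly roll-up and the net-of-take-rate figure listed in the Summary can be sketched as follows. This is a hedged illustration, not the Provider Console API. The input shape (gross earnings per day), the ISO-week bucketing, and the formula `net = gross * (1 - take_rate)` are all assumptions. The take rate used in the example is an arbitrary value, not the network's actual parameter.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date
from decimal import Decimal


def bucket_key(day: date, period: str) -> str:
    """Group a day into a daily, ISO-weekly or monthly bucket."""
    if period == "daily":
        return day.isoformat()
    if period == "weekly":
        iso_year, iso_week, _ = day.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if period == "monthly":
        return f"{day.year}-{day.month:02d}"
    raise ValueError(f"unsupported period: {period!r}")


def revenue_report(
    daily_gross: dict[date, Decimal], period: str, take_rate: Decimal
) -> dict[str, dict[str, Decimal]]:
    """Roll daily gross earnings into buckets and derive net earnings.

    Assumption: net = gross * (1 - take_rate), i.e. the take rate is
    deducted from the provider's gross lease earnings.
    """
    totals: dict[str, Decimal] = defaultdict(Decimal)  # Decimal() == Decimal("0")
    for day, gross in daily_gross.items():
        totals[bucket_key(day, period)] += gross
    return {
        key: {"gross": gross, "net": gross * (Decimal(1) - take_rate)}
        for key, gross in sorted(totals.items())
    }


# Hypothetical daily earnings for one provider.
earnings = {
    date(2025, 7, 1): Decimal("10.00"),
    date(2025, 7, 2): Decimal("20.00"),
    date(2025, 8, 1): Decimal("5.00"),
}
# The 10% take rate is an example value only.
print(revenue_report(earnings, "monthly", Decimal("0.10")))
```

Using `Decimal` rather than floats avoids rounding drift when summing many small daily amounts. This matters if the figures end up in financial reports.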
+ + ### Scope of work + + - Expose revenue and utilization data via authenticated REST API endpoints + - Support filtering by date range (daily, weekly, monthly) + - Create functions in the indexer to retrieve the data, and internal API endpoints in deploy-web to expose it through provider-console-backend + - Introduce basic rate limiting to prevent abuse + - Make any necessary changes to the indexer to store the desired data + - Secure the API with provider-specific authentication (likely using JWT authentication) + + ### API Spec + + TBD (will be added soon) \ No newline at end of file diff --git a/src/content/aeps/aep-7/README.md b/src/content/aeps/aep-7/README.md index 238d8f512..0d4e8cf8b 100644 --- a/src/content/aeps/aep-7/README.md +++ b/src/content/aeps/aep-7/README.md @@ -1,7 +1,7 @@ --- aep: 7 title: "Incentivized Testnet 1: Akashian Challenge Phase 1" -author: Greg Osuri (@gosuri), Chris Remus (@chris-remus) +author: Greg Osuri (@gosuri) Chris Remus (@chris-remus) status: Final type: Meta requires: 5 @@ -67,4 +67,4 @@ A total of **3 million AKT (3% of AKT)** is allocated for rewards that include: ## Copyright -All content herein is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0). \ No newline at end of file +All content herein is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0). diff --git a/src/content/aeps/aep-70/README.md b/src/content/aeps/aep-70/README.md new file mode 100644 index 000000000..23aa399e2 --- /dev/null +++ b/src/content/aeps/aep-70/README.md @@ -0,0 +1,26 @@ +--- +aep: 70 +title: "Console API using JWT" +author: Anil Murty (@anilmurty) Maxime Beauchamp (@baktun14) +status: Draft +type: Standard +category: Interface +created: 2025-05-28 +updated: 2025-07-30 +completed: 2025-08-30 +resolution: https://github.com/akash-network/console/milestone/21?closed=1 +roadmap: minor +--- + + + ## Motivation + + Accessing the API currently requires creating a certificate and working with mTLS. JWT authentication eliminates the need for both.
+ +## Summary + +This is a follow up to the [AEP-63](https://akash.network/roadmap/aep-63/) + +## High Level Specification: + +1. Transition all existing endpoints to use JWT Authentication. This requires [AEP-64](https://akash.network/roadmap/aep-64/) to be complete \ No newline at end of file diff --git a/src/content/aeps/aep-71/README.md b/src/content/aeps/aep-71/README.md new file mode 100644 index 000000000..e16685cdf --- /dev/null +++ b/src/content/aeps/aep-71/README.md @@ -0,0 +1,23 @@ +--- +aep: 71 +title: "Deployment Closure Alert in Console" +description: "Alert and Notification when deployment closes for any reason" +author: Anil Murty (@anilmurty) Iaroslav Grishajev (@ygrishajev) +status: Final +type: Standard +category: Interface +created: 2025-07-30 +updated: 2025-07-30 +completed: 2025-07-17 +roadmap: minor +--- + +## Motivation + +A users deployment can close becaause of reasons besides running out of funds (which is addressed as part of aep-33). This second type of Alert and Notification lets the user know immediately when a deployment is closed for any reason. + +## Summary + +Users of Akash Console will have the option of configuring a "Deployment Closed" alert for any deployment within their account and optionally tieing the alert to a notification channel. The initial notification channel supported will be email with more notification channels added over time, based on customer/ user feedback. 
+ + Github Milestone: https://github.com/akash-network/console/milestone/23 \ No newline at end of file diff --git a/src/content/aeps/aep-72/README.md b/src/content/aeps/aep-72/README.md new file mode 100644 index 000000000..8de490b9c --- /dev/null +++ b/src/content/aeps/aep-72/README.md @@ -0,0 +1,41 @@ +--- +aep: 72 +title: "Console - Improved User Onboarding" +description: "Console User Onboarding on par with leading SaaS and CSPs" +author: Anil Murty (@anilmurty) Maxime Beauchamp (@baktun14) +status: Final +type: Standard +category: Interface +created: 2025-07-31 +updated: 2025-07-31 +completed: 2025-11-07 +resolution: https://github.com/akash-network/console/milestone/18?closed=1 +roadmap: major +--- + + ## Motivation + + Akash Console is the primary way that new users discover the magic of Akash Network, so it is very important that the UX for getting them started be streamlined for maximum success, along with somewhat generous trial credits comparable to those of other clouds (CSPs) in the industry. + + ## Summary + + After surveying over a dozen products (across the public cloud, neo cloud and AI inference spaces), we found a few common paradigms: + 1. A more generous amount of trial credit than the $10 that Console users get + 2. A more restrictive trial sign-up than Console's completely open and free trial (no sign-up required, no credit card needed) + 3. A streamlined onboarding process with as few distractions as possible + 4. A limited run time for trial deployments, which frees up resources that currently get hogged and aren't available to new users + 5. A limited duration for the trial itself, which creates a sense of urgency + 6. Reminders within the UI, as well as through notifications, that encourage the user to upgrade and convert to a paid user so they don't lose their deployments or the trial credits.
+ + Further, the current trial is limited to a small subset of providers, and we'd like to extend it to the entire network so that all providers benefit from it. + + With those in mind, we've embarked on a project to build a new onboarding flow that will: + 1. Optimize the landing page for new users so that it is significantly simpler from a cognitive-load perspective + 2. Require users to sign up and enter a valid credit card (which won't be charged) before they can start the trial + 3. Grant the user far more trial credits ($100, a 10x increase over the current $10) so that they can fully experience the product + 4. Open up the trial to ALL providers on the network, in conjunction with the [Tenant Incentives Pilot (TIPs)](https://github.com/orgs/akash-network/discussions/978) proposal + 5. Limit trial deployments to 24 hours (they will be closed unless the user upgrades to a paid account, though they can redeploy) + 6. Limit trials to 30 days + 7. Plug the onboarding flow into the email notification system built as part of the Alert & Notification work in [AEP-33](https://akash.network/roadmap/aep-33/) + + Github Milestone: https://github.com/akash-network/console/milestone/18?closed=1 diff --git a/src/content/aeps/aep-73/README.md b/src/content/aeps/aep-73/README.md new file mode 100644 index 000000000..30c84d0b5 --- /dev/null +++ b/src/content/aeps/aep-73/README.md @@ -0,0 +1,39 @@ +--- +aep: 73 +title: "Console - New Product Announcement feature" +description: "A way to inform customers and users about what's new in Akash" +author: Anil Murty (@anilmurty) +status: Draft +type: Standard +category: Interface +created: 2025-07-31 +updated: 2025-07-31 +estimated-completion: 2026-05-31 +roadmap: minor +--- + + ## Motivation + + Informing Akash users and customers about the new products & features that have been introduced will go a long way in driving retention.
This is particularly important now, as the Akash Core team and community have been rapidly adding new functionality to the products and platform this year. + + ## Summary + + Keeping customers informed about new features being introduced in Akash Console, and on the Akash platform as a whole (for both tenants/ users and providers), is a challenge. This is hard enough for traditional products, where every user is known, and even harder for Akash, where many users are anonymous. At the same time, there is a significant risk that users will not choose Akash or, worse, will abandon it because they think a certain feature is missing when it actually exists. A way to communicate these updates within the product, and optionally push them to users where possible, reduces the probability of that happening. + + ## Proposed Solution + + ### UI/ UX + + 1. A sidebar on the right side of Console that provides a running list of everything that is new in Akash. Each element in the list has a heading, a brief description and an optional hyperlink to a longer post (likely on the blog site) that provides details. + 2. A "notification" icon ("New Updates") on the top bar that lights up when new features are added to the list + 3. An option to sign up to be notified when a new update drops. Doing so will send an email to the user at the email address they registered in Console with. + 4. (Ideally) the ability for the user to provide feedback about each new feature + + ### Implementation + + There are a couple of options for how the content can be managed: + + 1. Use a product that specializes in this type of thing. A few examples include AnnounceKit, Appcues, Amplitude and some open source alternatives. + 2.
Use markdown files (like we do for the website at https://github.com/akash-network/website/tree/main/src/content/Blog) + + Github Milestone: https://github.com/akash-network/console/milestone/25 diff --git a/src/content/aeps/aep-74/README.md b/src/content/aeps/aep-74/README.md new file mode 100644 index 000000000..6423afefc --- /dev/null +++ b/src/content/aeps/aep-74/README.md @@ -0,0 +1,39 @@ +--- +aep: 74 +title: "Console - Auto Credit Reload" +description: "Make it easier for customers to keep their deployments running" +author: Anil Murty (@anilmurty) +status: Draft +type: Standard +category: Interface +created: 2025-07-31 +updated: 2025-08-01 +completed: 2025-12-31 +resolution: https://github.com/akash-network/console/milestone/27?closed=1 +roadmap: minor +--- + + ## Motivation + + Making it easier for existing paying users to keep their deployments running, without manual intervention on the payment side, improves the user experience and drives utilization higher. + + ## Summary + + Akash Console added support for automatically topping up deployment escrow accounts before they run out via [AEP-57](https://akash.network/roadmap/aep-57/). That feature alleviated a major pain point for Akash users but only went as far as the amount of credits in the account. The next logical step is to extend the top-up functionality so that it also automatically purchases and adds credits to the user account, which can then fund any periodic automatic escrow top-up jobs. This would allow users and customers with long running workloads (like inference APIs) to (in theory) never have to worry about purchasing credits or topping up their deployments.
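The reload rule proposed in this AEP can be summarized as follows: once the balance is at or below the Reload Threshold, purchase enough credits to return to the Reload Balance. That reduces to a small calculation, sketched below. The sketch assumes the example minimums quoted in the text ($15 and $5). It also adds one extra check that the AEP does not specify: the threshold must be below the reload balance. The function names are hypothetical.

```python
from decimal import Decimal

# Example minimums quoted in this AEP ("say $15" / "say $5"); final values are TBD.
MIN_RELOAD_BALANCE = Decimal("15")
MIN_RELOAD_THRESHOLD = Decimal("5")


def validate_settings(reload_balance: Decimal, reload_threshold: Decimal) -> None:
    """Reject configurations below the minimums or with a threshold at/above the target."""
    if reload_balance < MIN_RELOAD_BALANCE:
        raise ValueError(f"Reload Balance must be at least ${MIN_RELOAD_BALANCE}")
    if reload_threshold < MIN_RELOAD_THRESHOLD:
        raise ValueError(f"Reload Threshold must be at least ${MIN_RELOAD_THRESHOLD}")
    # Assumption (not in the AEP): a threshold at or above the target makes no sense.
    if reload_threshold >= reload_balance:
        raise ValueError("Reload Threshold must be below the Reload Balance")


def reload_amount(balance: Decimal, reload_balance: Decimal, reload_threshold: Decimal) -> Decimal:
    """Credits to purchase now: zero unless the balance is at or below the threshold."""
    validate_settings(reload_balance, reload_threshold)
    if balance > reload_threshold:
        return Decimal("0")
    return reload_balance - balance


print(reload_amount(Decimal("20"), Decimal("100"), Decimal("20")))  # 80
print(reload_amount(Decimal("35"), Decimal("100"), Decimal("20")))  # 0
```

The first call matches the worked example in this AEP's UI/ UX list: a $100 Reload Balance and a $20 threshold yield an $80 purchase. A monitoring service would evaluate this on each balance check and start a card charge only when the result is non-zero.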
+ +## Proposed Solution + +Customers that have a valid credit card saved in the system (saving of credit cards will be available once [AEP-72](https://akash.network/roadmap/aep-72/) ships) will have the option to enable "Auto Credit Reload" along with two dollar amounts - one dollar amount for the balance to maintain and another dollar amount that is a threshold at which the reload is triggered. When Auto Credit Reload is enabled, a service will monitor the customer's credit balance and when it falls below a (user defined) threshold, it will kick of a purchase of credits necessary to bring back the account balance to the configured balance. + +### UI/ UX + +1. Customer sets up and saves at least one valid credit card in the system. If there is more than one, then a default one to charge is selected. Note that implementing this is out of scope of this AEP but is addressed by [AEP-57](https://akash.network/roadmap/aep-57/) which needs to be completed before this AEP. +2. There is a global configuration in the same payment settings page that allows the user to enable or disable "Auto Credit Reload" +3. Enabling Auto Credit Reload requires them to specify two values: + - A "Reload Balance" (this is the account balance that the service will reload to periodically). There will be a minimum amount here (say $15) + - A "Reload Threshold" (this is the condition that will trigged the service to perform a reload). There will be a minimum amount for this also (say $5) +4. The service will monitor the account balance and when the balance is at or below the Reload Threshold it will trigged a credit purchase to bring the balance up to the Reload Balance. For example if the user sets a Reload Balance of $100 and a Threshold of $20, a credit purchase of $80 will automatically be performed when the account balance drops to $20. +5. The user will have an option to be notified (via email) when the Auto Credit Reload is performed. +6. 
If the Auto Credit Reload fails (due to an invalid credit card or another issue), the user will be notified. Not addressing the issue may result in the balance going to $0, which will over time cause deployments to run out of funds and shut down. + +Github Milestone: https://github.com/akash-network/console/milestone/27?closed=1 \ No newline at end of file diff --git a/src/content/aeps/aep-75/README.md b/src/content/aeps/aep-75/README.md new file mode 100644 index 000000000..f3cbcf234 --- /dev/null +++ b/src/content/aeps/aep-75/README.md @@ -0,0 +1,406 @@ +--- +aep: 75 +title: "Multi-depositor escrow account" +author: Artur Troian (@troian) +status: Final +type: Standard +category: Core +created: 2025-08-18 +updated: 2025-09-02 +completed: 2025-08-30 +roadmap: major +--- + +## Abstract + +This AEP proposes enhancing the `x/escrow` module with support for multiple fund depositors, enabling flexible fund management and improved automation workflows for deployment operations. + +## Motivation + +The current deployment and escrow workflows are limited to a single depositor of funds, which creates several significant limitations for users and automation workflows: + +### Current Limitations + +1. **Immutable Depositor Constraint**: + - If a deployment owner has multiple spend authorizations from different wallets, only one can be used during the entire lifetime of the deployment + - The depositor is immutable once set, preventing flexibility in fund management + +2. **Inefficient Fund Utilization**: + - Users cannot combine multiple smaller grants to meet larger deposit requirements + - Example: An owner with two separate grants (0.2 AKT and 0.3 AKT) cannot use them together to create a deployment requiring a 0.5 AKT initial deposit + - This forces users to maintain larger individual grants or miss deployment opportunities + +3.
**Unnecessary Complexity in Authorization**: + - Users must explicitly specify the depositor address when using authorization (authz) for deposits + - This requirement is redundant since grants can be automatically fetched from the chain state during transaction execution + - Adds unnecessary complexity to the user experience and automation workflows + +### Benefits of Multi-Depositor Support + +Implementing multi-depositor escrow accounts would provide: +- **Flexible Fund Management**: Ability to use multiple funding sources for a single deployment +- **Improved Efficiency**: Better utilization of available grants and funds +- **Simplified User Experience**: Reduced complexity in authorization workflows +- **Enhanced Automation**: More flexible automation capabilities for deployment management + +## Key Changes + +This AEP introduces several fundamental changes to the escrow system: + +### 1. **New Deposit Type Structure** +- Introduces `akash.base.deposit.v1.Deposit` message type +- Supports multiple funding sources (balance, grants) in a single deposit +- Enables flexible fund allocation strategies + +### 2. **Enhanced Message Types** +- Updates `MsgCreateDeployment` and `MsgCreateBid` to use new deposit structure +- Replaces single depositor model with multi-source deposit capability +- Maintains backward compatibility through structured message format + +### 3. **New Escrow Operations** +- Adds `MsgAccountDeposit` for additional deposits to existing accounts +- Introduces `DepositAuthorization` for granular access control +- Supports both deployment and bid-level escrow operations + +### 4. **Improved Authorization System** +- Implements secondary indexing for efficient grant lookups +- Supports multiple authorization scopes (deployment, bid) +- Enables more sophisticated permission management + +## Technical Details + +1. 
Introduce a new type `Deposit` in a new package `akash.base.deposit.v1` +```proto +// Source is an enum which lists the possible sources of funds for a deposit. +enum Source { + option (gogoproto.goproto_enum_prefix) = false; + + // Prefix should start with 0 in enum. So declaring dummy state. + invalid = 0 [(gogoproto.enumvalue_customname) = "SourceInvalid"]; + // SourceBalance denotes account balance as source of funds + balance = 1 [(gogoproto.enumvalue_customname) = "SourceBalance"]; + // SourceGrant denotes authz grants as source of funds + grant = 2 [(gogoproto.enumvalue_customname) = "SourceGrant"]; +} + +// Deposit is a data type used by MsgCreateDeployment, MsgCreateBid and MsgAccountDeposit to indicate the amount and sources of the deposit +message Deposit { + // Amount specifies the amount of coins to deposit. + cosmos.base.v1beta1.Coin amount = 1 [ + (gogoproto.nullable) = false, + (gogoproto.moretags) = "yaml:\"amount\"" + ]; + + // Sources is the ordered list of deposit sources; each entry must be unique + repeated Source sources = 5 [ + (gogoproto.castrepeated) = "Sources", + (gogoproto.jsontag) = "deposit_sources", + (gogoproto.moretags) = "yaml:\"deposit_sources\"" + ]; +} + +``` +2. In `MsgCreateDeployment` and `MsgCreateBid`, replace the `deposit` and `depositor` fields with a single `deposit` field of type `Deposit` +```proto +// MsgCreateDeployment defines an SDK message for creating deployment. +message MsgCreateDeployment { + option (gogoproto.equal) = false; + + // ID is the unique identifier of the deployment. + akash.deployment.v1.DeploymentID id = 1 [ + (gogoproto.nullable) = false, + (gogoproto.customname) = "ID", + (gogoproto.jsontag) = "id", + (gogoproto.moretags) = "yaml:\"id\"" + ]; + + // GroupSpec is a list of group specifications for the deployment. + // This field is required and must be a list of GroupSpec.
+ repeated GroupSpec groups = 2 [ + (gogoproto.nullable) = false, + (gogoproto.castrepeated) = "GroupSpecs", + (gogoproto.jsontag) = "groups", + (gogoproto.moretags) = "yaml:\"groups\"" + ]; + + // Hash of the deployment. + bytes hash = 3 [ + (gogoproto.jsontag) = "hash", + (gogoproto.moretags) = "yaml:\"hash\"" + ]; + + // Deposit specifies the amount of coins to include in the deployment's first deposit. + akash.base.deposit.v1.Deposit deposit = 4 [ + (gogoproto.nullable) = false, + (gogoproto.jsontag) = "deposit", + (gogoproto.moretags) = "yaml:\"deposit\"" + ]; +} + +// MsgCreateBid defines an SDK message for creating Bid. +message MsgCreateBid { + option (gogoproto.equal) = false; + + akash.market.v1.BidID id = 1 [ + (gogoproto.customname) = "ID", + (gogoproto.nullable) = false, + (gogoproto.jsontag) = "id", + (gogoproto.moretags) = "yaml:\"id\"" + ]; + + // Price holds the pricing stated on the Bid. + cosmos.base.v1beta1.DecCoin price = 2 [ + (gogoproto.nullable) = false, + (gogoproto.jsontag) = "price", + (gogoproto.moretags) = "yaml:\"price\"" + ]; + + // Deposit holds the amount of coins to deposit. + akash.base.deposit.v1.Deposit deposit = 3 [ + (gogoproto.nullable) = false, + (gogoproto.jsontag) = "deposit", + (gogoproto.moretags) = "yaml:\"deposit\"" + ]; + + // ResourceOffer is a list of resource offers. + repeated ResourceOffer resources_offer = 4 [ + (gogoproto.nullable) = false, + (gogoproto.castrepeated) = "ResourcesOffer", + (gogoproto.customname) = "ResourcesOffer", + (gogoproto.jsontag) = "resources_offer", + (gogoproto.moretags) = "yaml:\"resources_offer\"" + ]; +} +``` + +3. Remove `MsgDepositDeployment` and `DeploymentDepositAuthorization` from `deployment` module + +4. Introduce `MsgAccountDeposit` in escrow module +```proto +// MsgAccountDeposit represents a message to deposit funds into an existing escrow account +// on the blockchain. This is part of the interaction mechanism for managing +// deployment-related resources. 
+message MsgAccountDeposit { + option (gogoproto.equal) = false; + option (cosmos.msg.v1.signer) = "owner"; + + // Owner is the account bech32 address of the user who owns the escrow account. + // It is a string representing a valid bech32 account address. + // + // Example: + // "akash1..." + string owner = 1 [ + (cosmos_proto.scalar) = "cosmos.AddressString", + (gogoproto.jsontag) = "owner", + (gogoproto.moretags) = "yaml:\"owner\"" + ]; + + // ID is the unique identifier of the account. + akash.escrow.id.v1.Account id = 2 [ + (gogoproto.nullable) = false, + (gogoproto.customname) = "ID", + (gogoproto.jsontag) = "id", + (gogoproto.moretags) = "yaml:\"id\"" + ]; + + // Deposit specifies the amount to deposit and the ordered sources of funds. + akash.base.deposit.v1.Deposit deposit = 3 [ + (gogoproto.nullable) = false, + (gogoproto.jsontag) = "deposit", + (gogoproto.moretags) = "yaml:\"deposit\"" + ]; +} +``` + +5. Introduce `DepositAuthorization` in the `escrow` module +```proto +// DepositAuthorization allows the grantee to deposit up to spend_limit coins from +// the granter's account into escrow accounts within the authorized scopes. +message DepositAuthorization { + option (cosmos_proto.message_added_in) = "chain-sdk v0.1.0"; + option (cosmos_proto.implements_interface) = "cosmos.authz.v1beta1.Authorization"; + option (amino.name) = "akash/DepositAuthorization"; + + // Scope is an enum which lists the escrow account types a grant applies to. + enum Scope { + option (gogoproto.goproto_enum_prefix) = false; + + // Prefix should start with 0 in enum. So declaring dummy state. + invalid = 0 [(gogoproto.enumvalue_customname) = "DepositScopeInvalid"]; + // DepositScopeDeployment allows deposits into deployment escrow accounts. + deployment = 1 [(gogoproto.enumvalue_customname) = "DepositScopeDeployment"]; + // DepositScopeBid allows deposits into bid escrow accounts. + bid = 2 [(gogoproto.enumvalue_customname) = "DepositScopeBid"]; + } + + // SpendLimit is the amount the grantee is authorized to spend from the granter's account for + // the purpose of escrow deposits.
+ cosmos.base.v1beta1.Coin spend_limit = 1 [ + (gogoproto.nullable) = false, + (gogoproto.jsontag) = "spend_limit" + ]; + + repeated Scope scopes = 2 [ + (gogoproto.castrepeated) = "DepositAuthorizationScopes", + (gogoproto.jsontag) = "scopes", + (gogoproto.moretags) = "yaml:\"scopes\"" + ]; +} +``` + +6. Introduce a transaction message server in the `escrow` module +```proto +// Msg defines the x/escrow Msg service. +service Msg { + option (cosmos.msg.v1.service) = true; + + // AccountDeposit deposits more funds into the escrow account. + rpc AccountDeposit(MsgAccountDeposit) returns (MsgAccountDepositResponse); +} +``` + +### AccountDeposit Functionality + +The `AccountDeposit` operation allows users to add funds to existing escrow accounts, supporting multiple funding sources and flexible deposit strategies. + +#### Workflow Overview + +1. **Message Validation**: + - Verify the account exists and belongs to the message signer + - Validate deposit amount and sources + - Check authorization permissions if using grants + +2. **Fund Processing**: + - Process each deposit source in the specified order + - Deduct funds from balance or process authorization grants + - Combine funds from multiple sources to meet the total deposit amount + +3. **Account Update**: + - Add new deposits to the escrow account + - Preserve deposit order for settlement purposes + - Update account balance and metadata + +#### Implementation Details + +##### Source Processing Order +The system processes deposit sources sequentially until the total deposit amount is satisfied: + +1. **Sequential Processing**: Each source in the `sources` array is processed one by one in the specified order +2. **Amount Tracking**: The system maintains a running total of the remaining amount needed +3. **Source Exhaustion**: When a source is fully utilized, the system moves to the next source +4. **Early Termination**: Processing stops as soon as the total deposit amount is reached +5.
**Validation**: If all sources are exhausted and the amount is still not met, the transaction fails + +##### Grant Authorization Processing +When processing authorization grants as a funding source: + +1. **Grant Discovery**: The system queries all available grants for the account owner +2. **Permission Validation**: Each grant is checked for proper permissions and expiration +3. **Amount Calculation**: The system determines how much can be drawn from each grant +4. **Sequential Utilization**: Grants are used in order until the required amount is satisfied +5. **Fund Transfer**: Once approved, funds are deducted from the grant and transferred to escrow +6. **Record Keeping**: Each processed grant is recorded with its source and amount details + +#### Error Handling + +The `AccountDeposit` operation handles various error scenarios: + +- **Insufficient Funds**: If the total available funds from all sources cannot meet the deposit amount +- **Invalid Account**: If the escrow account doesn't exist or doesn't belong to the signer +- **Authorization Failures**: If grant processing fails due to insufficient permissions or expired grants +- **Source Processing Errors**: If individual source processing encounters issues + +#### Settlement Order + +Deposits are processed and settled in the order they are received: + +1. **First-in-First-out (FIFO)**: Earlier deposits are settled before later ones +2. **Source Preservation**: The order of sources within each deposit is maintained +3. **Combination Logic**: Multiple deposits from the same address are combined but maintain chronological order + +#### Use Cases + +1. **Escrow Top-up**: Add funds to prevent deployment termination +2. **Grant Utilization**: Use authorization grants for additional deposits +3. **Multi-source Funding**: Combine balance and grant funds for larger deposits +4. 
**Automated Replenishment**: Support automated systems for maintaining escrow balances + +### Implementation Guidelines + +#### Deposit sources + +The deposit `sources` array within both messages must: +- contain at least one valid deposit source +- not contain duplicates +- preserve the order of the sources + +#### x/deployment + +The following requirements apply to both `MsgCreateDeployment` and `MsgDepositDeployment`. + +On message receipt: +1. Process deposit sources in the order they are specified in the message + - deduct funds from each source up to the `deposit` value + - if the deposit amount is not satisfied, try the next source until either + - the deposit amount is satisfied: the transaction succeeds + - the total amount available across all sources is below the requested deposit amount: the transaction fails +2. Build a list of depositor addresses and amounts to be deducted, and pass the list to `escrow.AccountCreate` or `escrow.AccountDeposit` + +#### x/escrow + +The following requirements apply to both `AccountCreate` and `AccountDeposit`. + +1. The order of deposits must be preserved +2. If a deposit from the same address already exists, the funds shall be combined + +##### Account settlement + +Balance settlement proceeds in the order in which deposits were received + +#### x/authz + +Implement a secondary index for efficient lookup of available grants by grantee address and msgTypeUrl.
+The prefix of the index is defined as: +```go +var GranteeMsgTypeUrlKey = []byte{0x04} // reverse prefix to get grantee's grants by msgTypeUrl +``` + +The `Keeper` interface should be extended with the following: +```go +type Keeper interface { + GetGranteeGrantsByMsgType(ctx context.Context, grantee sdk.AccAddress, msgType string, onGrant func(context.Context, sdk.AccAddress, authz.Authorization, *time.Time) bool) +} +``` + +## Implementation Resources + +### Development Guidelines + +#### Protobuf Message Structure +All new message types follow standard Cosmos SDK patterns: +- Use `gogoproto` annotations for consistent serialization +- Implement proper validation rules and constraints +- Maintain backward compatibility where possible + +#### Authorization Implementation +The enhanced authorization system provides: +- Efficient grant lookup through secondary indexing +- Support for multiple authorization scopes +- Granular permission management for different operation types + +#### Escrow Account Management +Multi-depositor escrow accounts support: +- Multiple funding sources per account +- Ordered deposit processing +- Automatic combination of funds deposited from the same address +- Flexible settlement strategies + +### Testing Considerations + +1. **Unit Tests**: Test individual message validation and processing +2. **Integration Tests**: Verify escrow account creation and management flows +3. **Authorization Tests**: Validate grant processing and permission enforcement +4. **Performance Tests**: Ensure efficient grant lookup and processing + +## Copyright + +All content herein is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).
diff --git a/src/content/aeps/aep-76/DESIGN.md b/src/content/aeps/aep-76/DESIGN.md new file mode 100644 index 000000000..d6fc1fce3 --- /dev/null +++ b/src/content/aeps/aep-76/DESIGN.md @@ -0,0 +1,222 @@ +--- +title: BME Architecture Blueprint +related-aeps: AEP-76 +status: Draft +scope: Architecture +author: Greg Osuri (@gosuri) +reviewers: sig-chain, sig-economics +--- + + +## 1. Objectives & Deliverables +- Freeze protocol responsibilities for `x/bme`, `x/act`, and `x/oracle` prior to implementation. +- Define complete data model (stores, module accounts, params) and inter-module contracts. +- Slice migration path from legacy AEP-23 settlements to BME without breaking existing leases. +- Quantify rounding rules, oracle windowing, and dust handling to guarantee deterministic accounting. +- Provide governance & ops requirements (parameters, metrics, emergency hooks) needed at genesis upgrade. + +## 2. Component Boundary Overview +| Component | Responsibilities | Key Interactions | +| --- | --- | --- | +| `x/bme` | Vault custody, remint-credit ledger, ACT mint/burn execution, circuit-breaker enforcement | `MsgMintACT`, `MsgBurnACT`, settlement adapter invoked from `x/escrow.paymentWithdraw`, keeper queries for CR/metrics | +| `x/act` | Account-bound ACT balances, non-transferable spend authority, org/project scoping | Keeper exposes `Mint`, `Burn`, `SpendToModule(escrow)`; gRPC queries for balances and metadata | +| `x/oracle` | Medianized AKT/USD price from Osmosis TWAP + secondary feed, freshness validation, deviation guards | `GetPrice(ctx, PriceUseCase)` with use-case tolerances (`Mint`, `Settle`, `Refund`); emits events for monitoring | +| `x/escrow` | Existing lease escrow accounts, per-block accrual, fee routing, provider withdrawals | Calls `bmeKeeper.WithdrawFromACT(ctx, payment)` prior to subtracting fees; registers BME settlement hook without new messages | +| `x/market` | Order/bid/lease lifecycle, escrow funding, provider withdrawals | Continues using 
`PaymentCreate`, `PaymentWithdraw`, `PaymentClose`; orchestrates auto top-ups via `MsgMintACT` if balances low | +| Console (off-chain) | UX, credit-card onramp, telemetry dashboards | REST/gRPC client of `MsgMintACT`, streams CR metrics, invokes existing `MsgWithdrawLease` when needed | + +## 3. Data Model & Storage Layout +### 3.1 Module Accounts & Supply Hooks +- `bme_vault` (new module account) + - Holds uAKT from tenant top-ups. + - Balance mirrors `RemintCredits` until settlement consumes credits. + - Marked as `ModuleAccountTypeBasic` with burn permissions disabled (burn handled via keeper logic). +- `act_module` (new module account) + - Not spendable; exists to satisfy Cosmos SDK supply bookkeeping for ACT denominations (if minted as `uact`). +- Ensure `SupplyKeeper` whitelist includes ACT for module-level mint/burn. + +### 3.2 Key/Value Stores +| Store | Key (prefix) | Value | Notes | +| --- | --- | --- | --- | +| `bme/params` | `0x00` | `Params` | CR thresholds, oracle ids, rounding mode, min mint size | +| `bme/remint` | `0x01` + `uint64` epoch? 
-> to support histograms, but Phase 0 uses singleton | `sdk.Dec` | Tracks available AKT supply for remint; represented as `sdk.Dec` with 6 decimal precision | +| `bme/metrics` | `0x02` | `OutstandingACT`, `NetBurnRolling` | Maintained via structs for telemetry | +| `act/balances` | `0x10` + `sdk.AccAddress` | `sdk.Dec` | Non-transferable balances; stored in micro dollars (e.g., `uact`) | +| `act/metadata` | `0x11` + `sdk.AccAddress` | `AccountMeta` | Ties balances to auth scopes (org/project) | +| `act/escrow_commit` | `0x12` + `types.AccountID` | `sdk.Dec` | Tracks ACT that has been lent to escrow accounts; mirrors escrow `Balance+Funds` | +| `oracle/prices` | `0x20` + `PriceUseCase` | `PriceSnapshot` | Contains TWAP result, timestamp, stddev, source weights | +| `oracle/feeds` | `0x21` | `FeedConfig[]` | Governance-managed data sources | + +### 3.3 State Invariants +- `RemintCredits >= 0` (except transiently in halt mode; if negative, circuit breaker escalates). +- `OutstandingACT = Σ balances(owner) = Σ escrow_commit` (validated via end blocker). +- Every escrow account (`types.Account`) must hold `Balance.Denom == uact`; `escrow_commit[id]` mirrors `Balance + Funds` to the micro-dollar. +- `CR = (VaultBalance(uakt_to_dec) * P_ref) / OutstandingACT`; enforce `CR >= params.soft_floor` outside halt. +- Vault bank balance must equal `RemintCredits` converted to `sdk.Int` minus in-flight dust accumulator. +- ACT denomination is non-transferable: bank `SendCoins` disabled for `uact` except from module to module for refunds/settlements via keeper-controlled path. + +### 3.4 Numeric Representation & Rounding +- Base units: `uakt` (1e-6 AKT) and `uact` (1e-6 USD). +- Maintain internal `sdk.Dec` precision at 18 decimal places; round to 6 decimals for external ledger writes. +- Mint path: floor ACT amounts (tenant-friendly), accumulate residual in `VaultDust` (new field) and add to `RemintCredits` when exceeding `1 uakt`. 
+- Settlement path: floor AKT payouts (provider-safe). Residual flows to `VaultDust` as positive drift, improving CR. +- Refund path mirrors settlement rounding to ensure no arbitrage beyond dust-level. + +### 3.5 Keeper Interfaces (Phase 0 contracts) +- `x/act` exposes: + - `Credit(ctx, owner, amount sdk.DecCoin, meta AccountMeta)` → increase free ACT balance. + - `CommitToEscrow(ctx, owner, escrowID, amount)` → move ACT from free balance to `escrow_commit[escrowID]` (without bank transfer). + - `BurnEscrowBalance(ctx, escrowID, amount)` → reduce commitment during settlement/refund. + - `SpendToModule(ctx, owner, module string, amount)` → helper for migrations (console seed) but locked to `escrow` module. +- `x/bme` exposes: + - `WithdrawFromACT(ctx, escrowID, provider sdk.AccAddress, amount sdk.DecCoin) (sdk.Int, sdk.Int, sdk.Int, sdk.Dec, error)` returning `(akt_out, use_vault, shortfall, price)`. + - `QueryVault`, `QueryCR`, telemetry getters. +- `x/escrow` adds hook registration: + - `SetSettlementAdapter(fn SettlementAdapter)` where `SettlementAdapter` signature matches `WithdrawFromACT`; default implementation is no-op unless BME enabled. + +## 4. Oracle Architecture +### 4.1 Feed Pipeline +1. Osmosis TWAP querier (IBC / LCD) supplies 30-min TWAP with 5-min step size, outlier rejection 0.1%. +2. External feed aggregator (e.g., Pyth) delivered via off-chain relayer posting `MsgSubmitPrice` to `x/oracle`. +3. `x/oracle` keeper medianizes available feeds per use-case. Each use-case may specify: + - `MaxAge` (e.g., mint: 10m; settle/refund: 5m). + - `MaxDeviation` (1.5%). + - `Smoothing` (EWMA coefficient for CR reference price). +4. On failure, `ErrOracle` surfaces; `x/bme` can degrade to emergency policy (halt new mints). 
+ +### 4.2 Data Structures +```go +// oracle/types/price.go +type PriceSnapshot struct { + DenomPair string // "AKT/USD" + UseCase PriceUseCase + Price sdk.Dec // medianized result + Sources []PriceSource // metadata for audits + Timestamp time.Time + StdDev sdk.Dec // for monitoring +} + +type PriceSource struct { + ID string // e.g., "osmosis:pool-XYZ" + Price sdk.Dec + Weight sdk.Dec + UpdateTime time.Time +} +``` + +### 4.3 Governance Parameters +- `OracleMinFeeds`: default 1 (Osmosis) but prefer 2 for production. +- `OracleHaltDeviation`: if deviation between sources > 3%, trip breaker. +- `OracleDriftMax`: reject price if last update older than `MaxAge` (per use-case). Emits telemetry. + +## 5. Process Flows (Textual State Machines) +### 5.1 Mint Path (`MsgMintACT`) +`MsgMintACT` fields: +- `payer sdk.AccAddress` +- `owner sdk.AccAddress` +- `akt_in sdk.Coin` (optional, mutually exclusive with `usd_exact`) +- `usd_exact sdk.Dec` (optional USD target; keeper back-solves AKT) +- `escrow_id types.AccountID` (optional, pre-commit minted ACT to escrow account) +- `memo string` (metadata such as invoice or payment provider reference) + +1. Auth layer validates payer signature or feegrant. +2. Keeper fetches `P_mint` with `PriceUseCaseMint` tolerances. +3. Determine `akt_req`: + - If `usd_exact`: `ceil(usd_exact / P_mint)`. + - Else `akt_in` direct. +4. `bank.SendCoins(payer → bme_vault, akt_req)`; update `RemintCredits += akt_req`. +5. Calculate `act_out = floor(akt_req * P_mint)`; `actKeeper.Credit(owner, act_out)` adds to non-transferable ledger and, if requested, increments `escrow_commit` for the targeted escrow account. +6. Update metrics: `OutstandingACT += act_out`; recompute `CR` using smoothed `P_ref` (EWMA seeded with minted price). +7. Emit `EventMint` and persist TX trace (payer, owner, price, dust, escrow_id if provided). +8. Circuit breaker check: if new `CR` < `params.halt_threshold`, revert unless bypass flag set via governance emergency. 
+ +### 5.2 Settlement Path (escrow `AccountSettle` → `paymentWithdraw`) +1. Existing `x/escrow` end-block logic advances `AccountSettle` for each open escrow account, accruing `FractionalPayment.Balance` in `uact`. +2. When `PaymentWithdraw` or `PaymentClose` is invoked (from `MsgWithdrawLease`, `MsgCloseBid`, or automatic hooks), control enters the new BME settlement adapter before fees are computed. +3. Adapter obtains `act_spend = payment.Balance` (DecCoin, denom `uact`), resolves the escrow account owner from `act/escrow_commit`, and calls `bmeKeeper.WithdrawFromACT(ctx, escrowID, provider, act_spend)`. +4. `WithdrawFromACT`: + - Fetches `P_settle` using `PriceUseCaseSettle` (freshness 5 min, deviation ≤ 1.5%). + - Burns ACT from the tenant escrow account (`x/act.BurnEscrowBalance`), updating `OutstandingACT -= act_spend`. + - Calculates `akt_out = floor(act_spend / P_settle)`. + - Decrements `RemintCredits` by `use_vault = min(akt_out, RemintCredits)` and transfers that amount of `uakt` from `bme_vault` to the escrow module. + - Mints any `shortfall = akt_out - use_vault` directly into the escrow module using `x/mint` (inflationary component). + - Returns `(akt_out, use_vault, shortfall, P_settle)` to the caller and emits `event_bme_settlement`. +5. `paymentWithdraw` replaces the payment balance with a `sdk.DecCoin{Denom: uakt, Amount: akt_out}` before calling `TakeKeeper.SubtractFees`, preserving existing fee logic. +6. Coins are sent from the escrow module to the provider in `uakt` without introducing new messages; telemetry captures vault vs. mint proportions and updated CR. +7. Circuit breaker checks run inside `WithdrawFromACT`. If CR would fall below `halt_threshold`, adapter returns `ErrCircuitBreaker` causing the withdrawal to defer until governance intervention (settlement queue remains pending but no extra gas is charged). 
+ +### 5.3 Refund Path (`MsgBurnACT`) +`MsgBurnACT` fields: +- `owner sdk.AccAddress` +- `to sdk.AccAddress` (defaults to owner) +- `act_burn sdk.DecCoin` +- `memo string` + +Mirror settlement but `to` is tenant-provided address; share code path to reduce divergence. Refunds allowed even in halt (unless governance toggles `refund_disabled`). + +## 6. Circuit Breaker Logic +- Parameters: `warn_threshold` (default 0.95), `halt_threshold` (0.90), `restart_threshold` (0.93 to resume), `mint_cooldown_blocks` (throttles per block). +- Implementation: stored in `Params`; `x/bme` leverages `BeforeMint` hook to enforce. +- Breaking sequence: + 1. If `CR < warn`: emit `EventCRWarning`, no functional change. + 2. If `CR < halt`: set `MintPaused = true`; `MsgMintACT` errors with `ErrCircuitBreaker`. + 3. Settlements/refunds remain allowed to restore CR. + 4. Governance or automatic logic flips `MintPaused` once CR > restart for `cooldown_period` blocks. +- Additional lever: `MaxMintPerBlock` parameter to avoid sudden CR dilutions. + +## 7. Migration Strategy (Phase 0 Decisions) +1. **Pre-upgrade snapshot:** capture outstanding stable balances and awaiting settlements from AEP-23. +2. **Convert console stable escrow → ACT** + - Off-chain service buys AKT at `P_transition` (oracle price at upgrade block). + - Submit `MsgMintACT` using AKT purchased; ensures `bme_vault` seeded and `RemintCredits` align with outstanding liability. +3. **Seed Vault Buffer:** Governance deposit of AKT (e.g., community pool transfer) to adjust CR > 1 before enabling new mints. +4. **Disable stable payouts:** set `x/market` params to ignore USDC; ensure no outstanding USDC invoices remain. +5. **Genesis modifications:** include new module accounts, new params, zeroed stores, and initial metrics (CR computed from seeded values). +6. **Backwards compatibility:** Provide read-only translation endpoints so console can display pre-upgrade balances as ACT. + +## 8. 
Parameter Catalog (Initial Values TBD) +| Parameter | Module | Purpose | Notes | +| --- | --- | --- | --- | +| `warn_threshold` | `x/bme` | CR warning trigger | e.g., 0.95 | +| `halt_threshold` | `x/bme` | Pause mint threshold | e.g., 0.90 | +| `restart_threshold` | `x/bme` | Resume mint threshold | must exceed `halt_threshold` | +| `mint_cooldown_blocks` | `x/bme` | Rate limit for new ACT mints | e.g., 10 | +| `max_mint_per_block` | `x/bme` | Additional throttle | sized vs. liquidity | +| `oracle_max_age_mint` | `x/oracle` | Freshness requirement (mint) | 600s | +| `oracle_max_age_settle` | `x/oracle` | Freshness requirement (settle/refund) | 300s | +| `oracle_deviation_limit` | `x/oracle` | Source disagreement bound | 0.015 | +| `dust_threshold_uakt` | `x/bme` | When to sweep rounding dust | 1 uAKT | +| `min_mint_act` | `x/bme` | Lower bound per mint (USD) | e.g., 10 USD | +| `refund_fee_bps` | `x/bme` | Optional fee to discourage frequent refunds | default 0 | + +## 9. Security & Audit Focus Areas +- **Oracle Manipulation:** Document the handshake between the console RFQ and the on-chain oracle; require price attestations to match within tolerance; design monitoring for feed downtime. +- **Module Accounts:** Ensure `bme_vault` is flagged in the supply keeper to avoid send-enabled operations; restrict `uact` sends to module pathways to prevent leakage. +- **Invariant Checks:** Add end-block invariants verifying the `OutstandingACT` sum, `RemintCredits` alignment, and `CR` calculations. Provide CLI `akashd q bme invariants` for ops. +- **AuthZ:** Phase 0 defines `MsgMintACT`/`MsgBurnACT` permissions; console uses the `ServiceAccount` pattern via `authz` to mint on behalf of tenants. + +## 10. Observability & Telemetry Requirements +- Instrument Prometheus gauges: `bme_outstanding_act`, `bme_vault_uakt`, `bme_cr`, `bme_mint_paused`. +- Emit events with consistent attribute keys for price, dust, and vault usage to feed the data warehouse. +- Provide gRPC queries: + - `QueryVault()` returning balance, remint credits, CR.
+ - `QueryAccountACT(owner)` returning balances and spend forecasts. + - `QueryParameters()` for governance dashboards. +- Define ABCI telemetry for circuit-breaker transitions (info log + event). + +## 11. Open Questions for Sign-off +1. **Denomination Strategy:** Keep ACT off bank module (custom denom) vs. integrate as IBC-denom-like? Decision required to avoid IBC send. +2. **RemintCredits Representation:** Should we track as `sdk.Int` (1e-6 precision) or `sdk.Dec` to accommodate fractional results? Implementation preference affects rounding logic. +3. **Oracle Delivery:** Do we rely on existing price server infrastructure, or must we introduce new relayer? Need ops alignment. +4. **Refund Policy:** Should refunds remain unrestricted during CR stress or adopt staged throttling (e.g., queue with per-block cap)? +5. **Dust Sweeping Frequency:** prefer immediate re-credit vs. periodic end-block sweep; decision influences deterministic accounting tests. +6. **Genesis Seed Amount:** Determine AKT buffer to target initial CR ≥ 1.05. + +## 12. Sign-off Checklist +- [ ] Economics + governance approve CR thresholds, rounding, and dust behavior. +- [ ] Protocol engineering validates store layout, invariants, and message schemas. +- [ ] Console/backend teams confirm API contracts (`MsgMintACT`, telemetry endpoints). +- [ ] DevOps confirms oracle infra and monitoring coverage. +- [ ] Migration plan reviewed with product & support; run book drafted. 
+
+---
diff --git a/src/content/aeps/aep-76/README.md b/src/content/aeps/aep-76/README.md
new file mode 100644
index 000000000..98d79269c
--- /dev/null
+++ b/src/content/aeps/aep-76/README.md
@@ -0,0 +1,488 @@
+---
+aep: 76
+title: "Burn Mint Equilibrium On Akash"
+author: Greg Osuri (@gosuri)
+status: Final
+type: Standard
+category: Economics
+created: 2025-09-21
+updated: 2026-03-06
+estimated-completion: 2026-03-23
+roadmap: major
+replaces: 55
+---
+### Motivation
+
+AKT is the native cryptocurrency of Akash Network and was initially conceived as the sole payment method. When a lease is established, tenants and providers agree on a price in AKT. These leases are open-ended, continuing until either party terminates them. However, AKT's fluctuating value presents a challenge: participants typically expect stable, USD-equivalent pricing, and AKT's price volatility compromises its utility as a payment mechanism.
+
+To address this challenge, [AEP-23](https://github.com/akash-network/AEP/tree/main/spec/aep-23) was proposed, approved, and implemented, introducing stablecoin payments alongside AKT. This system enables tenants to use whitelisted stablecoins, such as USDC, for both pricing and settlement. A portion of the hosting fees collected during settlement is distributed to AKT stakers, rewarding them for securing the blockchain. While this stable payment solution significantly boosted revenue growth on Akash Network, it also led to a notable drawback: reduced demand for AKT, as the staking incentives proved insufficient to sustain the token's value.
+
+Maintaining AKT's foundational role within the Akash ecosystem is crucial. Therefore, it is essential to revitalize demand for AKT while continuing to offer stable settlement options.
+
+#### Goals
+1. Restore **AKT‑only settlement** while preserving tenants’ **stable USD experience**.
+2. Increase **structural AKT demand** and shrink effective float.
+3.
Make the mechanism **sustainably neutral** without hidden taxes.
+
+#### Non-Goals
+1. Replace or modify consensus/security.
+2. Depend on any single off‑chain venue; instead, draw on diverse liquidity sources (Osmosis TWAP[^3] and external oracle aggregation).
+
+### Burn-Mint-Equilibrium Overview
+
+The Burn Mint Equilibrium (BME) is a tokenomic model designed to balance supply and demand between a volatile, value-accruing token (like AKT) and a stable utility token used for transactions. Inspired by models such as those used by Factom, Helium, and algorithmic stablecoins, BME creates ongoing demand for the volatile token by requiring it to be burned to mint the stable token, while allowing the reverse process (burning the stable token to mint the volatile one) to maintain equilibrium via arbitrage. This prevents supply bloat, captures usage value in the volatile token, and encourages holding and staking.
+
+In the context of Akash Network, the goal is to preserve stable USD-equivalent payments (building on AEP-23's success in driving revenue growth) while revitalizing demand for AKT. Currently, AEP-23 allows tenants to use whitelisted stablecoins (e.g., USDC) for lease pricing and settlement, with a percentage of fees distributed to AKT stakers. However, this shifts demand away from AKT. A BME mechanism addresses this by introducing a native stable token backed algorithmically by AKT, making AKT essential for generating stable payment units.
+
+### Key Benefits for Akash
+
+* **Stable Payments Retained:** Leases remain priced and settled in stable units, avoiding AKT volatility for users.
+* **AKT Demand Revitalized:** Minting stable units requires burning AKT, creating buy-and-burn pressure proportional to network usage.
+* **Deflationary Potential:** If network growth leads to more burns than mints, AKT becomes scarcer, benefiting stakers.
Helium validated this recently: its native token, HNT, became net [deflationary](https://twitter.com/MessariCrypto/status/1965069808887886043) as adoption grew.
+* **Compatibility with AEP-23:** This can extend the existing stablecoin system by integrating a native BME token as the primary or optional payment method, with fees still flowing to stakers.
+
+## Proposal
+
+BME overhauls Akash Network lease settlement and tenant payments, improving user experience and tokenomics.
+
+#### Core Changes
+* **AKT for Lease Settlement:** All leases settle in AKT.
+* **ACT for Tenant Payments:** Tenants prepay with ACT, a non-transferable, USD-pegged compute credit generated by burning AKT. ACT is burned at settlement, and AKT is re-minted to providers at the current price, automatically adjusting supply based on AKT demand.
+
+#### Enhanced User Experience (UX)
+* **USD Quoting:** Console and APIs display costs in USD.
+* **Flexible Payments:** Tenants can pay with credit cards or AKT.
+* **Credit Card Payments Drive AKT Demand:** Credit card payments trigger immediate AKT market buys; the purchased AKT is then burned to mint ACT.
+
+#### Positive Token Effects
+* **Increased AKT Demand:** Every dollar spent on compute boosts AKT demand.
+* **ACT as Supply Sink:** Outstanding ACT reduces circulating AKT supply between top-up and payout.
+* **Net Burns from AKT Appreciation:** AKT appreciation between top-up and payout results in a net burn of AKT.
+* **AEP-23 Take-Rate Eliminated:** BME removes AEP-23 take-rates, re-centering the economic model on AKT.
+
+##### Driving AKT Demand and Price Support
+* **Immediate AKT Market Buy:** Every dollar spent on compute directly drives AKT demand.
+* **Outstanding ACT as Escrowed AKT (BME Vault):** ACT held by users acts as escrowed AKT, removing it from liquid supply.
+* **Favorable Price Drift and Net Burns:** If AKT price increases between top-up and settlement, a net burn of AKT occurs at payout.
+* **Batch Top-ups Deepen AKT Sink:** Larger, infrequent top-ups increase outstanding ACT, deepening the AKT sink and reducing circulating supply.
+
+##### Economic Walkthrough
+
+* Assume the average tenant keeps **7 days** of runway in ACT.
+* If monthly gross new top‑ups are $10M, that’s an immediate **market buy** of ~**8.77M AKT** at USD 1.14[^1], removed from float until spent.
+* If the blended settlement price later averages $1.25, providers receive **~10M USD / 1.25 = 8.00M AKT**.
+  * **Net monthly burn:** **~0.77M AKT**, plus the *time‑in‑vault* float reduction during the month.
+
+### Tokens & Denoms
+* **AKT (existing):** staking token and settlement currency.
+* **ACT (new):** *Akash Compute Token*.
+  * **Peg:** 1 ACT ≈ **$1** of compute credit.
+  * **Decimals:** 6 (denom `uact`), matching Cosmos conventions.
+  * **Non‑transferable:** balance is **soulbound** to the funding account (tenant, org, or project).
+  * **Use:** only to pay Akash lease invoices; cannot be sent peer‑to‑peer or traded.
+
+## Core Mechanics
+
+Let:
+
+* $P_t$ = 30‑min TWAP AKT/USD price at time t.
+* $B_t$ = AKT burned to mint ACT at time t.
+* $A_t$ = ACT minted at time t.
+* $S_t$ = AKT minted to providers at settlement time t.
+* Spread parameter: initially **25 bps**.
+
+**Mint ACT (tenant top‑up)**
+
+* Tenant supplies AKT **or** pays by card:
+  * *AKT path:* user sends $B_t$ AKT → protocol burns it (moves it to the BME vault) → mints $A_t = B_t \cdot P_t$ ACT.
+  * *Card path:* Console market‑buys AKT for \$X on Osmosis/aggregators, receives $B_t$ AKT, and sends it to the chain → same burn/mint as above; **ACT minted equals the net $ received**, so tenants see a precise dollar value.
+
+**Settle (provider payday)**
+When an invoice for $A$ ACT is due, the protocol:
+
+1. **Burns** $A$ ACT, and
+2. **Mints** $S = \frac{A}{P_{settle}}$ AKT to the provider.
+
+**Net supply effect per dollar of usage**
+* At top‑up: burn $\frac{1}{P_{mint}}$ AKT.
+* At settlement: mint $\frac{1}{P_{settle}}$ AKT.
+* **ΔSupply (AKT)** = $\frac{1}{P_{settle}} - \frac{1}{P_{mint}}$.
+  * If **price rises** while credits are outstanding ($P_{settle} > P_{mint}$) → ΔSupply < 0 → **net burn**.
+  * If **price falls** → ΔSupply > 0 → **net mint**.
+* Between events, **AKT float is reduced** by the amount sequestered in the BME vault against outstanding ACT.
+
+**Concrete Example**
+* Tenant tops up **1,000 USD ⇒ burn 877.192982 AKT[^1]**, mint 1,000 ACT.
+* If settlement occurs at **1.50 USD** ⇒ mint **666.666667 AKT** to provider ⇒ **net −210.526316 AKT** (deflation).
+* If settlement occurs at **0.90 USD** ⇒ mint **1,111.111111 AKT** ⇒ **net +233.918129 AKT** (inflation). (*These deltas balance over time; outstanding ACT acts like an elastic buffer.*)
+
+## Architecture
+
+### Modules
+#### `x/bme` Burn‑Mint Equilibrium module
+
+* **Vault accounting:**
+  * “Burn” moves AKT into a **BME vault module account** (circulating supply goes down).
+  * The vault maintains a **Remint Credit** ledger so provider payouts first consume burned AKT; new AKT is minted only when those credits run short. This preserves the *hard cap framing* at the net level.
+* **Invariants & queries:**
+  * `OutstandingACT()`: total ACT supply.
+  * `VaultAKT()`: AKT held in the vault, with remint credits.
+  * **Collateral ratio:** $CR=\frac{VaultAKT \cdot P}{OutstandingACT}$
+  * Governance thresholds for **circuit breakers**.
+* **EndBlocker:** batch‑settles invoices (e.g., every block, or 1–5 min epochs).
+
+  *(Cosmos SDK natively supports bank burns and module accounts; minting can be customized via x/mint’s MintFn to draw down remint credits first.)*
+
+
+#### `x/act` Account‑bound compute credits
+* Non‑transferable balances keyed by owner (EOA, org, project).
+* Messages: `MsgMintACT`, `MsgBurnACT`.
+* The spend path is callable only by `x/market` during settlement.
+
+
+#### `x/oracle` Price feeds
+* **Primary:** Osmosis AKT/USDC **TWAP** (e.g., 30‑min, 0.1% outlier rejection).
+* **Secondary:** external oracle (e.g., Pyth/Chainlink via IBC or relays) with fallback rules & medianization.
(Osmosis [TWAP](https://docs.osmosis.zone/overview/features/concentrated-liquidity) exists; Pyth has Cosmos‑SDK deployments; the pattern is standard.)
+
+
+#### `console` Payments UX (AEP‑31 aligned)
+* The card on‑ramp (an existing roadmap item) performs an **immediate AKT buy** and calls `MsgMintACT`, so tenants always see **$‑exact** credit. (The credit‑card payments feature is tracked as [AEP‑31](https://akash.network/roadmap/aep-31/).)
+* Auto‑top‑up & batch recharges (configurable days‑of‑runway).
+
+#### `x/market` Leases & settlement
+* Quotes remain **USD‑first** in the UI; on chain, settlement pays providers **in AKT** by consuming ACT.
+* Providers can optionally auto‑stake a % of payouts (off by default).
+* AEP‑23 brought [stable payments](https://akash.network/docs/deployments/stable-payment-deployments) and take‑rates; with BME we keep the stable UX but remove take‑rates and settle AKT‑only.
+
+##### Settlement loop (system flow)
+
+**Preconditions**
+* Lease exists, invoice is due, and the amount matches `act_spend` (or is ≤ the remaining amount).
+* `from` has ≥ `act_spend` ACT.
+* Oracle is healthy (settlement price).
+* The circuit breaker may **slow** settlement but should not block it unless in extreme halt mode.
+
+**Flow**
+1. Oracle read (settlement price)
+   * Fetch $P_{settle}$ = AKT/USD TWAP.
+2. Burn ACT
+   * `act.Burn(from, act_spend)`
+   * `OutstandingACT -= act_spend`
+3. Compute AKT owed
+   * `akt_out = floor(act_spend / P_settle)` (6 decimals)
+4. Pay from the vault (deflationary first)
+   * `use_vault = min(akt_out, RemintCredits)`
+   * If `use_vault > 0`:
+     * `RemintCredits -= use_vault`
+     * `bank.SendCoins(BME_VAULT → provider, use_vault)`
+   * `shortfall = akt_out - use_vault`
+5. Inflationary mint for any shortfall
+   * If `shortfall > 0`:
+     * `mint.MintCoins(provider, shortfall)`
+   * (No change to `VaultAKT`; net increase to circulating supply by `shortfall`.)
+6. Finalize lease accounting
+   * Mark invoice (partial/complete) as paid for `act_spend`.
+   * Update the provider’s earnings ledger.
+7. Events & logs
+   * `event_bme_spend {lease_id, from, provider, act_burn: act_spend, akt_out, P_settle, vault_paid: use_vault, minted: shortfall}`
+8. Post-conditions / Invariants
+   * ACT supply decreased by `act_spend`.
+   * Provider receives `akt_out` AKT.
+   * Net new AKT issuance equals `shortfall`; the `use_vault` portion only re-enters circulation. (Intuition: if `use_vault = akt_out` → **no inflation**, and previously burned AKT simply re-enters circulation; if the price rose since mint, `use_vault < akt_in_at_mint`, so the **net burn** already happened at the mint step.)
+
+**Failure cases**
+* Oracle unhealthy → `ErrOracle`.
+* Insufficient ACT → `ErrInsufficientACT`.
+* Lease invariant mismatch / double-spend attempt → `ErrInvoiceState`.
+
+### BME Vault
+The BME Vault is a module account plus a ledger (“remint credits”) that holds the AKT removed from circulation when tenants fund ACT (their dollar-pegged compute credits). That AKT stays out of circulation until providers are paid. At payout, the protocol re-issues AKT (up to what’s owed) using the oracle price at settlement. Net supply moves only by the difference between the AKT burned at top-up and the AKT reminted at payout.
+
+#### Benefits
+1. **Structural buy-pressure:** Every dollar of usage triggers an AKT market buy (card → buy AKT → burn to mint ACT). That’s constant demand for AKT.
+2. **Reduced circulating supply (scarcity):** The AKT tied to outstanding ACT sits in the vault, not tradable, shrinking liquid float until workloads are settled.
+3. **Automatic net-burn when price rises:** If AKT appreciates between top-up and payout, fewer AKT are needed to pay the provider than were originally burned → net burn (deflationary).
+4. **Soft landing when price falls:** If AKT declines, the vault first uses its “remint credits.” Only if those are insufficient does the system mint the shortfall (inflationary). Circuit breakers can slow new ACT mints during severe drawdowns; no fee or take-rate is needed.
+5.
**Stable UX with AKT-only settlement:** Tenants always see exact $ funding; providers always receive AKT. The vault is the bridge that makes those two things compatible.
+
+#### Key Metrics to Watch
+* **Outstanding ACT**: the protocol’s dollar liability to providers.
+* **Collateral Ratio (CR)**: the vault’s AKT value vs. that liability:
+
+```math
+CR = \frac{VaultAKT \times P}{OutstandingACT}
+```
+* If **CR > 1**: the vault’s AKT is worth more than the credits outstanding (healthy buffer; likely net burns on payout).
+* If **CR < 1**: the vault will need to remint some AKT on payout (inflationary), so circuit breakers can kick in to protect the system.
+
+#### Concrete Example
+
+* Tenant funds **10,000,000 USD** in **ACT** by burning AKT priced at 1.14 USD[^1] → the vault receives **8,771,929 AKT**.
+* If payouts happen at **1.25 USD**, the protocol remints 10,000,000 USD / 1.25 = **8,000,000 AKT** to providers.
+  * Net effect: **−771,929 AKT (deflation).**
+* If payouts happen at **0.90 USD**, the protocol needs **11,111,111 AKT**; it reissues the 8.77M from the vault and mints the ~2.34M shortfall (inflation).
+* While those credits are outstanding, the **8.77M AKT is out of circulation**. If spot drifts up to 1.20 USD before payout, the vault’s value vs. the liability becomes $CR=\frac{8.77M \times 1.20}{10M} \approx 1.053$, a ~5.3% buffer that will materialize as a **net burn** on payout.
+
+#### Implementation notes
+
+You can implement the “burn” either as a true burn and later mint, or as escrow in a non-spendable module account with a remint-credits ledger. In both cases, circulating supply drops immediately and only changes net by the price-timing delta.
+
+Because the vault is transparent (on-chain), you get auditable metrics: `VaultAKT`, `OutstandingACT`, `CR`, `NetBurn24h/30d`.
+
+#### Risk controls
+
+We avoid per‑lease take‑rates.
Instead, we rely on **design‑time controls**:
+* **Oracle safety:** dual‑feed medianization; TWAP windows (e.g., 30 min); max update drift; pause on disagreement > X%. *TWAP mechanics are [documented](https://docs.osmosis.zone/overview/features/concentrated-liquidity) by Osmosis.*
+* **Circuit breakers:** if $CR < 0.93$ (configurable), temporary measures engage:
+  * Slow mints (shorter settlement epochs).
+  * Require new ACT mints to cover ≥ N days of run‑rate (encourages batching).
+  * If $CR < 0.90$, fall back to pausing new ACT mints; tenants can still pay directly in **AKT** (the UI reveals an “AKT direct” path) until CR recovers.
+* **Initial reserve (one‑time):** seed the BME vault with a small **AKT volatility buffer** via governance/community pool (similar [buffers](https://github.com/orgs/akash-network/discussions/930) have been budgeted in governance requests). This covers rare fast‑down moves without taxing leases.
+* **Optional micro‑spread (0–25 bps) on *mint only***: If governance wants a self‑healing buffer without “fees on providers,” apply a tiny haircut only at **ACT mint** (tenant side) and send the surplus to the BME vault. This is **not a take‑rate on hosting fees** and can default to **25 bps** at launch.
+
+## Governance Parameters
+
+1. **Oracle Mechanism:** The system determines transaction prices by taking the median of two sources: a 30-minute Time-Weighted Average Price (TWAP) from Osmosis and an external oracle sampled over the same 30-minute window. To ensure data integrity and prevent manipulation, any price deviating more than 1.5% from the median is ignored.
+
+2.
**Settlement Epoch Options:** To suit different operational needs, the system offers two settlement epoch options:
+   * **Per Block Settlement (Fast):** Transactions settle with every new block, providing the quickest possible finality where speed is paramount.
+   * **Every 5 Minutes Settlement (Gas-Friendly):** Where gas efficiency matters more than instant settlement, transactions are batched and settled every five minutes, reducing transaction costs.
+
+3. **Circuit Breakers for Stability:** To safeguard the system against extreme market volatility, two circuit breakers are implemented:
+   * **CRwarn (Collateral Ratio Warning):** At a collateral ratio of 0.95, a warning is triggered, signaling potential instability and prompting operators to monitor the situation closely.
+   * **CRhalt (Collateral Ratio Halt):** If the collateral ratio drops further to 0.90, new ACT mints are temporarily suspended to prevent further destabilization and allow corrective measures.
+
+4. **Mint/Settle Spreads:** Initially, the system operates with a mint/settle spread of 25 basis points (bps). This spread is the difference between the price at which new ACT is minted and the price at which it is settled, and it contributes to the system's economic model and liquidity.
+
+5. **ACT Expiry and Refundability:** Akash Compute Tokens (ACT) have no expiration date, so credits held by users remain valid indefinitely. ACT are also fully refundable back to AKT at the current oracle price, letting holders convert their credits back to the native token at any time at the prevailing market rate.
+
+6.
**Denomination of ACT:** For precision in transactions and system operations, ACT is denominated in `uact`: 1 ACT = 1,000,000 `uact` (`1e6 uact`), enabling granular accounting and transaction processing.
+
+
+## Developer Spec
+
+### Messages
+
+#### MsgMintACT
+Mint non-transferable ACT credits by removing AKT from circulation.
+
+##### Purpose
+* Turn **AKT → ACT ($1 credits)** at the mint TWAP price.
+* Two UX paths collapse to the same on-chain flow:
+  * **AKT path:** tenant funds with AKT directly.
+  * **Card path:** console market-buys AKT off-chain and pays on-chain; the chain still just sees AKT.
+
+##### Message
+
+```protobuf
+syntax = "proto3";
+
+import "cosmos/base/v1beta1/coin.proto";
+
+// Field numbers are illustrative.
+message MsgMintACT {
+  // who provides AKT (tenant or console wallet)
+  string payer = 1;
+  // whose ACT balance to credit (tenant/org/project)
+  string owner = 2;
+  // exactly one of akt_in / usd_exact must be set
+  oneof amount {
+    // AKT amount provided
+    cosmos.base.v1beta1.DecCoin akt_in = 3;
+    // request "exact $X ACT"; keeper back-solves the AKT to pull
+    cosmos.base.v1beta1.DecCoin usd_exact = 4;
+  }
+  // optional metadata (invoice id, cc receipt id)
+  string memo = 5;
+}
+```
+
+###### Preconditions
+
+* `owner` is a valid ACT account.
+* Oracle is **healthy** (TWAP within drift limits; price freshness OK).
+* Circuit breakers are not blocking mints.
+* If `usd_exact` is set, the keeper must be able to **pull** or **receive** the corresponding AKT from `payer`.
+
+###### Flow
+
+1. Ante / auth
+   * Verify `payer` signature (or valid feegrant/authz).
+   * Collect fees, increment sequence.
+2. Oracle read (mint price)
+   * Fetch **P_mint = AKT/USD TWAP** (e.g., 30-min, medianized).
+   * Reject if stale or outside deviation bounds.
+3. Determine AKT to pull
+   * If `akt_in` is provided → `akt_req = akt_in`.
+   * Else if `usd_exact` → `akt_req = ceil(usd_exact / P_mint)` (rounded up to 6 decimals).
+   * Guard: `akt_req > 0`.
+4. Funds movement
+   * `bank.SendCoins(payer → BME_VAULT, akt_req uakt)`
+   * Increase **RemintCredits** by `akt_req`.
(This is the book-entry saying “we can later re-issue up to this much without fresh inflation.”)
+5. Mint ACT
+   * `act_out = floor(akt_req * P_mint)` (6 decimals).
+   * `act.Mint(owner, act_out)`
+6. State updates
+   * `OutstandingACT += act_out`
+   * Recompute **CR**: `(VaultAKT * P_ref) / OutstandingACT` (`P_ref` can be `P_mint` or a moving reference price).
+7. Events & logs
+   * `event_bme_mint { payer, owner, akt_in: akt_req, act_out, P_mint }`
+8. Post-conditions / invariants
+   * `VaultAKT` increased by `akt_req`.
+   * `OutstandingACT` increased by `act_out`.
+   * **Circulating AKT** effectively down by `akt_req`.
+
+##### Failure Cases
+
+* Oracle unhealthy → `ErrOracle`.
+* Circuit breaker active (`CR < halt`) → `ErrCircuitBreaker`.
+* Insufficient AKT from `payer` → `ErrInsufficientFunds`.
+
+#### MsgBurnACT
+
+Tenant burns unused ACT and receives AKT back at the **current price**.
+
+##### Purpose
+
+* Let users unwind credits if they stop using compute.
+* Keeps the UX fair; also exercises the same safety path as settlement.
+
+##### Message
+
+```protobuf
+syntax = "proto3";
+import "cosmos/base/v1beta1/coin.proto";
+
+// Field numbers are illustrative.
+message MsgBurnACT {
+  string owner = 1;                          // whose ACT to burn
+  string to = 2;                             // who receives AKT (default = owner)
+  cosmos.base.v1beta1.DecCoin act_burn = 3;  // ACT to convert back to AKT
+}
+```
+
+###### Preconditions
+
+* `owner` has ≥ `act_burn` ACT.
+* Oracle is healthy (refund price).
+* Circuit breaker policy may throttle refunds only under extreme CR stress (configurable), but the default is **always allow**.
+
+###### Flow
+
+1. Ante / auth
+   * Verify `owner` (or authz).
+2. Oracle read (refund price)
+   * `P_now` = AKT/USD TWAP
+3. Burn ACT
+   * `act.Burn(owner, act_burn)`
+   * `OutstandingACT -= act_burn`
+4. Compute AKT to return
+   * `akt_out = floor(act_burn / P_now)`
+5. Pay from the vault first
+   * `use_vault = min(akt_out, RemintCredits)`
+   * If `use_vault > 0`:
+     * `RemintCredits -= use_vault`
+     * `bank.SendCoins(BME_VAULT → to, use_vault)`
+   * `shortfall = akt_out - use_vault`
+6.
Inflationary mint for shortfall (if any)
+   * If `shortfall > 0`:
+     * `mint.MintCoins(to, shortfall)`
+7. Events & logs
+   * `event_bme_burn_act { owner, to, act_burn, akt_out, P_now, vault_paid: use_vault, minted: shortfall }`
+8. Post-conditions
+   * ACT supply reduced by `act_burn`.
+   * AKT sent to `to` equals `akt_out`.
+   * Net AKT supply changes only if `shortfall > 0`.
+
+###### Failure cases
+
+* Oracle unhealthy → `ErrOracle`.
+* Insufficient ACT → `ErrInsufficientACT`.
+
+##### Cross-cutting behavior (mint, settlement, and refund flows)
+
+* Oracle safety
+  * Use **median(TWAP\_osmosis, TWAP\_external)** with freshness checks.
+  * Reject if feeds disagree by more than a threshold (e.g., 1.5%) or are stale.
+
+* Circuit breakers
+  * Monitor **Collateral Ratio (CR)** = `(VaultAKT * P_ref) / OutstandingACT`.
+  * **Warn** below 0.95, **halt new mints** below 0.90 (settlements/refunds still allowed; governance may throttle refunds if CR is critically low).
+
+* Events/metrics to expose
+  * `OutstandingACT`, `VaultAKT`, `RemintCredits`, `CR`, `NetBurn24h`, `NetBurn30d`.
+  * Per-tx: emitted prices, amounts, and whether the payout used the vault or a new mint.
+
+* Rounding rules
+  * Round in the vault's favor on both legs: round ACT **down** on mint and AKT **down** on settlement and refunds, so the protocol never over-issues. The residuals accrue to the vault as dust and improve CR over time.
+
+* AuthZ / Feegrant
+  * Allow org billing wallets or the console to act on behalf of tenants (auto top-ups).
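
The mint, settlement, and refund flows above share one accounting core: a medianized oracle price, vault-first payout from remint credits, inflationary mint only for the shortfall, and rounding that favors the vault. The Python sketch below models that core in memory under stated assumptions. `BMEVault`, `oracle_price`, and the error classes are hypothetical names, not the `x/bme` keeper API; vault AKT is assumed to equal `RemintCredits`; and `decimal` quantization to 6 places stands in for integer `uakt`/`uact` math. Applying the `ceil`/`floor` rules literally to the worked example that follows gives 877.192983 AKT pulled and a 666.666666 AKT payout, so the last digit differs slightly from the rounded figures there.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_UP
from statistics import median

MICRO = Decimal("0.000001")  # 6 decimals, matching uakt / uact


class OracleError(Exception):
    pass


class CircuitBreakerError(Exception):
    pass


class InsufficientACTError(Exception):
    pass


def oracle_price(feeds, max_deviation=Decimal("0.015")):
    """Median of the feed prices; reject if any feed strays beyond max_deviation."""
    if not feeds:
        raise OracleError("no price feeds")
    mid = median(feeds)
    if any(abs(p - mid) / mid > max_deviation for p in feeds):
        raise OracleError("price feeds disagree")
    return mid


class BMEVault:
    """In-memory model of vault-first accounting (hypothetical, not x/bme)."""

    def __init__(self, halt_cr=Decimal("0.90")):
        self.remint_credits = Decimal(0)   # burned AKT available to re-issue
        self.outstanding_act = Decimal(0)  # protocol's dollar liability
        self.act = {}                      # owner -> ACT balance
        self.minted_akt = Decimal(0)       # cumulative inflationary shortfall
        self.halt_cr = halt_cr

    def cr(self, price):
        """Collateral ratio; None while no ACT is outstanding."""
        if self.outstanding_act == 0:
            return None
        return self.remint_credits * price / self.outstanding_act

    def mint_act(self, owner, price, akt_in=None, usd_exact=None):
        if (akt_in is None) == (usd_exact is None):
            raise ValueError("set exactly one of akt_in / usd_exact")
        cr = self.cr(price)
        if cr is not None and cr < self.halt_cr:
            raise CircuitBreakerError(f"CR {cr:.4f} below halt")
        if akt_in is not None:
            akt_req = akt_in
        else:
            # Round AKT pulled up so the tenant receives the full dollar amount.
            akt_req = (usd_exact / price).quantize(MICRO, rounding=ROUND_UP)
        if akt_req <= 0:
            raise ValueError("akt_req must be positive")
        # Round ACT down: residual dust stays with the vault.
        act_out = (akt_req * price).quantize(MICRO, rounding=ROUND_DOWN)
        self.remint_credits += akt_req
        self.outstanding_act += act_out
        self.act[owner] = self.act.get(owner, Decimal(0)) + act_out
        return akt_req, act_out

    def burn_act(self, owner, act_amount, price):
        """Shared path for settlement payouts and MsgBurnACT refunds."""
        if self.act.get(owner, Decimal(0)) < act_amount:
            raise InsufficientACTError(owner)
        self.act[owner] -= act_amount
        self.outstanding_act -= act_amount
        # Round AKT paid out down so the protocol never over-issues.
        akt_out = (act_amount / price).quantize(MICRO, rounding=ROUND_DOWN)
        use_vault = min(akt_out, self.remint_credits)  # deflationary first
        self.remint_credits -= use_vault
        shortfall = akt_out - use_vault                # inflationary remainder
        self.minted_akt += shortfall
        return akt_out, use_vault, shortfall
```

For instance, minting with `usd_exact=Decimal("1000")` at 1.14 and then calling `burn_act` for 1,000 ACT at 1.50 pays entirely from the vault and leaves 210.526317 AKT of remint credits; settling the same position at 0.90 instead exhausts the credits and mints a 233.918128 AKT shortfall. Routing settlement and refunds through one `burn_act` path mirrors the spec's note that refunds exercise the same safety path as settlement; a real keeper would use integer `uakt` amounts and bank-module transfers rather than in-memory balances.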
+
+#### One Concrete Flow
+
+* **Mint:** Tenant tops up USD 1,000 @ `P_mint = $1.14`
+  * Pull `akt_req = 877.192982`, send to `BME_VAULT`
+  * Mint `act_out = 1,000` ACT
+  * `RemintCredits += 877.192982`
+
+* **Spend:** Settle later @ `P_settle = $1.50`
+  * Burn `1,000` ACT
+  * Owe provider `akt_out = 666.666667`
+  * Pay entirely from the vault (`use_vault = 666.666667`)
+  * `RemintCredits = 210.526315` remaining
+  * **Net AKT still out of circulation:** 210.526315 (this is the deflation)
+
+* **Refund** (if the tenant cancels the remaining $X of credits)
+  * Same as spend: burn ACT, return AKT using the vault first, mint only if needed.
+
+#### State
+* `VaultAKT` (module account balance) and `RemintCredits` (book‑entry of burned AKT available to re‑issue).
+* `TotalACT`, per‑account ACT balances.
+* `CR` view.
+
+#### Invariants
+
+* `RemintCredits >= 0` always holds, because payouts draw at most the available credits; in a price crash, any payout beyond the remaining credits is covered by inflationary mint, which is exposed as a visible metric.
+
+*(Cosmos SDK’s [bank/mint](https://docs.cosmos.network/v0.46/modules/bank/) facilities support this pattern cleanly.)*
+
+### Provider & Tenant UX
+
+#### Tenant (Console)
+
+1. Add funds → choose card or AKT.
+2. If card: pay $X → the console **buys AKT** on Osmosis and sends it to the vault; the chain **mints X ACT**.
+3. See **$ balance**, low‑balance alerts, auto‑top‑up (batching default).
+
+#### Provider
+* Price as today (USD‑first in the UI).
+* Receive **AKT** each epoch.
+* Optional “auto‑stake N%” toggle.
+
+### Interoperability with AEP‑23 & migration
+
+* **AEP‑23 kept:** USD quoting, whitelisting, and the stable‑payment user journey in [docs](https://github.com/orgs/akash-network/discussions/930).
+* **AEP‑23 changed:** **remove the take‑rate** on stable settlements, and **disable USDC payouts** to providers; all payouts are in AKT.
*The earlier [discussions](https://github.com/orgs/akash-network/discussions/930) recorded per‑currency take‑rates; we eliminate them here.*
+* **Data migration:** existing escrowed stable balances in the console can be converted once: buy AKT → burn → mint ACT at the migration block’s oracle price.
+
+### Observability & Dashboard
+
+* **On‑chain:** `OutstandingACT`, `VaultAKT`, `RemintCredits`, `CR`, `NetBurn24h`, `NetBurn30d`.
+* **Console:** “**AKT saved from circulation**” badge; “**Effective AKT purchased**”; runway days; min/top‑up size.
+* **Provider:** AKT payout history with oracle prices.
+
+### Security & Attack surface
+
+* **Oracle manipulation:** use multi‑feed medianization with [TWAP](https://docs.osmosis.zone/overview/features/concentrated-liquidity); cap per‑block price moves; pausable by governance multisig in emergencies.
+* **Liquidity stress:** perform console buys via RFQ/aggregator (Osmosis primary) with max slippage (Osmosis serves as the Cosmos liquidity hub).
+* **Module integrity:** [bank](https://docs.cosmos.network/v0.46/modules/bank) burn/mint auditing with supply invariants in end blockers.
+
+### Rollout plan
+
+1. **Testnet phase (4–6 weeks):** enable `x/bme` and `x/act` with faucets; fuzz settle/mint under varying volatility; run oracle chaos tests.
+2. **Shadow launch:** mainnet with 25 bps spreads; circuit breakers armed; small governance‑seeded vault buffer.
+3. **Full launch:** migrate console to ACT top‑ups; switch payouts to AKT‑only; deprecate USDC payouts (keep USDC **funding** path only via card → AKT buy → ACT).
+4. **Post‑launch:** expose metrics; if CR drifts lower in stress, governance can nudge mint‑side bps to rebuild the buffer **without taxing providers** or re‑introducing take‑rates.
+
+## Quick FAQ
+
+* **Is this deflationary?** Over any window where AKT appreciates between top‑up and settlement, yes: net burns occur. In all cases, outstanding ACT removes AKT from float until it’s paid out.
+* **What if AKT dumps?** The seeded vault and circuit breakers cover shortfalls without taxing leases. In the worst case, tenants pay in AKT directly for a short window while CR recovers.
+* **Does this harm providers?** No. Providers are paid in AKT valued at the invoice’s USD amount using the settlement TWAP. No take‑rate applies to their payout.
+
+[^1]: AKT traded at $1.14 with a circulating supply of 278,583,970 as of Sept 16, 2025, according to [CoinMarketCap](https://coinmarketcap.com/currencies/akash-network).
+[^3]: Osmosis supports on‑chain [TWAP](https://docs.osmosis.zone/overview/features/concentrated-liquidity); Cosmos SDK supports custom bank/mint flows.
diff --git a/src/content/aeps/aep-78/README.md b/src/content/aeps/aep-78/README.md
new file mode 100644
index 000000000..c88491073
--- /dev/null
+++ b/src/content/aeps/aep-78/README.md
@@ -0,0 +1,279 @@
+---
+aep: 78
+title: Enable CosmWasm Smart Contracts on Akash Network
+author: Artur Troian (@troian)
+status: Final
+type: Standard
+category: Core
+roadmap: major
+created: 2025-11-14
+estimated-completion: 2026-03-23
+---
+
+## Summary
+
+This AEP proposes enabling CosmWasm smart contract functionality on Akash Network to unlock programmable decentralized cloud infrastructure, including automated resource management, advanced settlement mechanisms, and on-chain governance capabilities. The enhancement leverages Pyth Network as a price oracle to support AEP-76's Burn Mint Equilibrium (BME) mechanism and extends Akash's capabilities beyond a simple compute marketplace into a fully programmable cloud platform.
+
+## Abstract
+
+CosmWasm is a multi-chain smart contracting platform built for the Cosmos ecosystem that allows developers to write secure, performant smart contracts in Rust.
Enabling CosmWasm on Akash Network will provide the foundation for building sophisticated decentralized applications (dApps), implementing automated marketplace logic, creating programmable payment channels, and enabling complex provider-tenant relationships. This proposal integrates with AEP-76's requirement for a reliable price oracle by incorporating Pyth Network's low-latency price feeds, ensuring accurate and manipulation-resistant pricing for the BME tokenomics model while opening the door to a rich ecosystem of on-chain applications.
+
+## Motivation
+
+### Current Limitations
+
+Akash Network currently operates as a decentralized compute marketplace where the core functionality is hardcoded into the blockchain's native modules. While this approach ensures stability and security, it creates several limitations:
+
+1. **Limited Programmability**: New marketplace features require network upgrades and governance approval, slowing innovation
+2. **Rigid Settlement Mechanisms**: Payment and escrow logic cannot be easily modified or extended
+3. **Manual Governance**: Complex decisions require off-chain coordination and manual execution
+4. **Limited Ecosystem Integration**: Cannot natively interact with other protocols or leverage composable primitives
+5. **Oracle Dependency**: AEP-76's BME mechanism requires reliable price oracles, which need smart contract infrastructure to function optimally
+6. **Provider Incentives**: Limited ability to create sophisticated incentive mechanisms for providers
+
+### Benefits of CosmWasm Integration
+
+Enabling CosmWasm smart contracts on Akash Network provides the following capabilities:
+
+**1. Programmable Cloud Infrastructure**
+- Custom deployment automation and orchestration logic
+- Conditional resource allocation based on on-chain conditions
+- Dynamic pricing algorithms implemented as smart contracts
+- Automated provider selection and load balancing
+
+**2.
Advanced Settlement and Payment Systems** +- Programmable escrow contracts with custom release conditions +- Micropayment channels for real-time resource usage billing +- Multi-party payment splits and revenue sharing +- Integration with AEP-76's BME mechanism via smart contract-based burn/mint operations + +**3. Enhanced Price Oracle Integration** +- Smart contracts consuming Pyth Network price feeds for AKT/USD pricing +- Automated arbitrage mechanisms for the BME equilibrium +- Price-triggered actions for lease renewals and settlements +- Oracle failsafe mechanisms and price aggregation logic + +**4. Ecosystem Integration** +- Lending protocols for provider collateral and tenant deposits +- Liquid staking derivatives for staked AKT +- Cross-chain bridges and IBC-enabled applications +- Yield optimization strategies for network participants + +**5. Decentralized Governance Tools** +- On-chain voting mechanisms for network parameters +- Automated treasury management +- Provider slashing and dispute resolution contracts +- Community fund allocation and grant programs + +**6. Developer Ecosystem Growth** +- Attract Cosmos ecosystem developers already familiar with CosmWasm +- Enable third-party innovation without requiring core protocol changes +- Create opportunities for application-specific tooling and infrastructure +- Foster composability with other Cosmos chains running CosmWasm + +## Specification + +### Technical Architecture + +#### Core Components + +**1. CosmWasm Module Integration** + +The implementation requires integrating the `x/wasm` module from the CosmWasm stack into the Akash blockchain: + +``` +wasmd v0.61.6 or higher +- CosmWasm VM integration +- Wasm bytecode storage +- Smart contract instantiation and execution +- Gas metering and optimization +``` + +**2. Pyth Network Oracle Integration** + +As required by AEP-76, Pyth Network will serve as the primary price oracle.
Pyth Network provides a ready-to-use smart contract that implements an [oracle price feed](https://github.com/pyth-network/pyth-crosschain/blob/main/target_chains/cosmwasm/contracts/README.md). + + +#### Smart Contract Capabilities + +**Supported Features:** +- Contract upload, instantiation, execution, and migration will be supported via governance +- Inter-contract messaging (CosmWasm to CosmWasm) +- IBC contract support for cross-chain operations +- Query support for contract state +- Native token handling (AKT and ACT) +- Integration with native Akash modules (deployment, provider, market) + +**Gas and Fee Structure:** +- Gas costs for contract operations based on computational complexity +- Storage fees for contract code and state +- Execution fees for contract calls +- Gas limits and optimization requirements + +#### Security Considerations + +**1. Contract Upload Governance** +Initially, contract upload permissions should be restricted to governance-approved addresses to prevent malicious code deployment: + +``` +Parameters: +- code_upload_access: "Governance" | "Everybody" | "Nobody" +- instantiate_default_permission: "Everybody" | "Nobody" +``` + +**2. Contract Execution Limits** +- Maximum contract size: 800 KB (compressed) +- Maximum gas per transaction: 100M units +- Query gas limit: 10M units +- Maximum recursion depth: 5 levels + +**3. Oracle Security** +- Pyth Network provides cryptographically signed price updates +- Multi-source price aggregation to prevent manipulation +- Confidence intervals for price data quality +- Emergency circuit breakers for anomalous price feeds + +**4. Testing Requirements** +- Testnet deployment and community review before mainnet activation + +### Integration with AEP-76 + +This proposal directly supports AEP-76's Burn Mint Equilibrium mechanism by providing: + +1. **Smart Contract-Based BME Implementation**: The burn-mint logic can be implemented as auditable, upgradeable smart contracts rather than hardcoded protocol logic + +2.
**Pyth Network Price Oracle**: CosmWasm enables integration with Pyth's low-latency price feeds, providing the reliable pricing infrastructure required for BME operations + +3. **Automated Arbitrage Mechanisms**: Smart contracts can monitor ACT/AKT price spreads and execute arbitrage automatically, helping maintain the BME equilibrium + +4. **Transparent Operations**: All burn-mint operations occur in observable smart contracts, increasing transparency and auditability + +5. **Extensibility**: Future improvements to the BME mechanism can be deployed as contract upgrades without requiring hard forks + +## Rationale + +### Why CosmWasm? + +1. **Cosmos Ecosystem Standard**: CosmWasm is the de facto smart contract platform for Cosmos SDK chains, ensuring compatibility and developer familiarity +2. **Security**: Rust's memory safety guarantees and CosmWasm's design prevent common smart contract vulnerabilities (reentrancy attacks, overflow/underflow issues) +3. **Performance**: WebAssembly provides near-native execution speed while maintaining portability +4. **IBC Native**: CosmWasm has first-class support for Inter-Blockchain Communication, enabling cross-chain applications +5. **Proven Track Record**: Multiple Cosmos chains (Juno, Neutron, Stargaze, Terra 2.0) successfully run CosmWasm with significant TVL and activity + +### Why Pyth Network? + +As required by AEP-76, Pyth Network is the optimal choice for price oracle functionality: + +1. **Low Latency**: 400ms price updates enable real-time pricing for BME operations +2. **High Quality Data**: Price feeds sourced from major exchanges, market makers, and trading firms +3. **Cosmos Integration**: Already deployed on Osmosis and other Cosmos chains with proven reliability +4. **Security**: Cryptographically signed price updates with confidence intervals and multi-source aggregation +5. 
**Comprehensive Coverage**: Supports AKT/USD and other necessary price pairs for the Akash ecosystem + +### Alternative Approaches Considered + +**1. Native Module Implementation** +- **Pros**: Better performance, simpler architecture +- **Cons**: Requires hard forks for updates, limits developer innovation, no composability with broader ecosystem + +**2. EVM Compatibility Layer** +- **Pros**: Access to Ethereum developer community and tooling +- **Cons**: Less optimized for Cosmos, different security model, not standard for Cosmos ecosystem + +**3. Other Oracle Solutions (Chainlink, Band Protocol)** +- **Pros**: Established reputation, wide adoption +- **Cons**: Less Cosmos-native, higher latency, AEP-76 specifically targets Pyth Network + +## Backwards Compatibility + +This proposal is fully backwards compatible: + +- Existing Akash modules and functionality remain unchanged +- CosmWasm operates as an additional module alongside existing features +- No changes to current deployment, provider, or market modules +- Existing deployments continue to function without modification +- Opt-in functionality for users who want smart contract features + +## Test Cases + +### Integration Tests + +1. Deploy Pyth contract on testnet +2. Deploy BME burn-mint contracts +3. Execute full burn-mint cycle with price oracle query +4. Verify correct ACT minting and AKT burning +5. Test automated arbitrage contract behavior +6. Validate settlement contract execution with real deployments + +### Testnet Deployment + +- Public testnet with CosmWasm enabled +- Developer documentation and examples +- Bug bounty for critical vulnerabilities +- Community testing period: minimum 3 months before mainnet + +## Security Considerations + +### Contract Security + +1. **Upgrade Path**: Use contract migration features carefully with governance oversight +2. **Permission System**: Initially restrict contract uploads to governance-approved addresses +3. 
**Gas Limits**: Implement appropriate gas limits to prevent DoS attacks + +### Oracle Security + +1. **Price Manipulation**: Pyth's multi-source aggregation and confidence intervals mitigate manipulation risks +2. **Data Staleness**: Circuit breakers trigger if price data exceeds freshness threshold +3. **Fallback Mechanisms**: Multiple oracle sources for critical operations +4. **Governance Override**: Emergency governance actions can pause BME operations if oracle issues detected + +## Implementation + +### Required Changes + +**1. Core Node Software** +``` +akash-network/node: +- Integrate wasmd v0.61.6 +- Add x/wasm module to app.go +- Configure wasm parameters +- Enable contract upload governance +``` + +**2. CosmWasm Contracts** +``` +New Repository: akash-network/contracts +- Pyth oracle consumer +``` + +**3. Documentation** +``` +akash-network/docs: +- CosmWasm developer guide +- Contract deployment tutorial +- Oracle integration guide +- Security best practices +- Example contracts and templates +``` + +**4. Testing Infrastructure** +``` +- Testnet deployment scripts +- Integration test suite +- Performance benchmarks +- Security testing framework +``` + +## References + +- [CosmWasm Documentation](https://docs.cosmwasm.com/) +- [CosmWasm GitHub](https://github.com/CosmWasm/cosmwasm) +- [Pyth Network Documentation](https://docs.pyth.network/) +- [Pyth Network on Osmosis](https://www.pyth.network/blog/pyth-launches-price-oracles-on-osmosis) +- [AEP-76: Burn Mint Equilibrium](https://akash.network/roadmap/aep-76/) +- [Cosmos SDK Documentation](https://docs.cosmos.network/) +- [Akash Network Documentation](https://docs.akash.network/) + +## Copyright + +All content herein is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
diff --git a/src/content/aeps/aep-79/README.md b/src/content/aeps/aep-79/README.md new file mode 100644 index 000000000..ab548a34b --- /dev/null +++ b/src/content/aeps/aep-79/README.md @@ -0,0 +1,204 @@ +--- +aep: 79 +title: "Akash on Shared Security" +author: Greg Osuri (@gosuri) +status: Final +type: Standard +category: Core +created: 2025-12-15 +updated: 2025-12-15 +estimated-completion: 2026-12-31 +roadmap: major +--- + +## Abstract + +This RFP seeks proposals from established Layer 1 protocols to become the shared-security provider for the Akash Network, a decentralized cloud computing marketplace on the Cosmos SDK. The goal is to transition from Akash's sovereign chain to a shared-security model to address high capital inefficiency from AKT staking and excessive operational overhead. This move will adopt a pay-per-use security model, reducing the liquidity burden and allowing the team to focus on product innovation, particularly for GPU-intensive AI workloads. The partner must ensure scalable, robust, and decentralized security, maintaining strong IBC interoperability. Proposals must detail the L1's Security Model, Technical Integration, Scalability, Governance, Economic/Legal Considerations, and Ecosystem Profile to enable Akash to leverage external security while preserving application sovereignty. + +## Motivation + +Akash Network is migrating from its sovereign chain to a shared-security framework to enhance capital efficiency and reduce operational overhead. This move frees staked AKT for marketplace growth and offloads L1 maintenance and security, allowing developers to focus solely on the `AkashApp` (marketplace logic). The goal is to achieve equivalent or better security at a lower cost, accelerating decentralized cloud innovation. + +## Introduction + +Akash is a **decentralized cloud computing marketplace** that facilitates peer-to-peer trading of computing power. 
By eliminating middlemen, Akash lets users buy and sell resources at a fraction of the cost of big cloud providers. + +Often regarded as the first decentralized cloud and the category creator for [DePIN](https://messari.io/report/the-depin-sector-map), Akash is widely recognized for the deep open-source and decentralization values of its community and code. Akash is an [open‑source](https://github.com/akash-network) protocol built on the Cosmos SDK and uses Tendermint PoS consensus. Marketplace activity (requests, bids and leases) is stored on‑chain and paid for with the AKT token. The network operates a reverse auction; providers compete to offer resources and often deliver compute **at a fraction of the cost** of centralized cloud. This decentralized model has attracted a **large community of over 500 contributors**. + +Akash has traditionally run on its **own sovereign chain**. Operating a sovereign chain has two major drawbacks: + +1. **Capital efficiency** – maintaining an independent validator set requires significant liquidity because large amounts of AKT must be staked to secure the chain. Liquidity tied up in security could instead be used to support the application. +2. **Operational and technical overhead** – maintaining a Layer 1 (L1) blockchain solely for a single application demands continuous upgrades, infrastructure management and security expertise. This detracts from product‑focused innovation. + +To address these issues, Akash is evaluating a **shared‑security model**. In this model Akash would lease security from another L1 rather than running its own validator set. For example, under **Cosmos Interchain Security** a “consumer chain” leases the **exact same validator set** as the Cosmos Hub; in return, validators receive a share of the consumer chain’s transaction fees. The cost of attacking the consumer chain becomes the same as attacking the Hub, lowering the barrier for launching secure chains.
**Polkadot’s parachain model** takes a similar approach: parachains must communicate state transitions to the **Polkadot Relay Chain**, and the Relay Chain’s validators secure all parachains. By concentrating security in one set of validators, parachains don’t need their own security infrastructure. **Celestia** offers modular “data availability” and consensus; rollups submit their data to Celestia and **pay for the inclusion of data.** Fees are only priced on data rather than execution, providing **lower‑cost blockspace**. These examples show that shared‑security can provide robust security and scalability while reducing operational overhead. + +## Objectives of this RFP + +Akash seeks proposals from foundations or teams behind established Layer 1 protocols interested in serving as the **shared‑security provider** for the Akash Network. Proposals should explain how the candidate L1 will allow Akash to retain sovereignty over its application logic while outsourcing security. The ideal solution should: + +1. **Reduce token‑liquidity burden.** Validators must normally stake AKT to secure the chain. We seek a model where security is paid for on a per‑use basis rather than continuously locking large amounts of AKT. +2. **Minimize technical and operational overhead.** The new model should reduce the need for Akash to maintain its own validator set and blockchain infrastructure. The L1 should provide robust consensus, data availability and slashing mechanisms. +3. **Preserve decentralization and openness.** Akash is a community‑owned, open‑source network and the selected L1 must share similar values. The integration should not introduce centralization risks or onerous permissioning. +4. **Provide scalability and cost predictability.** Akash’s workloads, particularly GPU‑intensive AI deployments, are growing rapidly. The security provider must accommodate high throughput without prohibitive costs or slot scarcity. +5. 
**Enable interoperability.** Akash is built on Cosmos and interacts with other chains via IBC. Proposals should highlight how the L1 integrates with Cosmos/IBC or provides equivalent interoperability. + +## Scope of Proposal + +Foundations responding to this RFP should provide detailed information addressing the following areas. + +### Security Model and Mechanism + +* **Shared‑security design.** Explain how your protocol’s shared‑security mechanism works. For instance, Cosmos “replicated security” uses the same validator set on the provider chain to validate blocks on a consumer chain, while Polkadot’s parachains rely on the Relay Chain’s validators and Celestia provides data availability for rollups that pay for data inclusion. Indicate whether the mechanism is **enshrined** (all chains use shared security by default) or **opt‑in**, and discuss potential limitations (e.g., Polkadot’s limited number of parachain slots or Cosmos’s current limit on the number of consumer chains). +* **Economic model.** Detail how Akash will pay for security. Provide fee structures (e.g., transaction‑fee sharing, staking requirements, slot auctions or data‑availability fees) and how costs scale with usage. Highlight whether fees are based on data size (as in Celestia’s pricing) or fixed regardless of usage. +* **Validator incentives and slashing.** Describe how your protocol incentivizes validators and punishes misbehavior. Cosmos Interchain Security slashes validators’ stakes on the provider chain if they misbehave on consumer chains; similar mechanisms should be explained. + +### Technical Integration + +* **Interoperability with Cosmos and IBC.** Given Akash’s deep integration with the Cosmos ecosystem, proposals should discuss how the L1 connects to Cosmos or supports IBC‑equivalent interoperability. For example, Cosmos consumer chains update their validator sets via IBC packets. 
+* **Execution environment support.** Describe whether Akash will deploy as a sovereign rollup or consumer chain and whether the L1 supports general purpose execution. Celestia only provides data availability and ordering, requiring a separate settlement layer; Polkadot’s Relay Chain executes parachain transitions. + +* **Migration path.** Provide a roadmap for migrating Akash from its sovereign chain to your shared‑security model. Include timelines, required governance approvals and any necessary code modifications. + +### Scalability and Performance + +* **Throughput and finality.** Provide metrics on your network’s throughput (transactions per second, data throughput) and block finality times. Explain how these metrics will impact Akash’s workload, particularly high‑density GPU leases. +* **Capacity limitations.** Note any hard limits on the number of consumer chains or parachains (e.g., Polkadot’s parachain slot limit or Cosmos’s current capacity for 5–10 consumer chains). Describe plans for scaling beyond these limits. + +### Governance and Community Alignment + +* **Governance model.** Outline how protocol decisions (e.g., changes to shared‑security parameters, upgrades) are made. Akash values community‑led governance and open‑source principles. The selected L1 should demonstrate transparency and decentralized decision‑making. +* **Support and collaboration.** Describe the foundation’s commitment to onboarding Akash. This includes technical support, grant funding, marketing collaboration and involvement of your community. +* **Community Alignment:** Describe the community's commitment to the values of open-source and deep decentralization. Please include profiles of X key community contributors. + +### Economic and Legal Considerations + +* **Cost of adoption.** Provide estimated costs or required stake for Akash to join your network. Include any slot‑auction costs, bonding requirements or initial collateral. 
+* **Token‑economic implications.** Discuss how integration affects AKT token demand and the potential need to hold your protocol’s token for security (e.g., DOT for Polkadot). Highlight whether cross‑chain fees are paid in AKT, your native token or other assets. +* **Legal and regulatory issues.** Explain any regulatory considerations associated with shared‑security or cross‑chain operations. Akash must ensure compliance with applicable U.S. regulations. + +### Ecosystem & Liquidity + +* **Liquidity Profile:** Provide a thorough assessment of the chain’s liquidity profile, including market depth, average bid-ask spread, and trading volumes. Key considerations include the ecosystem’s presence on major Centralized Exchanges (CEXs), detailing trading pairs and average daily volume to assess global reach, and its support on Decentralized Exchanges (DEXs). For DEXs, please detail the size and depth of primary liquidity pools, Total Value Locked (TVL), and any details around liquidity incentives. +  A strong liquidity profile, characterized by deep order books on CEXs and substantial pools on DEXs, is essential for maintaining price stability and attracting institutional investment for Akash. +* **Ecosystem Investment and Growth Support:** Provide your framework to foster growth through strategic project funding. This includes detailing specific investment strategies, such as grants, venture capital participation, and direct financial backing for promising decentralized applications and infrastructure developments. Furthermore, clearly identify and profile the key investors, venture capital firms, and institutional partners who are committed to the long-term success and expansion of the ecosystem. +* **Institutional Interest:** Describe institutional interest in the ecosystem, particularly from institutions seeking stable yields backed by real-world assets. Describe how your community can help Akash attract such institutional interest. +
+ +## Technical Overview + +Akash Network is a decentralized cloud computing marketplace built on the Cosmos SDK, implementing a peer-to-peer network for leasing computing resources. The network consists of tenants seeking computing capacity and providers offering resources, coordinated on-chain through the blockchain's native modules. + +### Core Application Architecture + +#### AkashApp Structure + +The network is implemented as `AkashApp`, extending Cosmos SDK's `BaseApp` and implementing the ABCI application interface. The application contains: + +- **BaseApp**: Core consensus and state management +- **App**: Container for keepers, module manager, and codec +- **Codec**: Protobuf and amino codecs for serialization +- **Module Manager**: Coordinates all blockchain modules + +#### Keeper Architecture + +State management is handled through a two-tier keeper system: + +**Special Keepers** (no dependencies): +- Params, ConsensusParams, Upgrade keepers + +**Normal Keepers** (dependency-ordered): +- **Cosmos Core**: Auth, Bank, Staking, Distribution, Slashing, Gov, IBC +- **Akash-Specific**: Escrow, Deployment, Market, Provider, Audit, Cert, Take + +### Module System + +#### Core Modules + +The marketplace functionality is implemented through seven Akash-specific modules: + +1. **Deployment**: Manages workload deployment lifecycle +2. **Market**: Handles orders, bids, and leases +3. **Provider**: Manages provider registration and attributes +4. **Escrow**: Processes payments and deposits +5. **Audit**: Handles provider auditing +6. **Cert**: Manages certificates +7.
**Take**: Processes marketplace fees + +### Module Initialization + +Modules follow strict initialization order to satisfy dependencies: + +- Foundation modules (auth, bank, staking) +- Governance modules (slashing, gov) +- System modules (upgrade, mint, crisis) +- Cross-chain modules (IBC, evidence, transfer) +- Akash foundation modules (cert, take, escrow) +- Marketplace modules (deployment, provider, market) + +### ABCI Lifecycle + +The application implements the full ABCI lifecycle for block processing: + +1. **InitChain**: Initializes genesis state from all modules +2. **PreBlocker**: Handles upgrades and account preparation +3. **BeginBlocker**: Processes block-start logic with specific module ordering +4. **EndBlocker**: Finalizes block processing +5. **Precommitter**: Final cleanup before state commit + +### Marketplace Mechanics + +#### Order Flow + +1. Tenants create deployment orders specifying resource requirements +2. Providers bid on orders with offered resources and pricing +3. Market module matches orders with bids +4. Escrow module handles payment processing +5. 
Deployment module manages container lifecycle + +#### State Management + +- **KV Stores**: Persistent state for accounts, balances, validators +- **Transient Stores**: Per-block temporary computation state +- **Memory Stores**: Non-consensus data like capability keys + +### Network Configuration + +#### Token Economics + +- Native token: AKT (Akash Token) +- Minimum validator commission: 5% +- Governance deposit requirements: 40% of MinDeposit +- Unbonding period: 2 weeks + +#### Consensus Parameters + +- Block time: \~5 seconds +- Signed blocks window: 30,000 blocks (\~41 hours) +- Minimum liveness: 5% +- Double sign slashing: 5% +- Downtime slashing: 0% + +### Technical Implementation + +#### Build System + +- Written in Go (Golang 1.21.0+) +- Uses Makefile for build automation +- Single binary `akash` containing full node and client functionality + +#### API Interfaces + +- gRPC services for node communication +- REST gateway for client access +- CLI interface for node operations + +### Notes + +This summary focuses on the [core](https://github.com/akash-network/node) blockchain implementation. The actual deployment of workloads happens through [provider-side](https://github.com/akash-network/provider) infrastructure that interfaces with the blockchain but is not covered in this codebase. The network has undergone multiple upgrades to enhance performance and add features like GPU support and improved fee mechanisms. To explore and interact with the codebase, see the [node](https://deepwiki.com/akash-network/node) and [provider](https://deepwiki.com/akash-network/provider) DeepWiki pages. + +## Submission + +Overclock Labs, the core developer coordinating this proposal, will serve as the primary evaluator. Interested parties should contact Greg Osuri on [X](https://x.com/gregosuri) or [Telegram](https://t.me/gregosuri). + +## Copyright + +All content herein is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0). +
diff --git a/src/content/aeps/aep-8/README.md b/src/content/aeps/aep-8/README.md index 9d3bf8657..d3798e3fc 100644 --- a/src/content/aeps/aep-8/README.md +++ b/src/content/aeps/aep-8/README.md @@ -1,7 +1,7 @@ --- aep: 8 title: "Mainnet 1: Security" -author: Greg Osuri (@gosuri), Adam Bozanich (@boz) +author: Greg Osuri (@gosuri) Adam Bozanich (@boz) status: Final type: Standard category: Core diff --git a/src/content/aeps/aep-80/README.md b/src/content/aeps/aep-80/README.md new file mode 100644 index 000000000..675d5103c --- /dev/null +++ b/src/content/aeps/aep-80/README.md @@ -0,0 +1,509 @@ +--- +aep: 80 +title: "On-Chain Oracle Module" +author: Artur Troian (@troian) +status: Final +type: Standard +category: Core +created: 2026-03-06 +estimated-completion: 2026-03-23 +roadmap: major +requires: 76 +--- + +## Abstract + +This proposal introduces `x/oracle`, a native Cosmos SDK module that provides trustworthy, aggregated price data for on-chain consumers. The module accepts price submissions from authorized CosmWasm contracts, calculates Time-Weighted Average Prices (TWAP), enforces staleness and deviation health checks, and exposes aggregated prices to other modules via keeper queries. A custom CosmWasm querier and message filter allow smart contracts to both read oracle parameters (including Wormhole guardian sets) and submit verified prices — forming the on-chain backbone for the Burn-Mint Equilibrium ([AEP-76](../aep-76)) and external oracle integrations such as Pyth ([AEP-81](../aep-81)). + +## Motivation + +AEP-76 requires a reliable AKT/USD TWAP price for every BME mint and settlement operation. This price must be: + +1. **Multi-source** — aggregated from multiple independent oracle providers to resist manipulation. +2. **Time-weighted** — smoothed over a configurable window to dampen short-term volatility. +3. **Health-checked** — automatically flagged when sources are stale or diverge beyond acceptable bounds. +4. 
**Governance-controlled** — all parameters (authorized sources, TWAP window, deviation limits) updatable via governance proposals. +5. **CosmWasm-accessible** — smart contracts (e.g., Pyth relay) must be able to submit prices and query oracle configuration without duplicating state. + +No existing Cosmos SDK module provides this combination. The `x/oracle` module fills this gap as a purpose-built price aggregation layer. + +## Specification + +### Module Identity + +| Property | Value | +|-------------|------------| +| Module name | `oracle` | +| Store key | `oracle` | +| Router key | `oracle` | + +### Data Model + +#### DataID + +Uniquely identifies a price pair. + +```protobuf +message DataID { + string denom = 1; // Asset denomination (e.g., "uakt") + string base_denom = 2; // Base denomination (e.g., "usd") +} +``` + +#### PriceDataID + +Identifies a price from a specific source. + +```protobuf +message PriceDataID { + uint32 source = 1; // Oracle provider index (assigned sequentially) + string denom = 2; + string base_denom = 3; +} +``` + +#### PriceDataRecordID + +Complete price record identifier including block height, enabling range queries. + +```protobuf +message PriceDataRecordID { + uint32 source = 1; + string denom = 2; + string base_denom = 3; + int64 height = 4; // Block height when recorded +} +``` + +#### PriceDataState + +The price value and its publish timestamp. + +```protobuf +message PriceDataState { + string price = 1; // cosmos.Dec (must be positive) + google.protobuf.Timestamp timestamp = 2; // Publisher timestamp +} +``` + +#### AggregatedPrice + +The computed aggregate output stored per `DataID` at the end of each block. 
+ +```protobuf +message AggregatedPrice { + string denom = 1; + string twap = 2; // Time-weighted average price + string median_price = 3; + string min_price = 4; + string max_price = 5; + google.protobuf.Timestamp timestamp = 6; // Computation time + uint32 num_sources = 7; // Contributing sources + uint64 deviation_bps = 8; // (max - min) * 10000 / min +} +``` + +#### PriceHealth + +Health status computed alongside aggregation. + +```protobuf +message PriceHealth { + string denom = 1; + bool is_healthy = 2; // has_min_sources AND deviation_ok + bool has_min_sources = 3; + bool deviation_ok = 4; + uint32 total_sources = 5; + uint32 total_healthy_sources = 6; + repeated string failure_reason = 7; +} +``` + +### KV Store Layout + +Price records, aggregates, and health state are stored under module prefix `0x11`, source-ID bookkeeping under `0x12`, and module parameters under `0x09`. Keyed entries use `collections.Map` with custom codecs for composite keys: + +| Prefix | Key Type | Value Type | Description | +|--------------|---------------------|-------------------|-----------------------------------------| +| `0x11, 0x00` | `PriceDataRecordID` | `PriceDataState` | All price records, ordered by height | +| `0x11, 0x01` | `PriceDataID` | `int64` | Latest recorded height per source/denom | +| `0x11, 0x02` | `DataID` | `AggregatedPrice` | Current aggregated price per denom pair | +| `0x11, 0x03` | `DataID` | `PriceHealth` | Current health status per denom pair | +| `0x12, 0x00` | — | `uint64` | Source ID sequence counter | +| `0x12, 0x02` | `string` | `uint32` | Source address to numeric ID mapping | +| `0x09` | — | `Params` | Module parameters | + +Custom codecs encode composite keys with length-prefixed strings and big-endian integers, enabling efficient range iteration by source and height. +
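To illustrate why big-endian integers make height-ordered range scans work, here is a minimal Go sketch of a `PriceDataRecordID` key codec. The field order follows the protobuf message; the one-byte length prefix and the helper names (`EncodePairPrefix`, `EncodeRecordKey`, `SortByKey`) are assumptions for illustration, not the module's actual codec, and the `0x11, 0x00` store prefix is omitted.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// PriceDataRecordID mirrors the protobuf message of the same name.
type PriceDataRecordID struct {
	Source    uint32
	Denom     string
	BaseDenom string
	Height    int64
}

// appendLenPrefixed writes a one-byte length followed by the string bytes.
// The one-byte prefix is an assumption of this sketch, not the module's codec.
func appendLenPrefixed(buf []byte, s string) []byte {
	if len(s) > 255 {
		panic("string too long for a one-byte length prefix")
	}
	buf = append(buf, byte(len(s)))
	return append(buf, s...)
}

// EncodePairPrefix encodes source | denom | base_denom: the prefix shared by
// every record for one source and price pair.
func EncodePairPrefix(source uint32, denom, baseDenom string) []byte {
	buf := binary.BigEndian.AppendUint32(nil, source)
	buf = appendLenPrefixed(buf, denom)
	return appendLenPrefixed(buf, baseDenom)
}

// EncodeRecordKey appends the block height in big-endian form. Heights are
// non-negative, so byte order matches numeric order.
func EncodeRecordKey(id PriceDataRecordID) []byte {
	prefix := EncodePairPrefix(id.Source, id.Denom, id.BaseDenom)
	return binary.BigEndian.AppendUint64(prefix, uint64(id.Height))
}

// SortByKey orders records the way a KV store iterator would: by raw key bytes.
func SortByKey(ids []PriceDataRecordID) []PriceDataRecordID {
	out := append([]PriceDataRecordID(nil), ids...)
	sort.Slice(out, func(i, j int) bool {
		return bytes.Compare(EncodeRecordKey(out[i]), EncodeRecordKey(out[j])) < 0
	})
	return out
}

func main() {
	records := []PriceDataRecordID{
		{Source: 2, Denom: "uakt", BaseDenom: "usd", Height: 1},
		{Source: 1, Denom: "uakt", BaseDenom: "usd", Height: 2000},
		{Source: 1, Denom: "uakt", BaseDenom: "usd", Height: 9},
		{Source: 1, Denom: "uakt", BaseDenom: "usd", Height: 100},
	}
	prefix := EncodePairPrefix(1, "uakt", "usd")
	for _, r := range SortByKey(records) {
		inRange := bytes.HasPrefix(EncodeRecordKey(r), prefix)
		fmt.Printf("source=%d height=%d in source-1 range=%v\n", r.Source, r.Height, inRange)
	}
}
```

Because every record for a given source and pair shares the same prefix bytes, an iterator over that prefix yields records in ascending height order, which is the access pattern the TWAP range query relies on.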
+ +### Parameters + +```protobuf +message Params { + repeated string sources = 1; // Authorized source addresses + uint32 min_price_sources = 2; // Min sources for valid price + int64 max_price_staleness_blocks = 3; // Max age in blocks + int64 twap_window = 4; // TWAP window in blocks + uint64 max_price_deviation_bps = 5; // Max deviation (basis points) + repeated google.protobuf.Any feed_contracts_params = 6; // Contract-specific config +} +``` + +#### Defaults + +| Parameter | Default | Description | +|------------------------------|---------|---------------------------------------| +| `sources` | `[]` | No sources authorized initially | +| `min_price_sources` | `1` | At least 1 source required | +| `max_price_staleness_blocks` | `60` | ~6 minutes at 6s blocks | +| `twap_window` | `180` | ~18 minutes at 6s blocks | +| `max_price_deviation_bps` | `150` | 1.5% maximum deviation | + +#### Feed Contract Parameters + +The `feed_contracts_params` field uses `google.protobuf.Any` to store typed configuration for different oracle integrations: + +```protobuf +message PythContractParams { + string akt_price_feed_id = 1; // Pyth price feed ID for AKT/USD +} + +message WormholeContractParams { + repeated string guardian_addresses = 1; // 20-byte hex-encoded Ethereum addresses +} +``` + +These are read by the custom CosmWasm querier (see [CosmWasm Integration](#cosmwasm-integration)) so that smart contracts can access configuration without duplicating it in contract state. + +### Messages + +#### MsgAddPriceEntry + +Submits a new price from an authorized source. + +```protobuf +service Msg { + rpc AddPriceEntry(MsgAddPriceEntry) returns (MsgAddPriceEntryResponse); + rpc UpdateParams(MsgUpdateParams) returns (MsgUpdateParamsResponse); +} + +message MsgAddPriceEntry { + string signer = 1; // Authorized source address + DataID id = 2; // Price pair (denom + base_denom) + PriceDataState price = 3; // Price value + timestamp +} +``` + +**Validation rules:** + +1. 
`signer` must be in `Params.sources`. +2. `id.denom` must be `"uakt"` and `id.base_denom` must be `"usd"` (only AKT/USD is currently supported). +3. `price.price` must be positive. +4. `price.timestamp` must be newer than the last recorded price from the same source. +5. For the first price from a source, the timestamp must be no more than 12 seconds older than the current block time. + +On success, the price is stored with a `PriceDataRecordID` keyed to the current block height, the latest height tracker is updated, and an `EventPriceData` event is emitted. + +#### MsgUpdateParams + +Governance-only message to update module parameters. + +```protobuf +message MsgUpdateParams { + string authority = 1; // x/gov module account + Params params = 2; +} +``` + +When sources are updated, each new source address is assigned a unique sequential `uint32` ID used as the source field in price record keys. + +### Source ID Assignment + +Each authorized source address receives a unique `uint32` identifier, assigned from a monotonically increasing sequence. This numeric ID is used in all KV store keys instead of the full address, enabling compact storage and efficient range queries. Source IDs are assigned when the source first appears in a `Params.sources` update and are never reused. + +### TWAP Calculation + +The TWAP is calculated per source over a configurable block window using block-height weighting. + +**Algorithm:** `calculateTWAPBySource(ctx, source, denom, windowBlocks)` + +1. Compute `startHeight = currentHeight - windowBlocks`. +2. Retrieve all price records for the source within `[startHeight, currentHeight]` via range iteration on `PriceDataRecordID`. +3. For each data point `i` (in ascending height order): + - If first data point: `weight_i = currentHeight - height_i` + - Otherwise: `weight_i = height_i - height_{i-1}` +4. `weightedSum = sum(price_i * weight_i)` and `totalWeight = sum(weight_i)` +5. `TWAP = weightedSum / totalWeight` + +Returns `ErrTWAPZeroWeight` if no data falls within the window.
+ +### EndBlocker — Price Aggregation + +At the end of every block, the module runs aggregation: + +1. **Collect latest prices** — iterate `latestPrices` map, group by `DataID`. +2. **Filter stale prices** — discard any price whose timestamp is older than `currentBlockTime - (maxPriceStalenessBlocks * 6s)`. +3. **Calculate per-source TWAP** — for each non-stale source, compute TWAP over the configured window. Skip sources where TWAP calculation fails. +4. **Compute aggregates:** + - `twap` = mean of all source TWAPs + - `median_price` = median of source prices + - `min_price`, `max_price` = extremes + - `deviation_bps` = `(max_price - min_price) * 10000 / min_price` +5. **Set health status:** + - `has_min_sources` = `num_sources >= min_price_sources` + - `deviation_ok` = `deviation_bps <= max_price_deviation_bps` + - `is_healthy` = `has_min_sources AND deviation_ok` + - Record failure reasons if unhealthy +6. **Store results** — write `AggregatedPrice` and `PriceHealth` to their respective maps. Only store the aggregated price if health check passes. 
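Steps 4 and 5 can be sketched as a pure function over the samples that survived staleness filtering and per-source TWAP calculation. This is a minimal sketch, assuming integer fixed-point prices in a hypothetical micro-USD unit (the proto stores them as strings) and integer division for the mean and the even-count median; the module's actual rounding rules are not specified here.

```go
package main

import (
	"fmt"
	"sort"
)

// sourceSample is one non-stale source's latest price and its TWAP, in a
// hypothetical integer fixed-point unit (micro-USD) so the math stays exact.
type sourceSample struct {
	Price int64
	TWAP  int64
}

// aggregate mirrors the AggregatedPrice and PriceHealth fields used here.
type aggregate struct {
	TWAP, Median, Min, Max int64
	DeviationBps           uint64
	NumSources             uint32
	HasMinSources          bool
	DeviationOK            bool
	IsHealthy              bool
}

// aggregatePrices computes the mean TWAP, median, extremes, deviation in
// basis points, and the health flags.
func aggregatePrices(samples []sourceSample, minSources uint32, maxDeviationBps uint64) aggregate {
	agg := aggregate{NumSources: uint32(len(samples))}
	agg.HasMinSources = agg.NumSources >= minSources
	if len(samples) == 0 {
		return agg // no data: IsHealthy stays false
	}
	prices := make([]int64, len(samples))
	var twapSum int64
	for i, s := range samples {
		prices[i] = s.Price
		twapSum += s.TWAP
	}
	agg.TWAP = twapSum / int64(len(samples)) // mean of source TWAPs
	sort.Slice(prices, func(i, j int) bool { return prices[i] < prices[j] })
	n := len(prices)
	if n%2 == 1 {
		agg.Median = prices[n/2]
	} else {
		agg.Median = (prices[n/2-1] + prices[n/2]) / 2
	}
	agg.Min, agg.Max = prices[0], prices[n-1]
	if agg.Min <= 0 {
		return agg // submissions are validated positive; treat anything else as unhealthy
	}
	agg.DeviationBps = uint64((agg.Max - agg.Min) * 10000 / agg.Min)
	agg.DeviationOK = agg.DeviationBps <= maxDeviationBps
	agg.IsHealthy = agg.HasMinSources && agg.DeviationOK
	return agg
}

func main() {
	samples := []sourceSample{
		{Price: 1_000_000, TWAP: 1_000_000},
		{Price: 1_020_000, TWAP: 1_015_000},
		{Price: 1_010_000, TWAP: 1_005_000},
	}
	agg := aggregatePrices(samples, 1, 150)
	fmt.Println(agg.Median, agg.DeviationBps, agg.IsHealthy) // 1010000 200 false
}
```

With the default `max_price_deviation_bps` of 150, a 2% spread between the cheapest and most expensive source (200 bps) marks the pair unhealthy even though the minimum-source requirement is met.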
+ +### Queries + +```protobuf +service Query { + // Historical price data with pagination + rpc Prices(QueryPricesRequest) returns (QueryPricesResponse); + // Module parameters + rpc Params(QueryParamsRequest) returns (QueryParamsResponse); + // Aggregated price and health for a denom + rpc AggregatedPrice(QueryAggregatedPriceRequest) returns (QueryAggregatedPriceResponse); + // Price feed configuration + rpc PriceFeedConfig(QueryPriceFeedConfigRequest) returns (QueryPriceFeedConfigResponse); +} +``` + +| Query | Filters | Returns | +|-------------------|--------------------------------------------|--------------------------------------| +| `Prices` | asset_denom, base_denom, height (optional) | Paginated `PriceData` list | +| `AggregatedPrice` | denom | `AggregatedPrice` + `PriceHealth` | +| `PriceFeedConfig` | denom | Feed config (Pyth feed ID, enabled) | +| `Params` | — | Current `Params` | + +**Denom normalization:** queries accept micro-denoms (`uakt`, `uact`) and normalize them to base denoms (`akt`, `act`) for internal lookup. ACT is hardcoded to return a price of 1 USD. 
+ +#### REST Endpoints + +| Method | Path | +|--------|---------------------------------------------| +| GET | `/akash/oracle/v1/prices` | +| GET | `/akash/oracle/v1/params` | +| GET | `/akash/oracle/v1/aggregated_price/{denom}` | +| GET | `/akash/oracle/v1/price_feed_config/{denom}`| + +### Events + +```protobuf +// Emitted on successful price submission +message EventPriceData { + string source = 1; // Source address + DataID id = 2; + PriceDataState data = 3; +} + +// Emitted when a source approaches staleness +message EventPriceStaleWarning { + string source = 1; + DataID id = 2; + int64 last_height = 3; + int64 blocks_to_stall = 4; +} + +// Emitted when a source becomes stale +message EventPriceStaled { + string source = 1; + DataID id = 2; + int64 last_height = 3; +} + +// Emitted when a stale source recovers +message EventPriceRecovered { + string source = 1; + DataID id = 2; + int64 height = 3; +} +``` + +### Errors + +| Error | Condition | +|------------------------------|-------------------------------------------------------| +| `ErrUnauthorizedWriterAddress` | Signer not in `Params.sources` | +| `ErrInvalidTimestamp` | Timestamp older than existing or too far from block time | +| `ErrPriceStalled` | Price data is stale | +| `ErrTWAPZeroWeight` | No price data within TWAP window | +| `ErrPriceEntryExists` | Duplicate price entry | +| `ErrInvalidFeedContractParams` | Malformed feed contract params | +| `ErrInvalidFeedContractConfig` | Invalid feed contract configuration | + +### CosmWasm Integration + +The oracle module exposes a custom querier and message filter for CosmWasm contracts, enabling smart contracts to interact with the oracle without custom SDK message types. 
+ +#### Custom Querier + +Registered as a `QueryPlugins.Custom` handler in the CosmWasm keeper, allowing contracts to issue custom queries: + +```go +// Registered in app setup +wasmkeeper.WithQueryPlugins(&wasmkeeper.QueryPlugins{ + Custom: wasmbindings.CustomQuerier(app.Keepers.Akash.Oracle), +}) +``` + +**Supported queries:** + +| Query Type | Request | Response | +|----------------|----------------------|-----------------------------------------| +| `oracle_params` | `OracleParamsQuery{}` | Oracle params with Pyth/Wormhole config | +| `guardian_set` | `GuardianSetQuery{}` | Guardian addresses (base64) + expiration | + +The `oracle_params` response includes typed Pyth and Wormhole contract parameters unpacked from the `feed_contracts_params` Any types. The `guardian_set` response converts hex guardian addresses to base64 encoding for contract consumption. + +#### Message Filter + +CosmWasm contracts are restricted to a single protobuf `Any` message type: + +``` +/akash.oracle.v1.MsgAddPriceEntry +``` + +All other `Any` message types (staking, distribution, governance, IBC, bank burns) are rejected. This ensures that oracle source contracts can only submit price data and cannot perform other privileged operations. 
+ +**Data flow — price submission from contract:** + +``` +CosmWasm Contract (e.g., Pyth relay) + │ sends protobuf Any: MsgAddPriceEntry + ▼ +Message Filter (wasm/keeper/msg_filter.go) + │ validates message type = MsgAddPriceEntry + ▼ +Oracle Handler (x/oracle/handler/server.go) + │ validates source, denom, price, timestamp + ▼ +Oracle Keeper + │ stores PriceDataRecord, updates latest height + ▼ +EndBlocker aggregation (every block) +``` + +**Data flow — parameter query from contract:** + +``` +CosmWasm Contract (e.g., Wormhole verifier) + │ sends custom query: { "guardian_set": {} } + ▼ +Custom Querier (wasm/bindings/custom_querier.go) + │ reads x/oracle params, unpacks WormholeContractParams + ▼ +Returns guardian addresses as base64 +``` + +### Genesis State + +```protobuf +message GenesisState { + Params params = 1; + repeated PriceData prices = 2; + repeated LatestHeight latest_height = 3; +} +``` + +Default genesis initializes with `DefaultParams()` and empty price history. + +### CLI + +#### Transaction Commands + +``` +akash tx oracle feed [asset-denom] [base-denom] [price] [timestamp] +``` + +Submits a `MsgAddPriceEntry`. The timestamp must be in RFC3339Nano format. + +#### Query Commands + +``` +akash query oracle prices [--asset-denom X] [--base-denom Y] [--height Z] +akash query oracle aggregated-price [denom] +akash query oracle price-feed-config [denom] +akash query oracle params +``` + +## Rationale + +### TWAP Over Spot Price + +Using a Time-Weighted Average Price rather than spot prices protects against flash manipulation. A single anomalous price update is diluted across the entire TWAP window, making it economically infeasible to manipulate the oracle for BME operations. + +### Block-Height Weighting + +The TWAP uses block heights rather than wall-clock time for weighting. This is simpler and deterministic — all validators compute identical results without clock synchronization concerns. 
+ +### EndBlocker Aggregation + +Computing aggregates at the end of every block ensures that consumers always read a consistent, up-to-date price. Lazy computation on query would introduce inconsistency across nodes. + +### Guardian Addresses in Oracle Params + +Storing Wormhole guardian addresses in `x/oracle` params rather than in the Wormhole contract state ensures that Akash governance controls the trust model. The custom querier bridges this data to the contract layer without state duplication. + +### Single Message Type Filter + +Restricting CosmWasm contracts to `MsgAddPriceEntry` minimizes the attack surface. Oracle source contracts are purpose-built and should not need access to staking, governance, or bank operations. + +### ACT Hardcoded to 1 USD + +Since ACT is a dollar-pegged compute credit (AEP-76), the oracle returns a fixed 1 USD price for ACT queries. This avoids circular dependencies where ACT pricing would need its own oracle feed. + +## Security Considerations + +### Source Authorization + +Only addresses explicitly listed in `Params.sources` can submit prices. Adding or removing sources requires a governance proposal. Each source is assigned a unique numeric ID that is never reused, preventing ID-reuse attacks. + +### Timestamp Validation + +New prices must have a timestamp strictly newer than the last price from the same source. The first price from a source must be within 12 seconds of the current block time. This prevents replay attacks and backdated price injection. + +### Staleness Protection + +Prices older than `max_price_staleness_blocks` are filtered out during aggregation. If all sources become stale, no aggregated price is produced and `PriceHealth.is_healthy` becomes false. Consumers (e.g., `x/bme`) must check health before using prices. + +### Deviation Bounds + +If the spread between min and max source prices exceeds `max_price_deviation_bps`, the health check fails. 
This detects compromised or malfunctioning sources before the bad data can affect BME operations. + +### Message Filter Isolation + +CosmWasm contracts authorized as oracle sources can only execute `MsgAddPriceEntry`. All other Cosmos SDK message types are blocked at the WASM keeper layer, limiting the blast radius of a compromised contract. + +### Governance-Only Parameter Updates + +All parameter changes — including adding/removing sources, adjusting TWAP window, and updating guardian addresses — require `x/gov` authority. No individual account can unilaterally modify oracle behavior. + +## Backward Compatibility + +This is a new module with no existing state to migrate. It does not modify any existing modules. Consumers opt in by querying the oracle keeper. + +## Implementations + +| Component | Location | +|----------------|---------------------------------------------------------| +| Keeper | `node/x/oracle/keeper/` | +| Handler | `node/x/oracle/handler/` | +| Module | `node/x/oracle/module.go` | +| Custom Querier | `node/x/wasm/bindings/custom_querier.go` | +| Query Types | `node/x/wasm/bindings/akash_query.go` | +| Message Filter | `node/x/wasm/keeper/msg_filter.go` | +| Proto | `chain-sdk/proto/node/akash/oracle/v1/` | +| Go Types | `chain-sdk/go/node/oracle/v1/` | +| CLI | `chain-sdk/go/cli/oracle_query.go`, `oracle_tx.go` | + +## References + +- [AEP-76: Burn Mint Equilibrium](../aep-76) +- [Cosmos SDK Bank Module](https://docs.cosmos.network/v0.46/modules/bank/) +- [CosmWasm Documentation](https://docs.cosmwasm.com/) + +## Copyright + +Copyright and related rights waived via [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
diff --git a/src/content/aeps/aep-81/README.md b/src/content/aeps/aep-81/README.md new file mode 100644 index 000000000..cd7bf6bbe --- /dev/null +++ b/src/content/aeps/aep-81/README.md @@ -0,0 +1,458 @@ +--- +aep: 81 +title: "Pyth Price feed Integration" +author: Artur Troian (@troian) +status: Final +type: Standard +category: Core +created: 2026-03-06 +estimated-completion: 2026-03-23 +roadmap: major +requires: 76,80 +--- + +## Abstract + +This proposal introduces a Pyth Network oracle integration for Akash Network, providing cryptographically verifiable AKT/USD price feeds required by the Burn-Mint Equilibrium ([AEP-76](../aep-76)). +The design deploys two CosmWasm contracts — a Wormhole verification contract and a Pyth relay contract — alongside a new `x/oracle` native module that stores prices, computes TWAP, and enforces staleness/deviation health checks. +An off-chain Hermes client relays VAA-signed price data from Pyth's pull oracle onto the chain. + +## Motivation + +AEP-76 introduces BME, which settles all leases in AKT using a 30-minute TWAP oracle price. The oracle must be: + +1. **Decentralized** — no single point of failure or trust. +2. **Verifiable** — every price submission must carry a cryptographic proof. +3. **Low-latency** — sub-minute updates to keep TWAP responsive. +4. **Governance-managed** — Akash validators control guardian sets and oracle parameters without external governance dependencies. + +No existing on-chain mechanism satisfies all four. Pyth Network's pull oracle model — where prices are aggregated from first-party publishers on Pythnet, attested by 19 Wormhole Guardians, +and verified on the destination chain — meets every requirement while covering 500+ feeds across crypto, equities, FX, and commodities. 
+ +## Specification + +### Overview + +The integration consists of four components: + +| Component | Type | Description | +|-------------------|------------------------|----------------------------------------------------------------------------| +| Wormhole contract | CosmWasm (WASM) | Verifies VAA guardian signatures | +| Pyth contract | CosmWasm (WASM) | Verifies VAA via Wormhole, parses Pyth payload, relays price to `x/oracle` | +| `x/oracle` module | Cosmos SDK native | Stores prices, calculates TWAP, enforces health checks | +| Hermes client | Off-chain (TypeScript) | Fetches VAA from Pyth Hermes API, submits to Pyth contract | + +### Architecture + +``` +┌──────────────────────────────────────────────────────────────┐ +│ Pyth Network (Off-chain) │ +│ Publishers → Pythnet → Hermes API │ +└──────────────────────────────────────────────────────────────┘ + │ + VAA with prices + │ +┌───────────────────────────────┼──────────────────────────────┐ +│ Hermes Client │ (Off-chain) │ +│ github.com/akash-network/hermes │ +│ Fetches VAA and submits to Pyth contract │ +└───────────────────────────────┼──────────────────────────────┘ + │ + execute: update_price_feed(vaa) + ▼ +┌──────────────────────────────────────────────────────────────┐ +│ Akash Network (On-chain / CosmWasm) │ +│ │ +│ ┌────────────────────────────┐ │ +│ │ Wormhole Contract │◄── WASM Contract #1 │ +│ │ - Verifies VAA signatures │ Verifies guardian │ +│ │ - Returns verified payload│ signatures (13/19) │ +│ └─────────────▲──────────────┘ │ +│ │ query: verify_vaa │ +│ │ │ +│ ┌─────────────┴──────────────┐ │ +│ │ Pyth Contract │◄── WASM Contract #2 │ +│ │ - Receives VAA from client│ Verifies + relays │ +│ │ - Queries Wormhole │ in single transaction │ +│ │ - Parses Pyth payload │ │ +│ │ - Relays to x/oracle │ │ +│ └─────────────┬──────────────┘ │ +│ │ │ +│ CosmosMsg::Custom(SubmitPrice) │ +│ ▼ │ +│ ┌────────────────────────────┐ │ +│ │ x/oracle Module │◄── Native Cosmos module │ +│ │ - Stores price │ 
Aggregates prices from │ +│ │ - Calculates TWAP │ authorized sources │ +│ │ - Health checks │ │ +│ └────────────────────────────┘ │ +└──────────────────────────────────────────────────────────────┘ +``` + +### Data Flow + +1. **Pyth Publishers** aggregate prices (AKT/USD) on Pythnet. +2. **Wormhole Guardians** (19 validators) observe and sign the price attestation as a VAA. A valid VAA requires **13 of 19 signatures** (2/3 supermajority). +3. **Hermes Client** fetches the latest price + VAA from Pyth's Hermes API. +4. **Hermes Client** submits the VAA to the Pyth contract on Akash. +5. **Pyth Contract** queries the Wormhole contract to verify VAA signatures. +6. **Pyth Contract** parses the Pyth price attestation from the verified VAA payload. +7. **Pyth Contract** relays the validated price to `x/oracle` via `CosmosMsg::Custom(SubmitPrice)`. +8. **`x/oracle` Module** stores the price, calculates TWAP, and performs health checks. +9. **Consumers** (e.g., `x/bme`) query `x/oracle` for the current AKT/USD TWAP. + +### Wormhole Contract + +**Purpose:** Verify VAA signatures from Wormhole's guardian network. + +- Queries guardian addresses from `x/oracle` module params via a custom Akash querier (not stored in contract state). +- Validates that 13/19 guardians signed the VAA. +- Returns the verified payload for downstream contracts. +- Guardian set updates are managed via Akash governance, not Wormhole governance VAAs. 
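A minimal sketch of the 13-of-19 rule, assuming Wormhole's usual quorum formula `floor(2n/3) + 1` and its convention that guardian indices in a VAA's signature list are strictly increasing. It checks only counts and indices; recovering and comparing secp256k1 signer addresses is out of scope.

```go
package main

import (
	"errors"
	"fmt"
)

// quorum returns the minimum number of guardian signatures for a set of n
// guardians: floor(2n/3) + 1, which is 13 for n = 19.
func quorum(n int) int {
	return n*2/3 + 1
}

// checkSignerIndices verifies that a VAA carries enough signatures, that
// each guardian index exists in the set, and that indices strictly increase
// (so no guardian is counted twice).
func checkSignerIndices(indices []uint8, setSize int) error {
	if len(indices) < quorum(setSize) {
		return fmt.Errorf("need %d signatures, got %d", quorum(setSize), len(indices))
	}
	for i, idx := range indices {
		if int(idx) >= setSize {
			return fmt.Errorf("guardian index %d out of range", idx)
		}
		if i > 0 && idx <= indices[i-1] {
			return errors.New("guardian indices must be strictly increasing")
		}
	}
	return nil
}

func main() {
	fmt.Println(quorum(19)) // 13
}
```

The same arithmetic shows the liveness margin: with 19 guardians, up to 6 can be offline while VAAs still reach quorum.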
+ +**Source:** `contracts/wormhole/` + +#### Query Messages + +```rust +pub enum QueryMsg { + /// Verify VAA and return parsed contents + VerifyVAA { + vaa: Binary, // Base64-encoded VAA + block_time: u64, // Current block time for validation + }, + /// Get current guardian set info + GuardianSetInfo {}, +} +``` + +#### Instantiate Parameters + +| Parameter | Type | Description | Value | +|----------------|--------|------------------------------------------------|-------------------| +| `gov_chain` | u16 | Wormhole governance chain ID | `1` (Solana) | +| `gov_address` | Binary | Governance contract address (32 bytes, base64) | See Wormhole docs | +| `chain_id` | u16 | Wormhole chain ID for Akash | `29` | +| `fee_denom` | String | Native token denomination | `"uakt"` | + +#### Parsed VAA Response + +```rust +pub struct ParsedVAA { + pub version: u8, + pub guardian_set_index: u32, + pub timestamp: u32, + pub nonce: u32, + pub len_signers: u8, + pub emitter_chain: u16, // Source chain (26 = Pythnet) + pub emitter_address: Vec<u8>, // 32-byte emitter address + pub sequence: u64, + pub consistency_level: u8, + pub payload: Vec<u8>, // Pyth price attestation data + pub hash: Vec<u8>, +} +``` + +### Pyth Contract + +**Purpose:** Receive VAA, verify via Wormhole, parse Pyth payload, and relay price to `x/oracle`. + +- Receives raw VAA from the Hermes client. +- Queries the Wormhole contract to verify VAA signatures. +- Parses the Pyth price attestation from the verified payload. +- Validates the price feed ID and data source emitter. +- Relays the validated price to `x/oracle` (no local price storage). +- Admin-controlled for governance.
+ +**Source:** `contracts/pyth/` + +#### Execute Messages + +```rust +pub enum ExecuteMsg { + /// Submit price update with VAA proof + UpdatePriceFeed { + vaa: Binary, // VAA data from Pyth Hermes API (base64 encoded) + }, + /// Admin: Update the fee + UpdateFee { new_fee: Uint256 }, + /// Admin: Transfer admin rights + TransferAdmin { new_admin: String }, + /// Admin: Refresh cached oracle params + RefreshOracleParams {}, + /// Admin: Update contract configuration + UpdateConfig { + wormhole_contract: Option<String>, + price_feed_id: Option<String>, + data_sources: Option<Vec<DataSourceMsg>>, + }, +} + +pub struct DataSourceMsg { + pub emitter_chain: u16, // Wormhole chain ID (26 for Pythnet) + pub emitter_address: String, // 32 bytes, hex encoded +} +``` + +#### Query Messages + +```rust +pub enum QueryMsg { + GetConfig {}, // Returns admin, wormhole_contract, fee, feed ID, data_sources + GetPrice {}, // Returns latest price + GetPriceFeed {}, // Returns price with metadata + GetOracleParams {}, // Returns cached x/oracle params (uses custom Akash querier) +} +``` + +#### Instantiate Parameters + +| Parameter | Type | Description | Example | +|----------------------------------|--------|-----------------------------------|--------------------------| +| `admin` | String | Admin address | Governance address | +| `wormhole_contract` | String | Wormhole contract address | `akash1...` | +| `update_fee` | String | Fee for price updates (Uint256) | `"1000000"` | +| `price_feed_id` | String | Pyth price feed ID (64-char hex) | AKT/USD feed ID | +| `data_sources[].emitter_chain` | u16 | Wormhole chain ID | `26` (Pythnet) | +| `data_sources[].emitter_address` | String | Pyth emitter address (32 bytes) | See Pyth docs | + +#### Internal Flow + +1. Receive VAA from Hermes client. +2. Query Wormhole: `verify_vaa(vaa)` → `ParsedVAA`. +3. Validate emitter is a trusted Pyth data source. +4. Parse Pyth price attestation from VAA payload (P2WH batch format). +5.
Validate price feed ID matches expected feed (AKT/USD). +6. Send `CosmosMsg::Custom(SubmitPrice)` to `x/oracle`. + +### Pyth Price Attestation Format + +```rust +pub struct PythPrice { + pub id: String, // Price feed ID (32 bytes, hex encoded) + pub price: i64, // Price value (scaled by 10^expo) + pub conf: u64, // Confidence interval + pub expo: i32, // Price exponent (e.g., -8 means divide by 10^8) + pub publish_time: i64, // Unix timestamp when price was published + pub ema_price: i64, // Exponential moving average price + pub ema_conf: u64, // EMA confidence interval +} +``` + +The VAA payload uses the P2WH Batch Price Attestation wire format: +- Magic bytes: `P2WH` (0x50325748) +- Major/minor version: 2 bytes each +- Header size, attestation count, attestation size: 2 bytes each +- Each attestation: 150 bytes containing price data + +### `x/oracle` Module + +The native Cosmos SDK module that serves as the single source of truth for on-chain price data. + +#### Responsibilities + +- Store price submissions from authorized sources (CosmWasm contracts). +- Calculate Time-Weighted Average Price (TWAP) over a configurable window. +- Enforce staleness checks (reject prices older than `max_price_staleness_blocks`). +- Enforce deviation bounds (reject outliers exceeding `max_price_deviation_bps`). +- Expose prices to other modules (e.g., `x/bme`) via keeper queries. 
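`PythPrice` carries an integer mantissa and a base-10 exponent, so relaying it requires turning `price × 10^expo` into a decimal value. A minimal Go sketch of that conversion using `math/big` (so no precision is lost to floating point); the function name is hypothetical, and the exact decimal format `x/oracle` accepts is an assumption.

```go
package main

import (
	"fmt"
	"math/big"
)

// absInt64 widens before negating so the most negative int32 cannot overflow.
func absInt64(x int32) int64 {
	if x < 0 {
		return -int64(x)
	}
	return int64(x)
}

// scalePythPrice renders price * 10^expo as an exact decimal string, with
// -expo fractional digits when expo is negative.
func scalePythPrice(price int64, expo int32) string {
	r := new(big.Rat).SetInt64(price)
	pow := new(big.Int).Exp(big.NewInt(10), big.NewInt(absInt64(expo)), nil)
	if expo < 0 {
		r.Quo(r, new(big.Rat).SetInt(pow))
		return r.FloatString(int(-int64(expo)))
	}
	r.Mul(r, new(big.Rat).SetInt(pow))
	return r.FloatString(0)
}

func main() {
	// An expo of -8 means "divide by 10^8".
	fmt.Println(scalePythPrice(123456789, -8)) // 1.23456789
}
```

Since `x/oracle` rejects non-positive prices, a relay would also need to drop any attestation whose `price` is zero or negative before submitting it.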
+ +#### Governance Parameters + +| Parameter | Type | Description | Default | +|------------------------------|----------|----------------------------------|-----------------| +| `sources` | []String | Authorized contract addresses | `[]` | +| `min_price_sources` | u32 | Minimum sources for valid price | `1` | +| `max_price_staleness_blocks` | i64 | Max age in blocks (~6s/block) | `60` (~6 min) | +| `twap_window` | i64 | TWAP calculation window (blocks) | `180` (~18 min) | +| `max_price_deviation_bps` | u64 | Max deviation in basis points | `150` (1.5%) | + +#### Feed Contract Parameters + +Oracle params include typed contract configuration stored under `feed_contracts_params`: + +```json +{ + "feed_contracts_params": [ + { + "@type": "/akash.oracle.v1.PythContractParams", + "akt_price_feed_id": "0xef0d8b6fda2ceba41da15d4095d1da392a0d2f8ed0c6c7bc0f4cfac8c280b56d" + }, + { + "@type": "/akash.oracle.v1.WormholeContractParams", + "guardian_addresses": ["58CC3AE5C097b213cE3c81979e1B9f9570746AA5", "..."] + } + ] +} +``` + +Guardian addresses are 20-byte Ethereum-style addresses (40 hex characters). The current set of 19 guardians can be obtained from [Wormhole documentation](https://wormhole.com/docs/protocol/infrastructure/guardians/). + +#### Custom Querier + +A custom Akash querier (`x/wasm/bindings/`) enables CosmWasm contracts to read `x/oracle` params (including guardian addresses) directly, without duplicating configuration in contract state. + +#### CLI Queries + +```bash +akash query oracle params # Oracle parameters +akash query oracle price uakt usd # AKT/USD price +akash query oracle prices # All prices +``` + +### Hermes Client (Price Relayer) + +An off-chain TypeScript service that bridges Pyth's pull oracle to Akash. + +**Repository:** [github.com/akash-network/hermes](https://github.com/akash-network/hermes) + +#### Why Required + +Pyth uses a pull model — prices are not automatically pushed on-chain. 
The Hermes client automates fetching the latest price and VAA proof from Pyth's Hermes API and submitting them to the on-chain Pyth contract. + +#### Features + +- **Daemon mode** — continuous updates at configurable intervals. +- **Smart updates** — compares `publish_time` timestamps and skips transactions when the on-chain price is already current. +- **Multi-arch Docker** — supports `linux/amd64` and `linux/arm64`. +- **CLI tools** — manual updates, price queries, admin operations. + +#### Configuration + +| Variable | Required | Default | Description | +|----------------------|----------|-------------------------------|-----------------------------| +| `RPC_ENDPOINT` | Yes | — | Akash RPC endpoint | +| `CONTRACT_ADDRESS` | Yes | — | Pyth contract address | +| `MNEMONIC` | Yes | — | Wallet mnemonic for signing | +| `HERMES_ENDPOINT` | No | `https://hermes.pyth.network` | Pyth Hermes API URL | +| `UPDATE_INTERVAL_MS` | No | `300000` | Update interval (5 min) | +| `GAS_PRICE` | No | `0.025uakt` | Gas price for transactions | +| `DENOM` | No | `uakt` | Token denomination | + +#### Cost Estimation + +| Interval | Updates/Month | Gas/Month | Update Fees/Month (1,000,000 uakt fee) | Approx Monthly Total | +|----------|---------------|-----------|----------------------------------------|----------------------| +| 5 min | 8,640 | ~32.4 AKT | 8,640 AKT | ~8,672 AKT | +| 10 min | 4,320 | ~16.2 AKT | 4,320 AKT | ~4,336 AKT | +| 15 min | 2,880 | ~10.8 AKT | 2,880 AKT | ~2,891 AKT | + +Per update: ~150,000 gas × 0.025 uakt/gas = 3,750 uakt (0.00375 AKT) in gas, plus a 1,000,000 uakt (1 AKT) update fee, for about 1,003,750 uakt (≈ 1.004 AKT). Monthly figures assume a 30-day month; because the update fee dominates, total cost scales almost linearly with the configured `update_fee`. + +### Contract Deployment + +On Akash mainnet, contract code can only be stored via governance proposals. The deployment sequence is: + +1. **Store Wormhole WASM** — governance proposal to store `wormhole.wasm`. +2. **Instantiate Wormhole** — governance proposal with init params (gov_chain, gov_address, chain_id, fee_denom). +3. **Store Pyth WASM** — governance proposal to store `pyth.wasm`. +4. **Instantiate Pyth** — governance proposal with init params (admin, wormhole_contract, update_fee, price_feed_id, data_sources). +5.
**Register oracle source** — governance proposal to update `x/oracle` params: add the Pyth contract to `sources`, set guardian addresses, configure TWAP/staleness/deviation parameters. +6. **Run Hermes client** — start the off-chain relayer targeting the deployed Pyth contract. + +Pre-built WASM artifacts are available at: +``` +contracts/wormhole/artifacts/wormhole.wasm +contracts/pyth/artifacts/pyth.wasm +``` + +### Guardian Set Management + +Guardian addresses are stored in `x/oracle` module params (not in the Wormhole contract). This enables: + +- **Akash governance control** — guardian set updates via standard governance proposals. +- **Faster incident response** — no dependency on Wormhole governance VAAs. +- **Single source of truth** — the Wormhole contract queries `x/oracle` params at verification time via the custom querier. + +To update the guardian set, submit a governance proposal that updates the `WormholeContractParams` in `feed_contracts_params`. + +### Source Code + +| Component | Location | +|----------------|-------------------------------------| +| Wormhole | `contracts/wormhole/` | +| Pyth | `contracts/pyth/` | +| x/oracle | `x/oracle/` | +| Custom Querier | `x/wasm/bindings/` | +| Hermes Client | `github.com/akash-network/hermes` | +| E2E Tests | `tests/e2e/pyth_contract_test.go` | + +## Rationale + +### Why Two Contracts + +Separating Wormhole verification from Pyth price parsing provides modularity. The Wormhole contract is a reusable VAA verifier that can serve future cross-chain integrations beyond Pyth. The Pyth contract handles Pyth-specific logic (P2WH parsing, feed ID validation, data source authorization) and acts as the relay to `x/oracle`. + +### Why Pull Model + +Pyth's pull oracle requires an off-chain relayer (Hermes client) rather than automatic on-chain pushes. While this introduces an operational dependency, it provides: + +- **Cost efficiency** — only pay for updates when new data is available. 
+- **Flexibility** — configurable update intervals to balance cost vs. freshness. +- **Simplicity** — the on-chain contracts remain stateless relays, reducing attack surface. + +### Why Guardian Addresses in `x/oracle` Params + +Storing guardian addresses in the Wormhole contract would require Wormhole governance VAAs to update them. By storing them in `x/oracle` module params: + +- Akash governance retains full control over the oracle trust model. +- Guardian set rotations are standard param-change proposals. +- No external governance dependency for security-critical updates. + +### Why Not Osmosis TWAP Alone + +AEP-76 specifies dual-feed medianization (Osmosis TWAP + external oracle). Pyth provides the external oracle feed. Using only Osmosis TWAP would create a single point of failure and be vulnerable to manipulation of a single liquidity pool. + +## Security Considerations + +### VAA Verification + +Every price submission must carry a valid VAA signed by at least 13 of 19 Wormhole Guardians. The Wormhole contract verifies all signatures on-chain before any price data is accepted. Without valid VAA verification, price submissions are rejected. + +### Oracle Health Checks + +The `x/oracle` module enforces: +- **Staleness** — prices older than `max_price_staleness_blocks` are rejected. +- **Deviation** — prices deviating more than `max_price_deviation_bps` from the current TWAP are rejected as outliers. +- **Minimum sources** — `min_price_sources` must be met for a valid price. + +### Authorized Sources + +Only contract addresses listed in `x/oracle` params `sources` can submit prices. Adding or removing sources requires a governance proposal. + +### Guardian Set Trust + +The 19 Wormhole Guardians (including Google Cloud and other major validators) provide decentralized trust. The 13/19 quorum threshold means an attacker must compromise a supermajority of guardians to forge a VAA. 
+ +### Hermes Client Wallet + +The relayer wallet should be a dedicated, minimally funded account. It only needs enough AKT to cover gas and update fees. Compromise of this wallet cannot produce fake prices — it can only submit VAAs that must still pass on-chain verification. + +### Contract Governance + +Both contracts are deployed with an admin address (governance). Contract upgrades, configuration changes, and fee updates all require governance authorization. + +## Backward Compatibility + +This proposal introduces new on-chain modules and contracts. It does not modify existing modules or break existing functionality. The `x/oracle` module is a new addition required by AEP-76. + +## Implementations + +- Wormhole contract: `contracts/wormhole/` +- Pyth contract: `contracts/pyth/` +- Hermes client: [github.com/akash-network/hermes](https://github.com/akash-network/hermes) + +## References + +- [Pyth Network Documentation](https://docs.pyth.network/) +- [Pyth Hermes API](https://hermes.pyth.network/docs/) +- [Pyth Price Feed IDs](https://pyth.network/developers/price-feed-ids) +- [Wormhole Documentation](https://docs.wormhole.com/) +- [Wormhole Guardians](https://wormhole.com/docs/protocol/infrastructure/guardians/) +- [CosmWasm Documentation](https://docs.cosmwasm.com/) +- [AEP-76: Burn Mint Equilibrium](../aep-76) +- [AEP-81: Pyth Oracle Integration](../aep-81) + +## Copyright + +Copyright and related rights waived via [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0). diff --git a/src/content/aeps/aep-82/IMPLEMENTATION.md b/src/content/aeps/aep-82/IMPLEMENTATION.md new file mode 100644 index 000000000..68bf237f2 --- /dev/null +++ b/src/content/aeps/aep-82/IMPLEMENTATION.md @@ -0,0 +1,746 @@ +# AEP-82 Implementation Guide + +> Companion to [AEP-82: Resource Reclamation](./README.md) (the authoritative specification). 
+> +> This document contains **implementation-specific details only**: protobuf definitions, Go interfaces, +> codebase-specific notes, handler pseudocode, and the complete file change matrix. For the design, +> behavior, rules, and rationale, see [README.md](./README.md). + +--- + +## 1. Codebase-Specific Implementation Notes + +### 1.1 CreateBid Keeper Signature + +The current `CreateBid` keeper method accepts `(id, price, resourcesOffer)` and constructs the `Bid` +struct inline. The new `reclamation_window` field on `Bid` requires updating this signature: + +```go +// Current +CreateBid(ctx sdk.Context, id mv1.BidID, price sdk.DecCoin, roffer types.ResourcesOffer) (types.Bid, error) + +// Updated +CreateBid(ctx sdk.Context, id mv1.BidID, price sdk.DecCoin, roffer types.ResourcesOffer, reclaimWindow *time.Duration) (types.Bid, error) +``` + +This change propagates to the `IKeeper` interface and all call sites. + +### 1.2 CreateOrder Signature + +The current `CreateOrder` accepts `(groupID, groupSpec)`. The reclamation requirement (deployment-level) +must be passed as an additional parameter: + +```go +// Current +CreateOrder(ctx sdk.Context, gid dtypes.GroupID, spec dvbeta.GroupSpec) (types.Order, error) + +// Updated +CreateOrder(ctx sdk.Context, gid dtypes.GroupID, spec dvbeta.GroupSpec, reclamation *mv1.DeploymentReclamation) (types.Order, error) +``` + +This change propagates to: +- `IKeeper` interface (`x/market/keeper/keeper.go:26`) +- `MarketKeeper` interface (`x/deployment/imports/keepers.go:22`) +- Concrete implementation (`x/market/keeper/keeper.go:153`) +- All call sites (see [Section 5](#5-complete-createorder-call-site-inventory)) + +### 1.3 Market Handler DeploymentKeeper Interface + +The `CloseLease` handler in `x/market/handler/server.go` auto-creates a new order when the tenant +closes a lease and the group remains open (line 289). It needs to pass the reclamation requirement to +`CreateOrder`. 
The order's `Reclamation` field (already fetched at line 246) provides this: + +```go +ms.keepers.Market.CreateOrder(ctx, group.ID, group.GroupSpec, order.Reclamation) +``` + +No change to the `DeploymentKeeper` interface in `x/market/handler/keepers.go` is required for +this path – the reclamation config comes from the `Order`, not the `Deployment`. + +However, `StartGroup` in the deployment handler needs the `Deployment` to retrieve the reclamation +config. The deployment handler already has access to `ms.deployment` (`keeper.IKeeper`), which +exposes `GetDeployment`. No new interface method is needed. + +### 1.4 Query Handler: Hardcoded Lease States + +The lease query handler at `x/market/keeper/grpc_query.go:761` hardcodes the "all states" list: + +```go +states = append(states, byte(v1.LeaseActive), byte(v1.LeaseInsufficientFunds), byte(v1.LeaseClosed)) +``` + +`LeaseReclaiming` must be added: + +```go +states = append(states, byte(v1.LeaseActive), byte(v1.LeaseInsufficientFunds), byte(v1.LeaseClosed), byte(v1.LeaseReclaiming)) +``` + +Without this change, queries with no state filter silently omit all reclaiming leases. + +### 1.5 Escrow Hooks: Intentional Reclamation Bypass + +The `OnEscrowPaymentClosed` hook in `x/market/hooks/hooks.go` does not check lease state before +closing. When an escrow payment closes (insufficient funds or deployment closure), a lease in +`LeaseReclaiming` is closed immediately, bypassing the reclamation window. + +This is **intentional and correct**: +- Insufficient funds: the money ran out; reclamation cannot continue. +- Deployment closure: the tenant voluntarily exited; reclamation is moot. + +No changes to hooks are required. + +### 1.6 OnLeaseClosed Guard + +The existing `OnLeaseClosed` guard in `x/market/keeper/keeper.go:347-350` skips leases that are +already `LeaseClosed` or `LeaseInsufficientFunds`. The new `LeaseReclaiming` state is NOT in the +skip list, so `OnLeaseClosed` correctly handles closing a reclaiming lease. 
No change needed. + +### 1.7 Lease State Index + +The lease state index (`LeaseIndexes.State`) uses `int32(lease.State)` as the index key. The new +`LeaseReclaiming = 4` state is automatically indexed without code changes. Queries filtering by +`state=4` work out of the box. + +### 1.8 WithdrawLease During Reclamation + +The `WithdrawLease` handler (`server.go:165-178`) only checks that the lease exists, then calls +`Escrow.PaymentWithdraw`. It does **not** check `lease.State`. This means `WithdrawLease` works +correctly during `LeaseReclaiming` without any code changes. The provider can withdraw accrued +funds at any point during the reclamation window. + +### 1.9 CloseLease Error Code + +The existing `CloseLease` handler (line 267-268) returns `mv1.ErrOrderClosed` when the lease is +not active. This is a pre-existing misnomer (should be `ErrLeaseNotActive`). This AEP does not +fix it but notes it for future cleanup. The reclamation changes update the check to also accept +`LeaseReclaiming`: + +```go +if lease.State != mv1.LeaseActive && lease.State != mv1.LeaseReclaiming { + return &mvbeta.MsgCloseLeaseResponse{}, mv1.ErrOrderClosed +} +``` + +--- + +## 2. Protobuf Definitions + +### 2.1 New File: `proto/node/akash/market/v1/reclamation.proto` + +```protobuf +syntax = "proto3"; +package akash.market.v1; + +import "gogoproto/gogo.proto"; +import "google/protobuf/duration.proto"; +import "akash/market/v1/types.proto"; + +option go_package = "pkg.akt.dev/go/node/market/v1"; + +// DeploymentReclamation defines the tenant's reclamation requirements. +// Stored on the Deployment and propagated to Orders. +message DeploymentReclamation { + // min_window is the minimum reclamation window the tenant requires. 
+ google.protobuf.Duration min_window = 1 [ + (gogoproto.nullable) = false, + (gogoproto.stdduration) = true, + (gogoproto.jsontag) = "min_window", + (gogoproto.moretags) = "yaml:\"min_window\"" + ]; +} + +// Reclamation defines the runtime reclamation state stored on a Lease. +message Reclamation { + // window is the negotiated reclamation window duration (from the winning bid). + google.protobuf.Duration window = 1 [ + (gogoproto.nullable) = false, + (gogoproto.stdduration) = true, + (gogoproto.jsontag) = "window", + (gogoproto.moretags) = "yaml:\"window\"" + ]; + + // started_at is the block height at which reclamation was initiated. + // Zero means reclamation has not been started yet. + int64 started_at = 2 [ + (gogoproto.jsontag) = "started_at", + (gogoproto.moretags) = "yaml:\"started_at\"" + ]; + + // deadline is the unix timestamp at which the reclamation window expires. + // Zero means reclamation has not been started yet. + int64 deadline = 3 [ + (gogoproto.jsontag) = "deadline", + (gogoproto.moretags) = "yaml:\"deadline\"" + ]; + + // reason is the provider's stated reason for reclamation. + LeaseClosedReason reason = 4 [ + (gogoproto.jsontag) = "reason", + (gogoproto.moretags) = "yaml:\"reason\"" + ]; +} +``` + +### 2.2 Modify: `proto/node/akash/market/v1/lease.proto` + +Add to `Lease.State` enum: + +```protobuf +// LeaseReclaiming denotes a lease in reclamation (grace period before closure). +reclaiming = 4 [ + (gogoproto.enumvalue_customname) = "LeaseReclaiming" +]; +``` + +Add field to `Lease` message: + +```protobuf +// Reclamation holds reclamation configuration and state, if applicable. +// Nil if reclamation is not configured for this lease. 
+Reclamation reclamation = 7 [ + (gogoproto.nullable) = true, + (gogoproto.jsontag) = "reclamation,omitempty", + (gogoproto.moretags) = "yaml:\"reclamation,omitempty\"" +]; +``` + +### 2.3 Modify: `proto/node/akash/market/v1/event.proto` + +Add: + +```protobuf +// EventLeaseReclaimStarted is triggered when a provider initiates reclamation. +message EventLeaseReclaimStarted { + option (gogoproto.equal) = true; + + LeaseID id = 1 [ + (gogoproto.nullable) = false, + (gogoproto.customname) = "ID", + (gogoproto.jsontag) = "id", + (gogoproto.moretags) = "yaml:\"id\"" + ]; + + LeaseClosedReason reason = 2 [ + (gogoproto.jsontag) = "reason", + (gogoproto.moretags) = "yaml:\"reason\"" + ]; + + // deadline is the unix timestamp when the reclamation window expires. + int64 deadline = 3 [ + (gogoproto.jsontag) = "deadline", + (gogoproto.moretags) = "yaml:\"deadline\"" + ]; +} +``` + +### 2.4 Modify: `proto/node/akash/market/v1beta5/leasemsg.proto` + +Add: + +```protobuf +// MsgLeaseStartReclaim is sent by the provider to initiate reclamation on an active lease. +message MsgLeaseStartReclaim { + option (gogoproto.equal) = false; + + akash.market.v1.LeaseID id = 1 [ + (gogoproto.customname) = "ID", + (gogoproto.nullable) = false, + (gogoproto.jsontag) = "id", + (gogoproto.moretags) = "yaml:\"id\"" + ]; + + akash.market.v1.LeaseClosedReason reason = 2 [ + (gogoproto.jsontag) = "reason", + (gogoproto.moretags) = "yaml:\"reason\"" + ]; +} + +// MsgLeaseStartReclaimResponse is the response from starting lease reclamation. +message MsgLeaseStartReclaimResponse {} +``` + +### 2.5 Modify: `proto/node/akash/market/v1beta5/service.proto` + +Add RPC to `service Msg`: + +```protobuf +// LeaseStartReclaim initiates the reclamation window on an active lease. 
+rpc LeaseStartReclaim(MsgLeaseStartReclaim) returns (MsgLeaseStartReclaimResponse); +``` + +### 2.6 Modify: `proto/node/akash/market/v1beta5/bidmsg.proto` + +Add to `MsgCreateBid`: + +```protobuf +// reclamation_window is the reclamation window duration the provider offers. +// If the order requires reclamation, this must be >= the order's min_window. +// Nil means the provider does not offer reclamation on this bid. +google.protobuf.Duration reclamation_window = 5 [ + (gogoproto.nullable) = true, + (gogoproto.stdduration) = true, + (gogoproto.jsontag) = "reclamation_window,omitempty", + (gogoproto.moretags) = "yaml:\"reclamation_window,omitempty\"" +]; +``` + +### 2.7 Modify: `proto/node/akash/market/v1beta5/bid.proto` + +Add to `Bid`: + +```protobuf +// reclamation_window is the reclamation window offered by this provider. +google.protobuf.Duration reclamation_window = 6 [ + (gogoproto.nullable) = true, + (gogoproto.stdduration) = true, + (gogoproto.jsontag) = "reclamation_window,omitempty", + (gogoproto.moretags) = "yaml:\"reclamation_window,omitempty\"" +]; +``` + +### 2.8 Modify: `proto/node/akash/market/v1beta5/order.proto` + +Add to `Order`: + +```protobuf +// reclamation is the deployment-level reclamation requirement, propagated to the order. +// Nil means the deployment does not require reclamation. +akash.market.v1.DeploymentReclamation reclamation = 5 [ + (gogoproto.nullable) = true, + (gogoproto.jsontag) = "reclamation,omitempty", + (gogoproto.moretags) = "yaml:\"reclamation,omitempty\"" +]; +``` + +### 2.9 Modify: `proto/node/akash/market/v1beta5/params.proto` + +Add: + +```protobuf +// min_reclamation_window is the minimum reclamation window duration allowed. 
+google.protobuf.Duration min_reclamation_window = 4 [ + (gogoproto.nullable) = false, + (gogoproto.stdduration) = true, + (gogoproto.jsontag) = "min_reclamation_window", + (gogoproto.moretags) = "yaml:\"min_reclamation_window\"" +]; + +// max_reclamation_window is the maximum reclamation window duration allowed. +google.protobuf.Duration max_reclamation_window = 5 [ + (gogoproto.nullable) = false, + (gogoproto.stdduration) = true, + (gogoproto.jsontag) = "max_reclamation_window", + (gogoproto.moretags) = "yaml:\"max_reclamation_window\"" +]; +``` + +### 2.10 Modify: `proto/node/akash/deployment/v1beta4/deploymentmsg.proto` + +Add to `MsgCreateDeployment`: + +```protobuf +// reclamation specifies the deployment-level reclamation requirements. +// Nil means the tenant does not require reclamation. +akash.market.v1.DeploymentReclamation reclamation = 5 [ + (gogoproto.nullable) = true, + (gogoproto.jsontag) = "reclamation,omitempty", + (gogoproto.moretags) = "yaml:\"reclamation,omitempty\"" +]; +``` + +### 2.11 Modify: `proto/node/akash/deployment/v1/deployment.proto` + +Add to `Deployment`: + +```protobuf +// reclamation stores the deployment's reclamation requirements for persistence. +// Needed so that StartGroup can propagate reclamation to newly created orders. +akash.market.v1.DeploymentReclamation reclamation = 5 [ + (gogoproto.nullable) = true, + (gogoproto.jsontag) = "reclamation,omitempty", + (gogoproto.moretags) = "yaml:\"reclamation,omitempty\"" +]; +``` + +--- + +## 3. 
Go Interface Changes + +### 3.1 Error Sentinels (`go/node/market/v1/errors.go`) + +```go +ErrLeaseNotReclamable = sdkerrors.Register(ModuleName, N, "lease does not have reclamation configured") +ErrLeaseAlreadyReclaiming = sdkerrors.Register(ModuleName, N, "reclamation already in progress") +ErrReclamationNotStarted = sdkerrors.Register(ModuleName, N, "reclamation not started; call MsgLeaseStartReclaim first") +ErrReclamationWindowNotElapsed = sdkerrors.Register(ModuleName, N, "reclamation window has not elapsed") +ErrReclamationWindowInvalid = sdkerrors.Register(ModuleName, N, "reclamation window outside governance bounds") +ErrReclamationRequired = sdkerrors.Register(ModuleName, N, "order requires reclamation but bid does not offer it") +ErrReclamationWindowTooShort = sdkerrors.Register(ModuleName, N, "reclamation window shorter than order minimum") +``` + +Note: `ErrReclamationNotStarted` and `ErrReclamationWindowNotElapsed` are intentionally split into two +distinct errors. The former is returned when the provider tries to `CloseBid` on an `Active` lease that +has reclamation. The latter is returned when the provider tries to `CloseBid` on a `Reclaiming` lease +before the deadline. + +#### Deployment Module Error (`go/node/deployment/v1/errors.go`) + +```go +ErrInvalidReclamation = sdkerrors.Register(ModuleName, N, "invalid reclamation configuration") +``` + +This error is returned by the `CreateDeployment` handler when the tenant's `min_window` fails +validation against governance bounds (see [Section 4.6](#46-modified-createdeployment-xdeploymenthandlerservergo)). 
+ +### 3.2 Message Support (`go/node/market/v1beta5/msgs.go`) + +```go +func NewMsgLeaseStartReclaim(id v1.LeaseID, reason v1.LeaseClosedReason) *MsgLeaseStartReclaim + +func (msg *MsgLeaseStartReclaim) Type() string // "MsgLeaseStartReclaim" +func (msg *MsgLeaseStartReclaim) Route() string // v1.RouterKey +func (msg *MsgLeaseStartReclaim) GetSignBytes() []byte // standard marshal + +func (msg *MsgLeaseStartReclaim) GetSigners() []sdk.AccAddress { + // signer is the provider + provider, _ := sdk.AccAddressFromBech32(msg.ID.Provider) + return []sdk.AccAddress{provider} +} + +func (msg *MsgLeaseStartReclaim) ValidateBasic() error { + if err := msg.ID.Validate(); err != nil { + return err + } + if !msg.Reason.IsRange(v1.LeaseClosedReasonRangeProvider) { + return cerrors.Wrapf(v1.ErrInvalidLeaseClosedReason, "value \"%d\" range 10000..19999", msg.Reason) + } + return nil +} +``` + +### 3.3 Codec Registration (`go/node/market/v1beta5/codec.go`) + +In `init()`: +```go +sdkutil.RegisterCustomSignerField(&MsgLeaseStartReclaim{}, "id", "provider") +``` + +In `RegisterInterfaces`: +```go +registry.RegisterImplementations((*sdk.Msg)(nil), + // ... existing ... + &MsgLeaseStartReclaim{}, +) +``` + +In `RegisterLegacyAminoCodec`: +```go +cdc.RegisterConcrete(&MsgLeaseStartReclaim{}, "akash-sdk/x/"+v1.ModuleName+"/"+(&MsgLeaseStartReclaim{}).Type(), nil) +``` + +### 3.4 Params (`go/node/market/v1beta5/params.go`) + +```go +const ( + DefaultMinReclamationWindow = 1 * time.Hour + DefaultMaxReclamationWindow = 720 * time.Hour // 30 days +) +``` + +Update `DefaultParams()` and `Validate()` accordingly. + +--- + +## 4. 
Handler Pseudocode + +### 4.1 LeaseStartReclaim (`x/market/handler/server.go`) + +```go +func (ms msgServer) LeaseStartReclaim(goCtx context.Context, msg *mvbeta.MsgLeaseStartReclaim) (*mvbeta.MsgLeaseStartReclaimResponse, error) { +    ctx := sdk.UnwrapSDKContext(goCtx) + +    lease, found := ms.keepers.Market.GetLease(ctx, msg.ID) +    if !found { +        return nil, mv1.ErrUnknownLease +    } + +    // Check LeaseReclaiming before the generic Active check so that a repeated +    // call returns ErrLeaseAlreadyReclaiming instead of ErrLeaseNotActive. +    if lease.State == mv1.LeaseReclaiming { +        return nil, mv1.ErrLeaseAlreadyReclaiming +    } + +    if lease.State != mv1.LeaseActive { +        return nil, mv1.ErrLeaseNotActive +    } + +    if lease.Reclamation == nil { +        return nil, mv1.ErrLeaseNotReclamable +    } + +    // Defensive: an Active lease should never carry a started reclamation. +    if lease.Reclamation.StartedAt != 0 { +        return nil, mv1.ErrLeaseAlreadyReclaiming +    } + +    blockTime := ctx.BlockTime() +    deadline := blockTime.Add(lease.Reclamation.Window) + +    lease.Reclamation.StartedAt = ctx.BlockHeight() +    lease.Reclamation.Deadline = deadline.Unix() +    lease.Reclamation.Reason = msg.Reason +    lease.State = mv1.LeaseReclaiming + +    if err := ms.keepers.Market.SaveLease(ctx, lease); err != nil { +        return nil, err +    } + +    if err := ctx.EventManager().EmitTypedEvent(&mv1.EventLeaseReclaimStarted{ +        ID:       lease.ID, +        Reason:   msg.Reason, +        Deadline: deadline.Unix(), +    }); err != nil { +        return nil, err +    } + +    return &mvbeta.MsgLeaseStartReclaimResponse{}, nil +} +``` + +### 4.2 Modified CloseBid (`x/market/handler/server.go`) + +The active-bid-with-lease path (currently lines 137-162) gains a reclamation check. +The existing lease state check at line 142 (`if lease.State != mv1.LeaseActive`) must be +**replaced** by the following switch block (not added after it): + +```go +// Replaces the existing `if lease.State != mv1.LeaseActive` check at line 142. +// Must appear BEFORE the bid.State check at line 146.
+ +switch lease.State { +case mv1.LeaseActive: +    if lease.Reclamation != nil { +        return nil, mv1.ErrReclamationNotStarted +    } +    // No reclamation -- proceed with existing close cascade +case mv1.LeaseReclaiming: +    if ctx.BlockTime().Unix() < lease.Reclamation.Deadline { +        return nil, mv1.ErrReclamationWindowNotElapsed +    } +    // Window elapsed -- proceed with existing close cascade +default: +    return nil, mv1.ErrLeaseNotActive +} + +// ... existing close cascade (Deployment.OnBidClosed, OnLeaseClosed, etc.) ... +``` + +### 4.3 Modified CreateBid (`x/market/handler/server.go`) + +After existing attribute/capability matching (around line 97), add reclamation validation: + +```go +// Reclamation validation (params is the market module's current Params) +if order.Reclamation != nil { +    if msg.ReclamationWindow == nil { +        return nil, mv1.ErrReclamationRequired +    } +    if *msg.ReclamationWindow < order.Reclamation.MinWindow { +        return nil, mv1.ErrReclamationWindowTooShort +    } +} + +if msg.ReclamationWindow != nil { +    if *msg.ReclamationWindow < params.MinReclamationWindow { +        return nil, mv1.ErrReclamationWindowInvalid +    } +    if *msg.ReclamationWindow > params.MaxReclamationWindow { +        return nil, mv1.ErrReclamationWindowInvalid +    } +} + +// Pass reclamation_window to CreateBid +bid, err := ms.keepers.Market.CreateBid(ctx, msg.ID, msg.Price, msg.ResourcesOffer, msg.ReclamationWindow) +if err != nil { +    return nil, err +} +``` + +### 4.4 Modified CreateLease (`x/market/handler/server.go`) + +After `Market.CreateLease(ctx, bid)` succeeds (line 222), store reclamation config: + +```go +if bid.ReclamationWindow != nil { +    lease, found := ms.keepers.Market.GetLease(ctx, bid.ID.LeaseID()) +    if !found { +        return nil, mv1.ErrUnknownLease +    } +    lease.Reclamation = &mv1.Reclamation{ +        Window: *bid.ReclamationWindow, +    } +    // SaveLease returns an error; dropping it would leave the lease without reclamation. +    if err := ms.keepers.Market.SaveLease(ctx, lease); err != nil { +        return nil, err +    } +} +``` + +Note: This performs a create followed by an immediate get-modify-save, resulting in two store writes +for the lease.
An optimization would be to modify the `CreateLease` keeper method to accept an +optional `*mv1.Reclamation` parameter and set it during initial creation. Either approach is correct; +the two-write approach is shown here for clarity. + +### 4.5 Modified CloseLease (`x/market/handler/server.go`) + +Update the lease state check (line 267) to accept `LeaseReclaiming`: + +```go +if lease.State != mv1.LeaseActive && lease.State != mv1.LeaseReclaiming { + return &mvbeta.MsgCloseLeaseResponse{}, mv1.ErrOrderClosed +} +``` + +Pass reclamation to the re-order call (line 289): + +```go +if _, err := ms.keepers.Market.CreateOrder(ctx, group.ID, group.GroupSpec, order.Reclamation); err != nil { + return &mvbeta.MsgCloseLeaseResponse{}, err +} +``` + +### 4.6 Modified CreateDeployment (`x/deployment/handler/server.go`) + +Validate reclamation against governance bounds, store on the Deployment, and pass to CreateOrder: + +```go +// Validate reclamation window against market module params +if msg.Reclamation != nil { + marketParams, err := ms.market.GetParams(ctx) // requires adding GetParams to the MarketKeeper interface + if err != nil { + return nil, err + } + if msg.Reclamation.MinWindow <= 0 { + return nil, v1.ErrInvalidReclamation + } + if msg.Reclamation.MinWindow < marketParams.MinReclamationWindow { + return nil, v1.ErrInvalidReclamation + } + if msg.Reclamation.MinWindow > marketParams.MaxReclamationWindow { + return nil, v1.ErrInvalidReclamation + } +} + +deployment := v1.Deployment{ + ID: did, + State: v1.DeploymentActive, + Hash: msg.Hash, + CreatedAt: ctx.BlockHeight(), + Reclamation: msg.Reclamation, // NEW +} + +// In the order creation loop: +for _, group := range groups { + if _, err := ms.market.CreateOrder(ctx, group.ID, group.GroupSpec, msg.Reclamation); err != nil { + return &types.MsgCreateDeploymentResponse{}, err + } +} +``` + +Note: This requires adding `GetParams` to the `MarketKeeper` interface used by the deployment handler 
+(`x/deployment/imports/keepers.go`). Alternatively, the reclamation bounds could be stored as deployment +module parameters, but keeping them in the market module avoids parameter duplication. + +### 4.7 Modified StartGroup (`x/deployment/handler/server.go`) + +Retrieve reclamation from the persisted Deployment: + +```go +deployment, found := ms.deployment.GetDeployment(ctx, msg.ID.DeploymentID()) +if !found { + return &types.MsgStartGroupResponse{}, v1.ErrDeploymentNotFound +} + +if _, err := ms.market.CreateOrder(ctx, group.ID, group.GroupSpec, deployment.Reclamation); err != nil { + return &types.MsgStartGroupResponse{}, err +} +``` + +--- + +## 5. Complete CreateOrder Call Site Inventory + +All locations that call `CreateOrder` and must be updated to pass the `reclamation` parameter: + +### Interface Definitions (2) + +| File | Line | Interface | +|----------------------------------------|------|----------------| +| `node/x/market/keeper/keeper.go` | 26 | `IKeeper` | +| `node/x/deployment/imports/keepers.go` | 22 | `MarketKeeper` | + +### Implementation (1) + +| File | Line | +|----------------------------------|------| +| `node/x/market/keeper/keeper.go` | 153 | + +### Production Call Sites (3) + +| File | Line | Context | +|---------------------------------------|------|----------------------------| +| `node/x/deployment/handler/server.go` | 101 | `CreateDeployment` | +| `node/x/deployment/handler/server.go` | 226 | `StartGroup` | +| `node/x/market/handler/server.go` | 289 | `CloseLease` auto-re-order | + +### Test Call Sites (3) + +| File | Line | Context | +|-----------------------------------------|------|-------------------------| +| `node/x/market/keeper/keeper_test.go` | 27 | `Test_CreateOrder` | +| `node/x/market/keeper/keeper_test.go` | 503 | `createOrder` helper | +| `node/x/market/handler/handler_test.go` | 1414 | `testSuite.createOrder` | + +--- + +## 6. Upgrade Handler + +The upgrade handler for this chain version must: + +1. 
**Set default reclamation params** -- existing chains have zero-value `Duration` for +   `min_reclamation_window` and `max_reclamation_window`. These must be set to defaults before +   any reclamation transactions can be validated: + +```go +// Inside the upgrade handler, which returns (module.VersionMap, error). +params, err := marketKeeper.GetParams(ctx) +if err != nil { +    return nil, err +} +params.MinReclamationWindow = 1 * time.Hour   // DefaultMinReclamationWindow +params.MaxReclamationWindow = 720 * time.Hour // DefaultMaxReclamationWindow +if err := marketKeeper.SetParams(ctx, params); err != nil { +    return nil, err +} +``` + +2. **No data migration** -- existing leases have `reclamation = nil` (new nullable proto field). +   Existing orders and bids have `reclamation = nil` / `reclamation_window = nil`. No state +   migration is needed. + +--- + +## 7. Testing Strategy + +### Unit Tests (keeper_test.go) + +- Create order with reclamation requirement; verify stored on order +- Create bid with `reclamation_window`; verify stored on bid +- Create lease from bid with reclamation; verify `Reclamation` stored on lease +- `MsgLeaseStartReclaim` on lease with reclamation; verify state transition to `LeaseReclaiming` +- `MsgLeaseStartReclaim` on lease without reclamation; verify `ErrLeaseNotReclamable` +- `MsgLeaseStartReclaim` on already-reclaiming lease; verify `ErrLeaseAlreadyReclaiming` +- `OnLeaseClosed` from `LeaseReclaiming` state; verify correct transition to `LeaseClosed` +- `OnGroupClosed` cascade with reclaiming lease; verify lease closed + +### Handler Integration Tests (handler_test.go) + +- Full flow: deploy with reclamation -> bid with window -> lease -> start reclaim -> advance time -> close +- `CreateBid` without reclamation on order that requires it -> `ErrReclamationRequired` +- `CreateBid` with window shorter than order minimum -> `ErrReclamationWindowTooShort` +- `CreateBid` with window outside governance bounds -> `ErrReclamationWindowInvalid` +- `CloseBid` on `LeaseActive` with reclamation -> `ErrReclamationNotStarted` +- `CloseBid` on `LeaseReclaiming` before deadline -> `ErrReclamationWindowNotElapsed` +- `CloseBid` on `LeaseReclaiming` at or after deadline -> success;
group paused +- `CloseLease` (tenant) during reclamation window -> success; auto-re-order with reclamation inherited +- `CloseLease` (tenant) on `LeaseActive` with reclamation -> success (tenant always can close) +- Provider offers reclamation voluntarily (order does not require it) -> accepted; lease has reclamation +- Lease without reclamation -> existing behavior unchanged +- Escrow insufficient funds during reclamation -> lease closed, bypasses window +- `CreateDeployment` with `min_window < params.MinReclamationWindow` -> `ErrInvalidReclamation` +- `CreateDeployment` with `min_window > params.MaxReclamationWindow` -> `ErrInvalidReclamation` +- `CreateDeployment` with `min_window = 0` -> `ErrInvalidReclamation` +- `StartGroup` after reclamation close -> new order inherits reclamation from `Deployment` + +### Genesis Tests + +- Export/import round-trip with leases in `LeaseReclaiming` state +- Export/import with reclamation-configured orders and bids diff --git a/src/content/aeps/aep-82/README.md b/src/content/aeps/aep-82/README.md new file mode 100644 index 000000000..2742df7e8 --- /dev/null +++ b/src/content/aeps/aep-82/README.md @@ -0,0 +1,347 @@ +--- +aep: 82 +title: "Resource Reclamation" +author: Artur Troian (@troian) +status: Last Call +type: Standard +category: Core +estimated-completion: 2026-05-31 +created: 2026-04-22 +roadmap: major +--- + +## Motivation + +Akash providers currently have two options when they need to terminate a lease: close it immediately (via +`MsgCloseBid`) or wait for the tenant to close it. Immediate closure is disruptive – the tenant's workloads +are terminated without warning, potentially causing data loss and downtime. There is no on-chain mechanism for +providers to give tenants advance notice of an upcoming termination. + +This is a problem for several real-world provider scenarios: + +- **Planned maintenance** – hardware upgrades, firmware updates, or data center moves require workload evacuation. 
+- **Decommissioning** – a provider retiring capacity needs to wind down active leases gracefully. +- **Resource rebalancing** – a provider may need to reclaim specific resources while continuing to serve others. + +Without a grace period mechanism, providers either accept the reputational cost of abrupt termination or +indefinitely delay necessary infrastructure changes. Tenants, in turn, cannot distinguish between a provider +that will give them time to migrate and one that will cut them off without warning. + +## Summary + +This AEP introduces **Resource Reclamation**, a marketplace extension that provides a negotiated grace period +before provider-initiated lease termination. + +1. **Tenant opt-in** – A tenant may specify a minimum reclamation window as a requirement in +   `MsgCreateDeployment`. Bids that do not offer reclamation are rejected for such deployments. + +2. **Provider opt-in** – A provider may offer a reclamation window in its bid, even for deployments that +   do not require it. If the deployment requires reclamation, the provider's offered window must meet or +   exceed the tenant's minimum. + +3. **Negotiated window** – The reclamation window is a wall-clock duration negotiated between tenant and +   provider at bid time and stored on the lease when it is created. + +4. **Reclamation initiation** – A provider initiates reclamation by sending `MsgLeaseStartReclaim`, which +   transitions the lease from `Active` to `Reclaiming` and sets a deadline. + +5. **Window enforcement** – During the reclamation window, the provider cannot close the lease. The tenant +   can close at any time. Payments continue at the full lease rate. After the window elapses, either party +   may close the lease. + +6. **Governance bounds** – Module parameters enforce minimum and maximum allowed reclamation window durations.
+ +## Specification + +This specification targets the current active proto versions: `market/v1beta5` for marketplace +messages and `deployment/v1beta4` for deployment messages. The `deployment/v1beta5` proto (not yet +active in the node) is not addressed and can be updated separately when it is adopted. + +### Reclamation Configuration + +#### Tenant Requirements + +A tenant specifies reclamation requirements at the deployment level via a new field on `MsgCreateDeployment`: + +``` +DeploymentReclamation { + min_window: Duration // minimum reclamation window the tenant requires +} +``` + +This is a deployment-wide setting. All groups within the deployment share the same reclamation requirement. +The configuration is persisted on the `Deployment` on-chain record so that it survives group restarts +(`MsgStartGroup` must propagate the reclamation requirement to newly created orders). + +When set, the requirement is propagated to every `Order` created for the deployment's groups. + +When not set (nil), the deployment does not require reclamation. Providers may still voluntarily offer it. + +#### Provider Offers + +A provider includes a reclamation window in their bid via a new field on `MsgCreateBid`: + +``` +reclamation_window: Duration // optional; nil means provider does not offer reclamation +``` + +The offered window is stored on the `Bid` and, upon lease creation, on the `Lease`. + +#### Matching Rules + +During bid validation (`CreateBid` handler): + +1. If the order requires reclamation (`order.Reclamation != nil`) and the bid does not offer it + (`msg.ReclamationWindow == nil`), the bid is rejected with `ErrReclamationRequired`. + +2. If the order requires reclamation and the bid's window is shorter than the order's minimum + (`msg.ReclamationWindow < order.Reclamation.MinWindow`), the bid is rejected with + `ErrReclamationWindowTooShort`. + +3. 
If the bid offers reclamation, the window must be within governance bounds + (`params.MinReclamationWindow <= window <= params.MaxReclamationWindow`), regardless of whether the + order requires it. + +4. If the order does not require reclamation and the bid does not offer it, no reclamation checks apply + (existing behavior, unchanged). + +#### Deployment Validation + +When `MsgCreateDeployment` includes a `reclamation` field, the `min_window` is validated: + +- `min_window` must be > 0 +- `min_window` must be >= `params.MinReclamationWindow` +- `min_window` must be <= `params.MaxReclamationWindow` + +This validation occurs in the `CreateDeployment` handler (not `ValidateBasic`, since governance +parameters are not available during basic validation). If the `min_window` fails validation, the +deployment creation is rejected. + +#### Lease Creation + +When `MsgCreateLease` is processed and the winning bid offers a reclamation window: + +- The lease is created with a `Reclamation` record containing the negotiated window duration. +- The `Reclamation.StartedAt`, `Reclamation.Deadline`, and `Reclamation.Reason` fields are zero/empty + until the provider initiates reclamation. + +When the winning bid does not offer a reclamation window, the lease has no `Reclamation` record (nil), +and all existing lease lifecycle behavior is unchanged. 
+
+### Lease State Machine
+
+A new lease state `LeaseReclaiming` (value `4`) is added to the `Lease.State` enum:
+
+```
+LeaseStateInvalid      = 0
+LeaseActive            = 1
+LeaseInsufficientFunds = 2
+LeaseClosed            = 3
+LeaseReclaiming        = 4  // NEW
+```
+
+#### State Transitions
+
+```
+                            MsgCreateLease
+                                   |
+                                   v
+                             +-----------+
+                             | Active(1) |
+                             +-----+-----+
+                                   |
+            +----------------------+-----------------------+
+            |                      |                       |
+  MsgLeaseStartReclaim       MsgCloseLease         Escrow cascade /
+     (provider only)         (tenant only)       MsgCloseDeployment /
+            |                      |               InsufficientFunds
+            v                      v                       v
+    +---------------+        +-----------+     +----------------------+
+    | Reclaiming(4) |        | Closed(3) |     | InsufficientFunds(2) |
+    +-------+-------+        +-----------+     +----------------------+
+            |
+            +----------------------+-----------------------+
+            |                      |                       |
+      MsgCloseLease           MsgCloseBid          Escrow cascade /
+    (tenant, anytime)         (provider,         MsgCloseDeployment /
+            |                 after window)        InsufficientFunds
+            |                      |                       |
+            v                      v                       v
+      +-----------+          +-----------+     +----------------------+
+      | Closed(3) |          | Closed(3) |     | InsufficientFunds(2) |
+      +-----------+          +-----------+     +----------------------+
+```
+
+### MsgLeaseStartReclaim
+
+A new transaction message sent by the provider to initiate reclamation on an active lease.
+
+```
+MsgLeaseStartReclaim {
+  id:     LeaseID            // the lease to reclaim
+  reason: LeaseClosedReason  // must be in provider range (10000-19999)
+}
+```
+
+**Signer:** `id.Provider`
+
+**Validation:**
+- Lease must exist
+- Lease must be in `LeaseActive` state
+- Lease must have a `Reclamation` record (non-nil) -- otherwise `ErrLeaseNotReclamable`
+- Reclamation must not have been started already (`Reclamation.StartedAt == 0`) -- otherwise
+  `ErrLeaseAlreadyReclaiming`
+- Reason must be in provider range (10000-19999)
+
+**Effects:**
+1. `Lease.State` transitions from `LeaseActive` to `LeaseReclaiming`
+2. `Lease.Reclamation.StartedAt` is set to the current block height
+3. `Lease.Reclamation.Deadline` is set to `block_time + Reclamation.Window` (unix timestamp)
+4. `Lease.Reclamation.Reason` is set to the provided reason
+5. `EventLeaseReclaimStarted` is emitted
+
+### Window Enforcement
+
+#### Provider Close During Reclamation
+
+When a provider sends `MsgCloseBid` on a lease with reclamation:
+
+- If the lease is in `LeaseActive` state and has reclamation configured, the close is rejected with
+  `ErrReclamationNotStarted`. The provider must first send `MsgLeaseStartReclaim`.
+
+- If the lease is in `LeaseReclaiming` state and the current block time is before the reclamation
+  deadline, the close is rejected with `ErrReclamationWindowNotElapsed`.
+
+- If the lease is in `LeaseReclaiming` state and the current block time is at or past the reclamation
+  deadline, the close proceeds normally (existing `CloseBid` cascade: group paused, order closed,
+  bid closed, lease closed, escrow payment closed).
+
+#### Tenant Close During Reclamation
+
+The tenant can always close the lease via `MsgCloseLease`, regardless of the reclamation state. If the
+lease is in `LeaseReclaiming`, it transitions directly to `LeaseClosed`.
+
+The existing `CloseLease` auto-re-order behavior applies: the deployment group remains open and a new
+order is automatically created (with the reclamation requirement inherited from the original order).
+This differs from provider-initiated close (see [After Reclamation Close](#after-reclamation-close)),
+which pauses the group and does not auto-re-order.
+
+#### Escrow-Initiated Close During Reclamation
+
+If the deployment's escrow account is closed (insufficient funds or `MsgCloseDeployment`) while a lease
+is in `LeaseReclaiming`, the lease is closed immediately, bypassing the reclamation window. This is
+intentional: if funds run out or the tenant voluntarily closes their deployment, reclamation is moot.
+
+#### Payments During Reclamation
+
+The provider continues serving workloads and receiving payment at the full lease rate throughout the
+reclamation window. The escrow payment stream is not modified. The provider can still call
+`MsgWithdrawLease` to withdraw accrued funds during reclamation.
+
+### After Reclamation Close
+
+When the provider closes a lease after the reclamation window via `MsgCloseBid`, the existing `CloseBid`
+cascade applies:
+
+1. `Deployment.OnBidClosed` is called, which **pauses** the deployment group
+2. The lease, bid, and order are closed
+3. The escrow payment is closed
+
+This is identical to the current `MsgCloseBid` behavior. No automatic re-ordering occurs. The tenant
+must manually call `MsgStartGroup` to re-open bidding for the group.
+
+Note: After a reclamation-initiated close, the lease carries two reason fields:
+- `Lease.Reclamation.Reason` -- the reason the provider gave when starting reclamation
+  (set by `MsgLeaseStartReclaim`)
+- `Lease.Reason` -- the reason attached to the final close (set by `MsgCloseBid`, passed through
+  `OnLeaseClosed`)
+
+These serve different purposes. The reclamation reason explains *why the provider initiated the
+grace period*. The close reason explains *why the bid was ultimately closed*. They may differ
+(e.g., reclamation started for maintenance, bid closed for decommissioning).
+
+### Governance Parameters
+
+Two new parameters are added to the market module's `Params`:
+
+| Parameter                | Type       | Default          | Description                                 |
+|--------------------------|------------|------------------|---------------------------------------------|
+| `min_reclamation_window` | `Duration` | `1h`             | Minimum reclamation window duration allowed |
+| `max_reclamation_window` | `Duration` | `720h` (30 days) | Maximum reclamation window duration allowed |
+
+Constraints:
+- `min_reclamation_window` must be > 0
+- `max_reclamation_window` must be > `min_reclamation_window`
+
+Both the tenant's `min_window` requirement (validated at deployment creation) and the provider's
+offered `reclamation_window` (validated at bid creation) must fall within these bounds.
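The lease-side rules (`MsgLeaseStartReclaim` validation and effects, the `MsgCloseBid` window checks, and tenant close) can be modeled as a small state machine. The Python sketch below is hypothetical: block height and block time are passed in explicitly, the lease ID and the rest of the `CloseBid` cascade are omitted, and `LeaseStateError` and `ValueError` stand in for checks whose error names the AEP does not specify.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import IntEnum
from typing import Optional

PROVIDER_REASONS = range(10000, 20000)  # provider reason range: 10000-19999


class LeaseState(IntEnum):
    INVALID = 0
    ACTIVE = 1
    INSUFFICIENT_FUNDS = 2
    CLOSED = 3
    RECLAIMING = 4  # new state introduced by this AEP


class LeaseStateError(Exception):
    """Hypothetical: the AEP does not name the error for a non-active lease."""


class ErrLeaseNotReclamable(Exception):
    pass


class ErrLeaseAlreadyReclaiming(Exception):
    pass


class ErrReclamationNotStarted(Exception):
    pass


class ErrReclamationWindowNotElapsed(Exception):
    pass


@dataclass
class Reclamation:
    window: timedelta  # negotiated window from the winning bid
    started_at: int = 0  # block height; 0 = not started
    deadline: int = 0  # unix timestamp; 0 = not started
    reason: int = 0


@dataclass
class Lease:
    state: LeaseState = LeaseState.ACTIVE
    reclamation: Optional[Reclamation] = None  # None: reclamation not configured


def start_reclaim(lease: Lease, reason: int, block_height: int, block_time: int) -> dict:
    """MsgLeaseStartReclaim: validate, apply the effects, and return the emitted event."""
    rec = lease.reclamation
    if rec is None:
        raise ErrLeaseNotReclamable("lease has no Reclamation record")
    if rec.started_at != 0:
        raise ErrLeaseAlreadyReclaiming("reclamation already started")
    if lease.state != LeaseState.ACTIVE:
        raise LeaseStateError(f"lease is {lease.state.name}, not ACTIVE")
    if reason not in PROVIDER_REASONS:
        raise ValueError("reason must be in the provider range (10000-19999)")
    lease.state = LeaseState.RECLAIMING
    rec.started_at = block_height
    rec.deadline = block_time + int(rec.window.total_seconds())
    rec.reason = reason
    return {"event": "EventLeaseReclaimStarted", "reason": reason, "deadline": rec.deadline}


def close_bid(lease: Lease, block_time: int) -> None:
    """Provider close (MsgCloseBid) with the reclamation window enforced."""
    rec = lease.reclamation
    if rec is not None:
        if lease.state == LeaseState.ACTIVE:
            raise ErrReclamationNotStarted("send MsgLeaseStartReclaim first")
        if lease.state == LeaseState.RECLAIMING and block_time < rec.deadline:
            raise ErrReclamationWindowNotElapsed("reclamation window has not elapsed")
    # Existing CloseBid cascade (pause group, close order/bid, close payment) omitted.
    lease.state = LeaseState.CLOSED


def close_lease(lease: Lease) -> None:
    """Tenant close (MsgCloseLease): always allowed, including while reclaiming."""
    lease.state = LeaseState.CLOSED
```

The sketch checks the `Reclamation` record before the lease state so that a repeated `MsgLeaseStartReclaim` reports `ErrLeaseAlreadyReclaiming` rather than a generic state error; the AEP lists these validations without fixing their order.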
+
+### Events
+
+A new event type is introduced:
+
+```
+EventLeaseReclaimStarted {
+  id:       LeaseID
+  reason:   LeaseClosedReason
+  deadline: int64  // unix timestamp when the reclamation window expires
+}
+```
+
+### Query Support
+
+The new `LeaseReclaiming` state is automatically queryable through the existing lease query infrastructure:
+
+- The lease state index (secondary index on `int32(state)`) automatically includes the new state value
+- `LeaseFilters.State` accepts the new enum value to filter for leases in reclamation (the exact
+  string representation depends on proto enum registration, typically the registered enum value name
+  `"reclaiming"` or the numeric value `4`)
+- The "all leases" query (no state filter) includes reclaiming leases
+
+### Reclamation Data Model
+
+#### Deployment-Level Configuration
+
+```
+DeploymentReclamation {
+  min_window: Duration  // minimum reclamation window the tenant requires
+}
+```
+
+Stored on:
+- `MsgCreateDeployment.reclamation` (input)
+- `Deployment.reclamation` (persisted for `StartGroup` re-order)
+- `Order.reclamation` (propagated for bid validation)
+
+#### Lease-Level State
+
+```
+Reclamation {
+  window:     Duration           // negotiated window duration (from winning bid)
+  started_at: int64              // block height when reclamation started (0 = not started)
+  deadline:   int64              // unix timestamp when window expires (0 = not started)
+  reason:     LeaseClosedReason  // provider's stated reason
+}
+```
+
+Stored on `Lease.reclamation` (nullable; nil means no reclamation configured).
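Because the requirement is persisted on the `Deployment` record, both initial order creation and `MsgStartGroup` can copy it onto new orders without the tenant resubmitting it. A minimal Python sketch with hypothetical stand-in types (order identifiers are reduced to a group sequence and an order sequence; none of these names come from the node code):

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DeploymentReclamation:
    min_window: timedelta  # minimum reclamation window the tenant requires


@dataclass
class Order:
    gseq: int  # group sequence
    oseq: int  # order sequence within the group
    reclamation: Optional[DeploymentReclamation]  # copied for bid validation


@dataclass
class Deployment:
    reclamation: Optional[DeploymentReclamation]  # persisted on-chain
    orders: List[Order] = field(default_factory=list)
    next_oseq: Dict[int, int] = field(default_factory=dict)

    def create_order(self, gseq: int) -> Order:
        """Create the next order for a group, inheriting the deployment's requirement."""
        oseq = self.next_oseq.get(gseq, 1)
        self.next_oseq[gseq] = oseq + 1
        order = Order(gseq, oseq, self.reclamation)
        self.orders.append(order)
        return order


def create_deployment(reclamation: Optional[DeploymentReclamation], groups: int) -> Deployment:
    """MsgCreateDeployment: persist the requirement, then open one order per group."""
    dep = Deployment(reclamation)
    for gseq in range(1, groups + 1):
        dep.create_order(gseq)
    return dep


def start_group(dep: Deployment, gseq: int) -> Order:
    """MsgStartGroup: re-open bidding; the requirement comes from the Deployment record."""
    return dep.create_order(gseq)
```

Reading the requirement from the persisted `Deployment` rather than from the previous order keeps `MsgStartGroup` correct even after the earlier order has been closed.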
+
+### Error Codes
+
+#### Market Module (`x/market`)
+
+| Error                            | Description                                                   |
+|----------------------------------|---------------------------------------------------------------|
+| `ErrLeaseNotReclamable`          | Lease does not have reclamation configured                    |
+| `ErrLeaseAlreadyReclaiming`      | Reclamation has already been started on this lease            |
+| `ErrReclamationNotStarted`       | Provider must call `MsgLeaseStartReclaim` before closing      |
+| `ErrReclamationWindowNotElapsed` | Reclamation window has not yet expired                        |
+| `ErrReclamationWindowInvalid`    | Window duration outside governance bounds                     |
+| `ErrReclamationRequired`         | Order requires reclamation but bid does not offer it          |
+| `ErrReclamationWindowTooShort`   | Provider's offered window is shorter than the order's minimum |
+
+#### Deployment Module (`x/deployment`)
+
+| Error                   | Description                                                                          |
+|-------------------------|--------------------------------------------------------------------------------------|
+| `ErrInvalidReclamation` | Reclamation `min_window` is invalid (zero, below min, or above max governance bound) |
diff --git a/src/content/aeps/aep-83/README.md b/src/content/aeps/aep-83/README.md
new file mode 100644
index 000000000..ef87581f7
--- /dev/null
+++ b/src/content/aeps/aep-83/README.md
@@ -0,0 +1,351 @@
+---
+aep: 83
+title: "Confidential Compute via Kata Containers"
+author: Joao Luna (@cloud-j-luna)
+status: Draft
+type: Standard
+category: Core
+created: 2026-04-14
+requires: 29, 65
+roadmap: major
+---
+
+## Summary
+
+This AEP defines how tenants request confidential computing workloads on Akash Network and how providers advertise confidential compute capabilities. It introduces minimal SDL changes, leveraging the existing placement attributes mechanism to match tenants with TEE-capable providers that run workloads inside Kata Containers (micro-VMs). It covers both CPU-only and GPU confidential computing, including NVIDIA GPU passthrough to Kata VMs and composite attestation of CPU + GPU TEEs.
+
+## Motivation
+
+[AEP-65](../aep-65) establishes the rationale for Confidential Computing on Akash and recommends Kata Containers (micro-VMs) as the execution model. [AEP-29](../aep-29) covers hardware attestation. What remains undefined is:
+
+1. How tenants express "I need confidential compute" in their SDL.
+2. How providers advertise Kata/TEE capability so the marketplace can match bids.
+3. How the provider runtime selects Kata Containers for those workloads.
+4. How GPU confidential computing (NVIDIA CC-on mode) integrates with the Kata VM model.
+5. How composite attestation works when both CPU and GPU TEEs are involved.
+
+This AEP addresses all five, with the goal of **minimal SDL and UX changes**.
+
+## Design Principles
+
+- **Minimal SDL surface**: Reuse existing SDL constructs (placement attributes) rather than inventing new top-level sections.
+- **Opt-in**: Only workloads that explicitly request confidential compute run in Kata. All other workloads are unaffected.
+- **Provider-side transparency**: Providers advertise capability; the runtime class selection (default vs. Kata) is handled automatically by the provider software.
+- **GPU as a first-class concern**: GPU confidential computing is not an afterthought. The spec addresses GPU passthrough, CC-on mode, and composite attestation as core components.
+
+## Specification
+
+### 1. SDL Changes
+
+A tenant requests confidential compute by adding a single attribute to their placement profile:
+
+```yaml
+attributes:
+  confidential-compute: true
+```
+
+This is the **only change** compared to a regular deployment. The attribute acts as a filter: only providers that advertise `confidential-compute: true` will bid on the order. No new SDL keywords, sections, or schema versions are required.
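Conceptually, the filter is a subset check: every attribute in the placement profile must be advertised by the provider with the same value. A hypothetical Python sketch of that check, assuming attribute values are compared as strings; the provider's real bid filtering also considers `signedBy`, resources, and pricing, which are omitted here. The example includes the optional `confidential-compute-tee` refinement described in this section.

```python
from typing import Dict


def provider_matches(required: Dict[str, str], advertised: Dict[str, str]) -> bool:
    """True if every required placement attribute is advertised with the same value."""
    return all(advertised.get(key) == value for key, value in required.items())


# Placement attributes from this AEP's SDL examples.
any_tee_order = {"confidential-compute": "true"}
tdx_order = {"confidential-compute": "true", "confidential-compute-tee": "intel-tdx"}

# Hypothetical provider attribute sets.
providers = {
    "regular": {"region": "us-west"},
    "tdx": {"region": "us-west", "confidential-compute": "true",
            "confidential-compute-tee": "intel-tdx"},
    "snp": {"confidential-compute": "true", "confidential-compute-tee": "amd-sev-snp"},
}

for name, attrs in providers.items():
    print(name, provider_matches(any_tee_order, attrs), provider_matches(tdx_order, attrs))
```

Only the `tdx` provider matches both orders; `snp` matches only the generic confidential order, and `regular` matches neither, so it never bids.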
+
+Whether the workload gets CPU-only or CPU+GPU confidential computing is determined by the compute profile: if the profile includes GPU resources, the provider automatically uses a CC-on GPU and the combined Kata runtime class. No separate GPU attribute is needed: a GPU TEE is meaningless without a CPU TEE (the GPU relies on the CPU TEE as its trust anchor), so the two are inseparable.
+
+#### CPU-Only Confidential Compute
+
+```yaml
+profiles:
+  compute:
+    web:
+      resources:
+        cpu:
+          units: 2
+        memory:
+          size: "2Gi"
+        storage:
+          - size: "10Gi"
+  placement:
+    confidential:
+      attributes:
+        confidential-compute: true
+      signedBy:
+        anyOf:
+          - akash1
+      pricing:
+        web:
+          denom: uakt
+          amount: 1000
+
+deployment:
+  web:
+    confidential:
+      profile: web
+      count: 1
+```
+
+#### GPU Confidential Compute
+
+The only difference from the CPU-only case is that the compute profile includes GPU resources. The same `confidential-compute: true` attribute is used.
+
+```yaml
+profiles:
+  compute:
+    inference:
+      resources:
+        cpu:
+          units: 8
+        memory:
+          size: "32Gi"
+        storage:
+          - size: "100Gi"
+        gpu:
+          units: 1
+          attributes:
+            vendor:
+              nvidia:
+                - model: h100
+  placement:
+    confidential-gpu:
+      attributes:
+        confidential-compute: true
+      signedBy:
+        anyOf:
+          - akash1
+      pricing:
+        inference:
+          denom: uakt
+          amount: 5000
+
+deployment:
+  inference:
+    confidential-gpu:
+      profile: inference
+      count: 1
+```
+
+Tenants that need a specific CPU TEE technology can add a finer-grained attribute:
+
+```yaml
+attributes:
+  confidential-compute: true
+  confidential-compute-tee: intel-tdx  # or amd-sev-snp
+```
+
+Since attestation is vendor-specific (different device nodes, quote formats, and verification flows for TDX vs SEV-SNP), tenants running their own attestation logic will typically need to specify the CPU TEE technology.
+
+### 2. Provider Attributes
+
+Providers that support confidential compute must advertise the following attributes:
+
+| Attribute | Value | When to Advertise |
+|---|---|---|
+| `confidential-compute` | `true` | Kata Containers runtime installed and TEE-capable CPU detected |
+| `confidential-compute-tee` | `intel-tdx` or `amd-sev-snp` | Whenever `confidential-compute` is advertised (identifies the available CPU TEE technology) |
+
+These attributes should be set automatically by the provider's inventory service (per [AEP-41](../aep-41)) when it detects the relevant hardware and runtime configuration. The inventory service must verify:
+
+1. Kata Containers runtime is installed and registered as a Kubernetes `RuntimeClass`.
+2. CPU TEE is enabled (TDX or SEV-SNP active in BIOS and kernel).
+3. For GPU nodes: NVIDIA GPU(s) are in CC-on mode and VFIO passthrough is configured.
+
+GPU confidential capability is not advertised as a separate attribute. Instead, the provider's inventory already advertises GPU models (e.g., `nvidia-h100`). When a tenant's order combines `confidential-compute: true` with a GPU compute profile, the marketplace matches against providers that have both `confidential-compute: true` and the requested GPU model. The provider is responsible for ensuring its advertised GPUs are in CC-on mode when `confidential-compute: true` is set.
+
+### 3. Provider Runtime Selection
+
+When a provider receives a lease for an order with `confidential-compute: true`, the provider software must select the appropriate Kata runtime class based on whether the order includes GPU resources.
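The selection rule can be sketched as a pure function of the lease's request, the node's CPU TEE, and whether the compute profile asks for GPUs. In this hypothetical Python sketch the function names are illustrative, the `cpu_tee` values mirror the `confidential-compute-tee` attribute, and the runtime class names are the CoCo Operator and GPU Operator classes discussed in this section (`kata-qemu-nvidia-gpu-tdx` is included for when it ships or is configured manually).

```python
from typing import Optional

# (cpu_tee, needs_gpu) -> RuntimeClass name for confidential workloads.
RUNTIME_CLASSES = {
    ("intel-tdx", False): "kata-qemu-tdx",
    ("amd-sev-snp", False): "kata-qemu-snp",
    ("amd-sev-snp", True): "kata-qemu-nvidia-gpu-snp",
    # Not yet shipped by the GPU Operator; may require manual configuration.
    ("intel-tdx", True): "kata-qemu-nvidia-gpu-tdx",
}


def select_runtime_class(confidential: bool, cpu_tee: str, gpu_units: int) -> Optional[str]:
    """Return the runtimeClassName to inject, or None to keep the default runtime."""
    if not confidential:
        return None  # workloads without confidential-compute are unaffected
    key = (cpu_tee, gpu_units > 0)
    if key not in RUNTIME_CLASSES:
        raise ValueError(f"unsupported CPU TEE: {cpu_tee!r}")
    return RUNTIME_CLASSES[key]


def with_runtime_class(pod: dict, runtime_class: Optional[str]) -> dict:
    """Return a copy of a pod manifest with spec.runtimeClassName set when needed."""
    spec = dict(pod.get("spec", {}))
    if runtime_class is not None:
        spec["runtimeClassName"] = runtime_class
    return {**pod, "spec": spec}


pod = {"spec": {"containers": [{"name": "inference"}]}}
rc = select_runtime_class(True, "amd-sev-snp", gpu_units=1)
print(with_runtime_class(pod, rc)["spec"]["runtimeClassName"])  # kata-qemu-nvidia-gpu-snp
```

Keeping the mapping in a single table means that enabling `kata-qemu-nvidia-gpu-tdx`, or adding a future TEE, is a data change rather than a new code path.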
+
+The Confidential Containers Operator and NVIDIA GPU Operator provide pre-built runtime classes for TEE combinations:
+
+| RuntimeClass | CPU TEE | GPU | Source | Status |
+|---|---|---|---|---|
+| `kata-qemu-tdx` | Intel TDX | No | CoCo Operator | Available |
+| `kata-qemu-snp` | AMD SEV-SNP | No | CoCo Operator | Available |
+| `kata-qemu-nvidia-gpu` | None | Yes | GPU Operator | Available |
+| `kata-qemu-nvidia-gpu-snp` | AMD SEV-SNP | Yes | GPU Operator | Available |
+| `kata-qemu-nvidia-gpu-tdx` | Intel TDX | Yes | GPU Operator | Not yet shipped (as of GPU Operator v25.3.1) |
+
+Intel TDX + GPU passthrough is technically feasible: the SPDM/bounce buffer mechanism is identical to the SEV-SNP path, and NVIDIA documents TDX GPU confidential computing in their standalone SecureAI deployment guide. However, the NVIDIA GPU Operator does not yet ship the `kata-qemu-nvidia-gpu-tdx` runtime class. Some NVIDIA documentation references it by name, suggesting it is in progress. Until it ships, providers with Intel TDX + GPU hardware would need to configure the runtime class manually.
+
+The provider software selects the runtime class based on the lease:
+
+1. `confidential-compute: true` + no GPU in compute profile: use the CPU-only class matching the node's CPU TEE (`kata-qemu-tdx` or `kata-qemu-snp`).
+2. `confidential-compute: true` + GPU in compute profile: use the combined class (`kata-qemu-nvidia-gpu-snp`, or `kata-qemu-nvidia-gpu-tdx` when available).
+
+The `kata-qemu-nvidia-gpu` class provides GPU passthrough without a CPU TEE and is therefore not used for confidential workloads.
+
+The provider injects the `runtimeClassName` into the pod spec:
+
+```yaml
+spec:
+  runtimeClassName: kata-qemu-nvidia-gpu-snp
+  containers:
+    - name: inference
+      image:
+```
+
+No tenant-side container image changes are required.
+
+### 4. NVIDIA GPU Passthrough to Kata VMs
+
+For GPU confidential workloads, the NVIDIA GPU is passed through to the Kata micro-VM via VFIO. This is a provider-side concern managed by the NVIDIA GPU Operator, which provides three components for Kata support:
+
+- **NVIDIA VFIO Manager**: Loads the `vfio-pci` driver and binds it to GPUs on the node.
+- **NVIDIA Sandbox Device Plugin**: Discovers passthrough-capable GPUs and advertises them to kubelet.
+- **NVIDIA Kata Manager**: Provides optimized kernel images and initrd for the guest VM.
+
+GPU passthrough is cold-plug only. The GPU is attached at VM launch time during pod sandbox creation, before the container starts. Inside the guest VM, the NVIDIA kernel modules are loaded and the GPU is initialized on the virtual PCI bus.
+
+#### Multi-GPU Passthrough
+
+Kata Containers supports passing multiple GPUs to a single micro-VM via VFIO. Each GPU is attached to its own virtual PCIe root port using the `cold_plug_vfio = root-port` configuration. There is no Kata-side limit on the number of VFIO devices per VM.
+
+Note: the NVIDIA GPU Operator's Sandbox Device Plugin currently limits allocation to a single GPU per pod for standard PCIe-attached GPUs. On NVSwitch-based HGX systems (Hopper SXM, Blackwell), the GPU Operator supports multi-GPU passthrough (all 8 GPUs + 4 NVSwitches passed as a unit). Akash providers may need to configure GPU allocation outside the GPU Operator's device plugin to support multi-GPU pods on non-HGX systems.
+
+#### Limitations
+
+- vGPU (virtual GPU) is not supported; only full physical GPU passthrough is.
+- The host must have IOMMU enabled and PCI Access Control Services (ACS) configured.
+
+### 5. NVIDIA CC-On Mode
+
+For GPU memory and computation to be protected, the NVIDIA GPU must be running in **Confidential Computing mode (CC-on)**. This is a provider-side hardware configuration.
+
+#### What CC-On Mode Protects
+
+- **Data in transit (PCIe bus)**: All data crossing the PCIe bus between CPU and GPU is encrypted using AES-GCM-256. This includes CUDA kernels, command buffers, synchronization primitives, and all DMA transfers.
+- **GPU execution state**: Performance counters are disabled to prevent side-channel attacks. All internal firewalls are active.
+- **GPU memory (HBM)**: On-package HBM is considered physically secure and is not encrypted, allowing full-speed computation.
+
+#### CC-On Mode Configuration
+
+CC-on mode is stored in an EEPROM on the GPU and persists across reboots. Providers enable it using NVIDIA's `gpu-admin-tools`:
+
+```bash
+sudo python3 nvidia_gpu_tools.py --devices gpus --set-cc-mode=on --reset-after-cc-mode-switch
+```
+
+A GPU reset (PF-FLR) is required after changing the mode; the `--reset-after-cc-mode-switch` flag requests this reset as part of the mode change. The current mode can be queried with `--query-cc-mode`.
+
+NVIDIA defines three modes:
+
+| Mode | Description |
+|---|---|
+| `off` | Default. No CC features. |
+| `on` | Full CC: bus encryption active, performance counters disabled, all firewalls active. |
+| `devtools` | Encryption enabled but performance counters accessible for profiling/debugging. |
+
+Providers advertising `confidential-compute: true` with GPUs must have those GPUs in `on` mode (not `devtools`).
+
+#### CPU-GPU Secure Channel
+
+On Hopper GPUs (H100/H200), the CPU TEE and GPU communicate through a **bounce buffer** in shared system memory:
+
+1. The NVIDIA driver allocates a bounce buffer in shared (unprotected) system memory.
+2. Data leaving the CPU TEE (TDX TD or SEV-SNP VM) is encrypted with a session key negotiated via SPDM and placed in the bounce buffer.
+3. The GPU's DMA engine reads and decrypts the data into GPU HBM.
+4. The reverse path applies for GPU-to-CPU transfers.
+
+This bounce buffer approach has a throughput ceiling of approximately 4 GB/sec due to CPU-side encryption overhead.
+
+On Blackwell GPUs, this bottleneck is expected to be eliminated by **TDISP (TEE Device Interface Security Protocol)** and **PCIe IDE (Integrity and Data Encryption)**, which provide hardware-level inline encryption on the PCIe bus. This requires Intel TDX Connect (Xeon 6) or AMD SEV-TIO on the CPU side.
+
+### 6. Composite Attestation (CPU + GPU)
+
+When a workload runs with both CPU and GPU TEEs, the tenant must verify **both** TEEs. This is achieved through composite attestation, where the CPU TEE serves as the trust anchor for the GPU. The GPU is not a standalone TEE: it relies on the CPU TEE to establish the secure channel (SPDM session) and to orchestrate the attestation flow.
+
+#### Attestation Flow
+
+1. From within the Kata VM, the tenant collects CPU TEE evidence (TDX quote via `/dev/tdx-attest` or SEV-SNP report via `/dev/sev-guest`).
+2. The NVIDIA in-guest driver collects GPU attestation evidence via the NVTrust SDK.
+3. Both pieces of evidence are sent to **Intel Trust Authority** as a single composite attestation request.
+4. Intel Trust Authority verifies the CPU quote independently.
+5. Intel Trust Authority forwards the GPU evidence to **NVIDIA Remote Attestation Service (NRAS)**.
+6. NRAS verifies the GPU evidence against golden measurements from NVIDIA's Reference Integrity Manifest (RIM) service and returns a signed JWT.
+7. Intel Trust Authority validates the NRAS JWT and returns a **composite attestation token** containing claims for both TEEs.
+
+The composite token is a JWT with two sub-objects:
+- `intel_tee`: TDX attestation claims (or `amd_sev_snp` for AMD).
+- `nvidia_gpu`: GPU attestation claims (the verified NRAS result).
+
+#### Platform Support for Composite Attestation
+
+| CPU TEE | GPU TEE | Composite Attestation Status |
+|---|---|---|
+| Intel TDX | NVIDIA NVTrust | Generally Available (Intel Trust Authority) |
+| AMD SEV-SNP | NVIDIA NVTrust | Preview (Intel Trust Authority Pilot environment) |
+
+For AMD SEV-SNP, GPU attestation can be performed independently via NRAS while composite attestation matures to GA.
+
+#### Guest Pre-Start Hook
+
+The NVIDIA GPU Operator supports running attestation as a **container guest pre-start hook** within the Kata initrd. This means the GPU can be attested automatically before the tenant's container starts, providing a "fail-closed" model where the container never runs if attestation fails. Providers should enable this hook for confidential GPU workloads.
+
+## Rationale
+
+### Why placement attributes instead of a new SDL field?
+
+The SDL already has a well-understood mechanism for matching tenant requirements to provider capabilities: placement attributes. Adding a dedicated `confidential: true` field to compute profiles or services would require:
+
+- SDL schema changes and version bump
+- Client library updates (akashjs, chain SDK)
+- Console UI changes to parse the new field
+
+Using placement attributes avoids all of this. It works with the existing SDL parser, marketplace matching logic, and provider bid filtering, with zero code changes to core SDL handling.
+
+### Why a single `confidential-compute` attribute for both CPU and GPU?
+
+A GPU TEE is not a standalone security boundary. The NVIDIA GPU relies on the CPU TEE (TDX TD or SEV-SNP VM) as its trust anchor: the SPDM session is established from the CPU TEE, the attestation flow originates inside the confidential VM, and the bounce buffer encryption keys are negotiated by the CPU-side driver running within the TEE. There is no valid configuration where GPU confidential computing operates without CPU confidential computing.
+
+Separating them into two attributes (`confidential-compute` + `confidential-compute-gpu`) would:
+- Allow tenants to express an invalid state (GPU CC without CPU CC).
+- Require validation rules to enforce that GPU implies CPU.
+- Add marketplace complexity for no functional benefit.
+
+Instead, `confidential-compute: true` means "run in a Kata VM with CPU TEE." When the compute profile also includes GPU resources, the provider automatically selects the combined runtime class and ensures the GPU is in CC-on mode. The GPU path is an implicit consequence of requesting confidential compute with GPU resources, not a separate opt-in.
+
+### Why Kata Containers?
+
+As discussed in [AEP-65](../aep-65), Kata Containers provide the best balance of security, compatibility, and operational simplicity:
+
+- OCI-compatible: tenants keep their existing container workflows.
+- CRI-compatible: integrates with Kubernetes via `RuntimeClass` without a separate orchestrator.
+- Each pod runs in its own micro-VM with a dedicated guest kernel, providing strong isolation.
+- TEE protection (TDX/SEV) applies at the VM boundary, securing memory and execution state.
+- NVIDIA officially supports GPU passthrough to Kata VMs via the GPU Operator, with dedicated runtime classes for the supported TEE combinations.
+
+## Backward Compatibility
+
+This proposal is fully backward compatible:
+
+- Existing SDLs without the `confidential-compute` attribute continue to work unchanged.
+- Existing providers without Kata support simply will not bid on confidential compute orders.
+- No SDL version bump is required.
+- No on-chain parameter changes are required.
+
+## Security Considerations
+
+- **Attestation is critical**: Deploying with `confidential-compute: true` without performing attestation only guarantees Kata VM isolation, not that the workload is running inside a genuine TEE. Tenants must always attest from within the VM. For GPU workloads, the guest pre-start hook should be used for fail-closed attestation before the container starts.
+- **Provider attribute trust**: The `confidential-compute` attribute should be verified by auditors or set automatically by the inventory service ([AEP-41](../aep-41)) rather than self-reported by providers, to prevent false capability claims.
+- **CC-on mode enforcement**: Providers must not run GPUs in `devtools` mode for confidential workloads. The `devtools` mode enables performance counters that could leak information via side channels.
+- **Bounce buffer overhead**: On Hopper GPUs (H100/H200), the CPU-GPU secure channel uses a software bounce buffer with ~4 GB/sec throughput. Tenants with high-bandwidth CPU-GPU transfer needs should be aware of this limitation. Blackwell GPUs using TDISP/PCIe IDE, paired with CPUs that support it, are expected to remove this bottleneck.
+- **Device passthrough scope**: TEE device nodes and GPUs are passed through into the Kata VM rather than exposed directly to containers on the host. This maintains host isolation while enabling attestation and GPU compute within the enclave.
+- **NVLink**: On Hopper GPUs, data transmitted over NVLink between GPUs is not encrypted. Multi-GPU confidential workloads on NVLink-connected systems should account for this. NVLink encryption is introduced with Blackwell.
+
+---
+
+## Upstream Tracking Notes
+
+The following upstream issues and PRs are relevant to the implementation of this AEP and should be monitored:
+
+### Intel TDX + GPU Runtime Class (`kata-qemu-nvidia-gpu-tdx`)
+
+The Kata Containers runtime has experimental support for TDX+GPU, but the NVIDIA GPU Operator does not yet ship the runtime class. The active blocker is CDI spec generation for TDX's iommufd device paths.
+
+| Item | Status | Link |
+|---|---|---|
+| Kata PR #10867: GPU QEMU SNP+TDX experimental updates | Merged (Feb 2025) | https://github.com/kata-containers/kata-containers/pull/10867 |
+| Kata PR #10868: QEMU TDX experimental workflow | Merged (Feb 2025) | https://github.com/kata-containers/kata-containers/pull/10868 |
+| Kata PR #11568: Add proper TDX config path for GPU | Merged (Jul 2025) | https://github.com/kata-containers/kata-containers/pull/11568 |
+| Kata Issue #11721: TDX VM + GPU VFIO_MAP_DMA failure | Closed (redirected to k8s-kata-manager) | https://github.com/kata-containers/kata-containers/issues/11721 |
+| **k8s-kata-manager Issue #133: CDI spec for TDX+GPU (blocker)** | **Open** | https://github.com/NVIDIA/k8s-kata-manager/issues/133 |
+
+### Confidential Containers + GPU Roadmap
+
+| Item | Status | Link |
+|---|---|---|
+| CoCo Issue #278: Road to Confidential Containers with GPUs | Open (umbrella tracker) | https://github.com/confidential-containers/confidential-containers/issues/278 |
diff --git a/src/content/aeps/aep-84/README.md b/src/content/aeps/aep-84/README.md
new file mode 100644
index 000000000..4f7dddc6c
--- /dev/null
+++ b/src/content/aeps/aep-84/README.md
@@ -0,0 +1,121 @@
+---
+aep: 84
+title: "Console Split: Managed Platform and Self-Custodial Air"
+description: "Split Akash Console into a fully-managed platform (console.akash.network) and a dedicated self-custodial app (Console Air)"
+author: Maxime Beauchamp (@baktun14) Greg Osuri (@gosuri)
+status: Final
+type: Standard
+category: Interface
+created: 2026-04-24
+estimated-completion: 2026-05-31
+roadmap: major
+---
+
+## Motivation
+
+Akash Console today tries to serve two fundamentally different users through a single application:
+
+1. **Self-custodial users** who connect a Keplr wallet, sign their own transactions, and own their on-chain identity.
+2. **Fully managed users** who pay with a credit card and never need to create or manage a crypto wallet.
+
+Console started as a wallet-only product. Credit card support was added because requiring a wallet created an enormous amount of friction for mainstream developers, and offering both paths has genuinely helped users who spread workloads across multiple platforms.
+
+However, blending both experiences in a single application has become a liability:
+
+- **User confusion.** The two paths imply different identities, different recovery models, different billing surfaces, and different trust assumptions. New users routinely struggle to understand which path applies to them, and existing users are forced to reason about concepts (wallets, credits, escrow, credit cards) that are only relevant to half of the audience.
+- **Product complexity.** Almost every new feature has to be designed, implemented, tested, and documented twice: once for the wallet identity model and once for the managed identity model. This has significantly slowed feature velocity and widened the surface area for bugs.
+- **Misaligned usage.** Over 85% of spend on Console today comes from credit card users. Keeping the wallet option inside the managed experience optimizes the product for a small fraction of actual spend while burdening the majority.
+
+At the same time, the self-custodial, permissionless nature of Akash is a core property of the network that must be preserved. The answer is not to remove the wallet path; it is to give it a home where it can be treated as a first-class experience rather than an alternate mode inside a managed product.
+
+## Summary
+
+We propose splitting Akash Console into two dedicated applications, each optimized for a single identity model:
+
+1. **console.akash.network**: the fully managed platform. No wallet. Credit card billing. Optimized for the lowest possible friction and the broadest possible developer audience.
+2. **Console Air**: the fully self-custodial application. Wallet-only (Keplr and compatible wallets). No managed billing. Optimized for users who want to own their keys, sign their own transactions, and interact with Akash permissionlessly.
+
+Each application has a single identity model, a single billing model, and a single mental model for the user. Features are designed once, for the audience that actually uses them.
+
+## Proposed Solution
+
+### console.akash.network (Managed Platform)
+
+- **Identity:** email + password (and/or SSO), backed by the existing managed-wallet infrastructure established in [AEP-63](../aep-63).
+- **Payments:** credit card only, including the credit and auto-reload features from [AEP-31](../aep-31), [AEP-72](../aep-72), and [AEP-74](../aep-74).
+- **Scope removed:** Keplr connect, manual AKT balance, wallet-signed transactions, wallet-based deployment history.
+- **Scope preserved and expanded:** trial credits, auto credit reload, billing & usage, alerts, custom domains, and all other managed features currently on Console's roadmap.
+- **Evolution:** continues to evolve as a fully managed, opinionated platform, serving as the front door for developers who want Akash's price/performance without Akash's crypto surface area.
+
+### Console Air (Self-Custodial)
+
+- **Identity:** Keplr (and compatible Cosmos wallets) only. Users sign every transaction.
+- **Payments:** AKT (and any future on-chain denominations) paid directly from the user's wallet to on-chain escrow. No credit card, no managed balance, no off-chain billing.
+- **Scope removed:** email sign-up, credit card payments, managed trial credits, server-side account state that isn't derivable from on-chain data.
+- **Scope preserved:** full deployment lifecycle (SDL, bid, lease, manifest, logs, shell, updates, close), provider selection, certificates, multi-depositor escrow ([AEP-75](../aep-75)), and any future on-chain features that require direct wallet signing.
+- **Positioning:** the canonical reference client for permissionless Akash usage. Open to any wallet and any provider, with no gatekeeping layer between the user and the chain.
+
+### Migration
+
+- Existing managed (credit card) users continue on console.akash.network with no action required. The domain and account system stay the same.
+- Existing self-custodial users on console.akash.network are guided to Console Air. For a transition period, console.akash.network will display a clear banner for any user arriving with a connected Keplr wallet, directing them to Console Air with a one-click handoff that preserves the connected address.
+- The wallet-connect code path in console.akash.network is removed after the transition period ends.
+- Documentation, tutorials, and all external links are updated to point to the appropriate application based on audience.
+
+### Shared Infrastructure
+
+The two applications live in separate repositories ([akash-network/console](https://github.com/akash-network/console) for the managed platform and [akash-network/console-air](https://github.com/akash-network/console-air) for the self-custodial app) and share code through published `@akashnetwork/*` packages:
+
+- SDL editor, deployment lifecycle UI components, and provider selection logic via shared UI packages.
+- Common design system and component library (`@akashnetwork/ui`).
+- The Console API layer ([AEP-63](../aep-63), [AEP-69](../aep-69), [AEP-70](../aep-70)) for features that apply to both audiences (e.g., provider data, pricing).
+- Chain SDK, network utilities, and HTTP clients for talking to Akash.
+
+What is *not* shared is the identity, billing, and account management surface. Those diverge cleanly between the two apps and are no longer forced into a single abstraction. Splitting at the repository boundary (rather than within a monorepo) reinforces the separation: each app's dependencies, CI, release cadence, and contributor model can evolve independently of the other.
+ +## Rationale + +### Why split rather than hide + +We considered keeping a single application and hiding the wallet path behind a feature flag or a hidden route. This fails the core goal: a single app still forces every feature designer, PM, and engineer to reason about both identity models when making changes, even if one is rarely surfaced in the UI. A clean split aligns the codebase, the surface area, and the team with the audience. + +### Why build Console Air rather than deprecate the wallet path + +The self-custodial path is a defining property of Akash. Removing it from Console entirely — without an obvious replacement — would signal that Akash is retreating from its permissionless roots. Console Air preserves and in fact strengthens that path by giving it a dedicated, uncompromised home. + +### Why not a third-party wallet-only client + +A first-party, open-source self-custodial client ensures that self-custodial usage of Akash remains practical and well-supported regardless of third-party interest. Existing wallet users on Console already depend on us for this experience; they deserve a smooth transition rather than a handoff to an unmaintained alternative. + +### Naming + +"Console Air" signals a lightweight, no-overhead, pure self-custodial experience, in contrast to the full managed platform at console.akash.network. + +## Backward Compatibility + +- Credit card / managed users: no change. Same URL, same accounts, same billing. +- Wallet users on console.akash.network: temporarily supported with a redirect/handoff flow to Console Air, then removed after the transition period. No on-chain state is affected; users retain full access to their deployments via their wallet in Console Air, the Akash CLI, or any other self-custodial client. +- API consumers: unchanged. The Console API ([AEP-69](../aep-69), [AEP-70](../aep-70)) continues to serve both applications.
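The redirect/handoff flow for wallet users can be sketched as a small URL builder that enforces the first-party-domain rule listed under Security Considerations. This is a minimal illustration in Go (chosen to match the other code in this document), not the Console implementation. The Console Air host `consoleair.example`, the `address` query parameter, and the `handoffURL` / `looksLikeAkashAddress` helpers are all hypothetical; the address check is syntactic only (prefix and bech32 character set) and does not verify the checksum.

```go
package main

import (
	"errors"
	"net/url"
	"strings"
)

// allowedHandoffHosts lists the only hosts the handoff may redirect to.
// HYPOTHETICAL: "consoleair.example" stands in for the real Console Air domain.
var allowedHandoffHosts = map[string]bool{
	"consoleair.example": true,
}

// bech32Charset is the character set used by bech32 data parts.
const bech32Charset = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

// looksLikeAkashAddress checks the "akash1" prefix and the bech32 charset.
// It does NOT verify the bech32 checksum.
func looksLikeAkashAddress(addr string) bool {
	const prefix = "akash1"
	if !strings.HasPrefix(addr, prefix) || len(addr) == len(prefix) {
		return false
	}
	for _, c := range addr[len(prefix):] {
		if !strings.ContainsRune(bech32Charset, c) {
			return false
		}
	}
	return true
}

// handoffURL builds the redirect target for a wallet user leaving
// console.akash.network. Any non-HTTPS or non-allowlisted target is refused,
// so the transition banner cannot be abused as an open redirect.
func handoffURL(target, address string) (string, error) {
	u, err := url.Parse(target)
	if err != nil {
		return "", err
	}
	// Hostname() ignores userinfo and port, so "https://ok@evil.example" is rejected.
	if u.Scheme != "https" || !allowedHandoffHosts[u.Hostname()] {
		return "", errors.New("handoff target is not an allowlisted first-party host")
	}
	if !looksLikeAkashAddress(address) {
		return "", errors.New("connected address is not a well-formed akash address")
	}
	q := u.Query()
	q.Set("address", address) // HYPOTHETICAL parameter name
	u.RawQuery = q.Encode()
	return u.String(), nil
}
```

Only the connected address travels with the handoff; keys and signing stay in the user's wallet, which matches the self-custodial trust model of Console Air.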
+ +## Security Considerations + +- **Reduced attack surface on the managed side.** Removing wallet-signing code paths from console.akash.network eliminates an entire class of phishing and transaction-injection risks for the managed audience, which today has no reason to sign on-chain transactions. +- **Reduced attack surface on the self-custodial side.** Removing managed-billing code paths from Console Air removes server-side account, session, and payment logic from the self-custodial audience, shrinking the trust surface of the application they rely on for signing. +- **Clear trust model per app.** Each application has a single, documented trust model, which makes security review and user education substantially simpler. +- **Transition redirect.** The handoff from console.akash.network to Console Air for wallet users must redirect only to a verified, first-party domain so that it cannot be used as a phishing vector. + +## Implementations + +- Akash Console: [github.com/akash-network/console](https://github.com/akash-network/console) +- Console Air: [github.com/akash-network/console-air](https://github.com/akash-network/console-air) + +## References + +- [AEP-31: Credit Card Payments In Console](../aep-31) +- [AEP-63: Console API for Managed Wallet Users](../aep-63) +- [AEP-72: Console - Improved User Onboarding](../aep-72) +- [AEP-74: Console - Auto Credit Reload](../aep-74) + +## Copyright + +All content herein is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).
diff --git a/src/content/aeps/aep-85/README.md b/src/content/aeps/aep-85/README.md new file mode 100644 index 000000000..2c8035b4b --- /dev/null +++ b/src/content/aeps/aep-85/README.md @@ -0,0 +1,176 @@ +--- +aep: 85 +title: "Console: Buildpacks-Powered Deployments" +description: "Replace the broken Build & Deploy feature with a Cloud Native Buildpacks pipeline that supports any language" +author: Maxime Beauchamp (@baktun14) +status: Draft +type: Standard +category: Interface +created: 2026-04-24 +estimated-completion: 2026-09-30 +discussions-to: https://github.com/orgs/akash-network/discussions +roadmap: major +replaces: 28 +--- + +## Abstract + +This AEP proposes **replacing the current "Build & Deploy" feature in Akash Console with a Cloud Native Buildpacks** ([buildpacks.io](https://buildpacks.io)) pipeline. The existing flow — referenced via `CI_CD_TEMPLATE_ID` in [`apps/deploy-web/src/config/remote-deploy.config.ts`](apps/deploy-web/src/config/remote-deploy.config.ts) — is in a broken state, hard-coded to a small set of JavaScript web frameworks, and relies on language detection that only reads `package.json`. Anyone deploying a Python, Go, Ruby, Java, Rust, .NET, or PHP application currently has to abandon the flow and produce a Dockerfile manually. + +The proposal swaps the underlying runner image and the configuration UI in `RemoteRepositoryDeployManager` for a CNB-based stack. A new `awesome-akash` template ships an Akash-native runner image (`cnb-runtime`) built on the Paketo or Heroku builder. At container start, the runner clones the user's repository, invokes `pack build`, and executes the produced launcher. Language is auto-detected from manifest files (`requirements.txt`, `go.mod`, `Gemfile`, `pom.xml`, `Cargo.toml`, `composer.json`, `*.csproj`, `package.json`). No Dockerfile, no local Docker daemon, and no registry credentials are required of the user. + +The change is scoped to Console (`apps/deploy-web`) and the `awesome-akash` repository. 
No changes to chain protocol, provider services, or Economics. + +## Motivation + +The existing "Build & Deploy" feature in Console — introduced by [AEP-28 ("Auto Deployment from VCS")](../aep-28) — is the front door for first-time deployers who have a Git repository but no container experience. Today that door is broken: the runtime CI/CD template is brittle, the Console-side configuration form is hard-coded to a small JavaScript-framework allowlist (`supportedFrameworks` in [`remote-deploy.config.ts`](apps/deploy-web/src/config/remote-deploy.config.ts)), and the framework-detection hook only inspects `package.json` (see [`useRemoteDeployFramework`](apps/deploy-web/src/components/remote-deploy/hooks/useRemoteDeployFramework.tsx)). Anyone arriving with a Python, Go, Ruby, Java, Rust, .NET, or PHP project hits a dead end. + +Cloud Native Buildpacks are the open-standard way Heroku, Cloud Foundry, Google Cloud Run, and Fly.io solve the same problem. They auto-detect the language, fetch the right toolchain, and produce a runnable OCI image without a Dockerfile. Adopting CNB in Console gives Akash Network multi-language Git-to-deploy parity with leading hyperscalers at a fraction of the engineering cost — and lets us retire a known-broken feature instead of patching it. + +This AEP follows AEP-72 ("Console - Improved User Onboarding") in spirit: reduce friction for first-time deployers, broaden the set of workloads that "just work" on Akash, and lean on common SaaS/CSP paradigms rather than asking users to learn Akash-specific concepts before they see any value. + +## Specification + +### Scope + +In scope: +- Console UI (`apps/deploy-web`): replace the existing build configuration form, replace the language-detection hook, point the existing "Build and Deploy" entry at the new template. +- Console config: introduce `BUILDPACKS_TEMPLATE_ID` and `BP_*` protected environment variables; remove or deprecate JavaScript-specific config that no longer applies. 
+- New runner image and template in `akash-network/awesome-akash`; deprecate the existing CI/CD template. + +Out of scope: +- Chain protocol changes. +- Provider-side changes. +- Hosted builder service (no new app under `apps/`). +- Image registry, GHCR push, or two-stage build/deploy orchestration. +- Standalone CLI tool. + +### A. Console UI changes (`apps/deploy-web`) + +**A.1. Config replacement.** [`apps/deploy-web/src/config/remote-deploy.config.ts`](apps/deploy-web/src/config/remote-deploy.config.ts) is updated to: + +- Replace `CI_CD_TEMPLATE_ID` (or alias it) with `BUILDPACKS_TEMPLATE_ID = "akash-network-awesome-akash-automatic-deployment-CICD-buildpacks-template"`. The `gitProvider` query param continues to route the user into the same flow; only the underlying template changes. +- Replace the JavaScript-framework-specific keys in `protectedEnvironmentVariables` (`INSTALL_COMMAND`, `BUILD_COMMAND`, `NODE_VERSION`, `BUILD_DIRECTORY`, `CUSTOM_SRC`, `FRONTEND_FOLDER`) with buildpacks-native keys: `BP_BUILDER_IMAGE`, `BP_LANGUAGE`, `BP_LANGUAGE_VERSION`, `BP_PROCFILE`. Repository / branch / token / `DISABLE_PULL` keys remain unchanged — they are language-agnostic and continue to work. 
+- Replace the JS-only `supportedFrameworks` array with a multi-language `detectedLanguages` table: + + | id | manifest files | default builder | version env | + |---|---|---|---| + | js | `package.json` | `paketobuildpacks/builder-jammy-base` | `BP_NODE_VERSION` | + | py | `requirements.txt`, `pyproject.toml` | `paketobuildpacks/builder-jammy-base` | `BP_CPYTHON_VERSION` | + | go | `go.mod` | `paketobuildpacks/builder-jammy-base` | `BP_GO_VERSION` | + | ruby | `Gemfile` | `paketobuildpacks/builder-jammy-base` | `BP_MRI_VERSION` | + | java | `pom.xml`, `build.gradle` | `paketobuildpacks/builder-jammy-base` | `BP_JVM_VERSION` | + | php | `composer.json` | `paketobuildpacks/builder-jammy-base` | `BP_PHP_VERSION` | + | rust | `Cargo.toml` | `heroku/builder:24` | n/a (toolchain pinned via `rust-toolchain.toml`) | + | dotnet | `*.csproj` | `heroku/builder:24` | `BP_DOTNET_FRAMEWORK_VERSION` | + +**A.2. Component replacements.** + +- **Replace** [`RemoteBuildInstallConfig.tsx`](apps/deploy-web/src/components/remote-deploy/deployment-configurations/RemoteBuildInstallConfig.tsx) with a new `BuildpacksConfig.tsx` in the same directory. The new component keeps the `Collapsible` + grid layout but exposes: detected-language badge, builder image dropdown, single language-version input, optional Procfile override, autobuild checkbox (reuses existing `DISABLE_PULL`). All writes continue through [`EnvVarManagerService`](apps/deploy-web/src/services/remote-deploy/env-var-manager.service.ts). +- **Replace** [`useRemoteDeployFramework`](apps/deploy-web/src/components/remote-deploy/hooks/useRemoteDeployFramework.tsx) with `useBuildpackLanguageDetection`. The new hook hits the git provider's tree API once per `(repo, branch)` using the OAuth token already in the Jotai store, and resolves language by first-match priority over the table above. Returns `{ language, confidence, detectedFiles[] }`. The user can override via the language dropdown, which writes to `BP_LANGUAGE`. + +**A.3. 
Edited components.** + +- [`RemoteRepositoryDeployManager.tsx`](apps/deploy-web/src/components/remote-deploy/RemoteRepositoryDeployManager.tsx) — swaps the import from `RemoteBuildInstallConfig` to `BuildpacksConfig`. No additional UI affordance ("mode toggle") is added; buildpacks **is** the build pipeline. +- [`NewDeploymentContainer.tsx`](apps/deploy-web/src/components/new-deployment/NewDeploymentContainer/NewDeploymentContainer.tsx) — already detects git-provider templates by ID; the new `BUILDPACKS_TEMPLATE_ID` is recognized via the same mechanism. +- [`TemplateList`](apps/deploy-web/src/components/templates/TemplateList) — the existing **"Build and Deploy"** call-to-action's deep link is updated to point at `BUILDPACKS_TEMPLATE_ID`. The user-visible label and entry point remain unchanged. + +**A.4. Unchanged.** [`sdlGenerator.ts`](apps/deploy-web/src/utils/sdl/sdlGenerator.ts) is unchanged — the SDL is structurally identical; only the `image:` and the env var keys differ. + +### B. `awesome-akash` template + +A new template directory `automatic-deployment-CICD-buildpacks-template/` in [`akash-network/awesome-akash`](https://github.com/akash-network/awesome-akash) containing: + +- **`deploy.yaml`** — SDL referencing `image: ghcr.io/akash-network/cnb-runtime:latest`, with the new `BP_*` env vars. +- **`Dockerfile`** for `cnb-runtime` — `FROM paketobuildpacks/builder-jammy-base`. Entrypoint script: + 1. Clone repository using the existing token-handling logic from the legacy CI/CD entrypoint (port verbatim, no rewrite). + 2. Run `pack build app --builder $BP_BUILDER_IMAGE --env-file `. + 3. Exec `/cnb/lifecycle/launcher` from the produced image. + 4. On `DISABLE_PULL=no`, poll commit hash and rebuild on change (port verbatim). 
+- **`README.md`** + entry in `awesome-akash` template index so the new template surfaces in the gallery API consumed by [`apps/api/src/template/services/template-gallery/template-gallery.service.ts`](apps/api/src/template/services/template-gallery/template-gallery.service.ts). +- **Mark the legacy `automatic-deployment-CICD-template/` as deprecated** in its README and remove it from the curated template index in a follow-up change after a deprecation window (see Backward Compatibility). + +### C. Telemetry + +Hook into the existing GTM / `dataLayer.push` pipeline (deploy-web standardizes on GTM, not gtag): + +- `buildpacks_deploy_started` — properties: `language`, `builder_image`. +- `buildpacks_language_detected` — properties: `language`, `detected_files_count`, `was_overridden`. +- `buildpacks_deploy_succeeded` / `buildpacks_deploy_failed` — best-effort, emitted from the front end on lease/manifest-status transitions. + +### D. Phased delivery + +| Phase | Scope | Indicative duration | +|---|---|---| +| **1 — MVP / replacement** | Sections A + B + C above; legacy CI/CD template marked deprecated | 2–3 weeks | +| **2 — Polish** | Buildpack layer caching to a persistent volume, per-language version pickers driven by Phase 1 telemetry, Procfile editor refinements; legacy CI/CD template removed from gallery | 3–4 weeks (post-Phase-1 data) | +| **3 — Two-stage build (optional)** | "Build once, run many" mode that pushes the result to GHCR using the user's PAT, deploying a slim runtime image. Triggered only by Phase 2 telemetry showing repeated rebuilds or scaled replicas | Only if data justifies | + +A hosted builder service (a new `apps/builder` app under Console operation) is explicitly **not** part of this AEP. If Phase 3 telemetry indicates demand, it would be proposed as a follow-up AEP. + +## Rationale + +### Why replace, not extend + +The existing CI/CD template is broken in production and locked to a JavaScript-only mental model. 
Maintaining two parallel pipelines (legacy CI/CD + new buildpacks) would split the user's mental model, double the surface area requiring tests, and leave the broken path live in the gallery. Buildpacks is a strict superset of what the JS-only path tried to do — Paketo's `builder-jammy-base` already handles Node — so keeping the legacy path adds no user value. Replacement also lets us delete the JS-only `supportedFrameworks` allowlist and the `package.json`-only detection hook outright, simplifying the code. + +### Why runtime build instead of two-stage build-then-push + +Three options were considered for where `pack build` executes: + +1. **Runtime build inside the deployed workload** (this proposal) — same architectural pattern the legacy CI/CD template was already using; zero new infrastructure; first-cold-start cost paid by the user's lease, not Akash. +2. **Two-stage Akash deployment** — a "builder" workload runs first, pushes to a registry, then a "deploy" workload pulls. Requires registry credentials, two-step UX, polling state. Cleaner long-term but materially more complex. +3. **Hosted builder service** — Akash runs the build farm. Best UX but largest operational lift (CPU/IO costs, multi-tenant isolation, abuse prevention). + +Option 1 is the smallest shippable thing that proves user demand at parity with — and beyond — the broken predecessor. Options 2 and 3 are deferred behind Phase 1 telemetry. The cost of option 1 (rebuilding on restart) is identical to the legacy CI/CD template, so the trade-off is not a regression. + +### Why Paketo Jammy as the default builder + +Paketo's `builder-jammy-base` covers Node, Python, Go, Java, Ruby, .NET, and PHP out of the box without buildpack configuration. Heroku's `builder:24` is used only where Paketo lacks first-party support (Rust, .NET specifics). The Paketo builders are maintained by the Paketo Buildpacks project (a Cloud Foundry Foundation project) and Heroku's builders by Heroku; both implement the vendor-neutral Cloud Native Buildpacks specification, which is a CNCF project.
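The builder defaults above, together with the first-match detection rule from section A.2 and the `BP_LANGUAGE` override, can be sketched as a priority-ordered lookup. This is an illustrative Go sketch, not the Console code (which is TypeScript in `apps/deploy-web`). The `languageRow`, `Detection`, `rootMatches`, and `detect` names are hypothetical; only repository-root files are considered, which is an assumption the AEP does not state; and the hook's `confidence` value is omitted because its computation is not specified.

```go
package main

import (
	"path"
	"strings"
)

// languageRow mirrors one row of the detectedLanguages table in section A.1.
type languageRow struct {
	ID             string
	Manifests      []string // glob patterns matched against root-level file names
	DefaultBuilder string
	VersionEnv     string // empty for rust, whose toolchain is pinned via rust-toolchain.toml
}

// detectedLanguages is listed in priority order: the first row with a matching
// manifest wins, so a repo with package.json and requirements.txt resolves to "js".
var detectedLanguages = []languageRow{
	{"js", []string{"package.json"}, "paketobuildpacks/builder-jammy-base", "BP_NODE_VERSION"},
	{"py", []string{"requirements.txt", "pyproject.toml"}, "paketobuildpacks/builder-jammy-base", "BP_CPYTHON_VERSION"},
	{"go", []string{"go.mod"}, "paketobuildpacks/builder-jammy-base", "BP_GO_VERSION"},
	{"ruby", []string{"Gemfile"}, "paketobuildpacks/builder-jammy-base", "BP_MRI_VERSION"},
	{"java", []string{"pom.xml", "build.gradle"}, "paketobuildpacks/builder-jammy-base", "BP_JVM_VERSION"},
	{"php", []string{"composer.json"}, "paketobuildpacks/builder-jammy-base", "BP_PHP_VERSION"},
	{"rust", []string{"Cargo.toml"}, "heroku/builder:24", ""},
	{"dotnet", []string{"*.csproj"}, "heroku/builder:24", "BP_DOTNET_FRAMEWORK_VERSION"},
}

// Detection approximates the hook's { language, detectedFiles[] } result.
type Detection struct {
	Language      string
	Builder       string
	DetectedFiles []string
}

// rootMatches returns the root-level paths in tree that match any pattern.
func rootMatches(tree, patterns []string) []string {
	var hits []string
	for _, p := range tree {
		if strings.Contains(p, "/") {
			continue // assumption: only repository-root manifests are considered
		}
		for _, pattern := range patterns {
			if ok, _ := path.Match(pattern, p); ok {
				hits = append(hits, p)
				break
			}
		}
	}
	return hits
}

// detect resolves language and builder for a repository tree listing.
// overrides carries user-chosen BP_LANGUAGE / BP_BUILDER_IMAGE values (may be nil).
func detect(tree []string, overrides map[string]string) (Detection, bool) {
	want := overrides["BP_LANGUAGE"]
	for _, row := range detectedLanguages {
		if want != "" && want != row.ID {
			continue // explicit language override: ignore every other row
		}
		hits := rootMatches(tree, row.Manifests)
		if want == "" && len(hits) == 0 {
			continue // no override and no manifest for this language
		}
		builder := row.DefaultBuilder
		if b := overrides["BP_BUILDER_IMAGE"]; b != "" {
			builder = b
		}
		return Detection{Language: row.ID, Builder: builder, DetectedFiles: hits}, true
	}
	return Detection{}, false
}
```

Keeping the table as ordered data means the tie-breaking behavior called out under Security Considerations (multi-language repositories) is visible in one place and can be unit-tested directly.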
+ +### Why no backend (`apps/api`) changes + +The architectural guardrail is that the replacement adds zero load on Console-operated infrastructure beyond what the legacy flow already incurred. If the team finds itself adding API routes during implementation, that is a signal the proposal has quietly drifted into Option 3 (hosted builder) and warrants a separate AEP. + +## Backward Compatibility + +This is a **replacement**, not an additive change, and therefore has user-visible compatibility impact that must be managed: + +- **Existing deployments** built from the legacy CI/CD template continue to run unchanged. Their lease, manifest, and runtime container are unaffected because they reference the legacy `image:` tag, which is not removed from any registry. +- **The "Build and Deploy" entry point** (`TemplateList` CTA, deep links such as `/new-deployment?step=edit-deployment&gitProvider=github&templateId=…`) starts pointing at `BUILDPACKS_TEMPLATE_ID` after this AEP lands. Users who follow an old link with the legacy `templateId` query parameter still resolve correctly during the deprecation window. After Phase 2 the legacy template is removed from the curated gallery, and deep links carrying the legacy `templateId` fall back to the buildpacks template instead of returning a 404. +- **Form-state shape** changes: the env-var keys captured by the wizard differ. Because protected env vars are scoped per template and serialized into the SDL at the moment of deployment, no persisted user state is invalidated. Saved user templates that explicitly reference the legacy `INSTALL_COMMAND` / `BUILD_COMMAND` / `NODE_VERSION` keys continue to work as plain SDL — they bypass the new wizard. +- **No chain or provider compatibility implications.** + +A deprecation notice is posted in the legacy template's README and in Console release notes when this AEP enters Last Call.
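The deep-link rule in the Backward Compatibility list reduces to a small resolution function. The Go sketch below is illustrative only: `BuildpacksTemplateID` is the value given in section A.1, but `LegacyTemplateID` is a hypothetical placeholder for whatever `CI_CD_TEMPLATE_ID` currently holds, and `resolveTemplateID` with its `inGallery` callback is a hypothetical name for the lookup Console already performs.

```go
package main

// BuildpacksTemplateID is the template ID given in section A.1.
const BuildpacksTemplateID = "akash-network-awesome-akash-automatic-deployment-CICD-buildpacks-template"

// LegacyTemplateID is HYPOTHETICAL: the real value is whatever
// CI_CD_TEMPLATE_ID holds in remote-deploy.config.ts today.
const LegacyTemplateID = "legacy-cicd-template-id"

// resolveTemplateID maps a deep link's templateId to the template to load.
// inGallery reports whether an ID is still present in the curated gallery.
func resolveTemplateID(requested string, inGallery func(string) bool) string {
	if requested == LegacyTemplateID && !inGallery(LegacyTemplateID) {
		// After Phase 2 the legacy template is gone: fall back instead of a 404.
		return BuildpacksTemplateID
	}
	// During the deprecation window (and for every other ID) resolve as requested.
	return requested
}
```

Only the legacy ID is redirected; unrelated template IDs that are missing from the gallery keep their existing not-found behavior.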
+ +## Test Cases + +Per the project's testing guidelines (CLAUDE.md, `/console-tests` skill): + +- **Unit:** language-detection priority and tie-breaking in `useBuildpackLanguageDetection`; env-var writes from `BuildpacksConfig` through `EnvVarManagerService`; template-ID resolution in `RemoteRepositoryDeployManager`. +- **Component:** `RemoteRepositoryDeployManager` renders `BuildpacksConfig` for the new template ID. +- **E2E (Playwright, [`apps/deploy-web/test/`](apps/deploy-web/test/)):** click "Build and Deploy" on `TemplateList` → connect a public Python repository → SDL renders with `cnb-runtime` image and `BP_LANGUAGE=py`. +- **Manual integration matrix (week 1, gating Phase 1):** spike `pack build` against the Akash provider runtime using three reference repos — a Python Flask app, a Go HTTP server, a Rails app. All three must produce a runnable launcher. Block the rest of Phase 1 if this spike fails (see Security Considerations for fallback paths). + +## Implementations + +Reference implementation lands across two PRs: + +- `akash-network/console` PR — Console UI replacement (Section A). +- `akash-network/awesome-akash` PR — `cnb-runtime` Dockerfile + new template + deprecation marker on the legacy template (Section B). + +A reproducible end-to-end deploy of one Python and one Go repository via the public Console serves as the acceptance criterion for promoting this AEP to Last Call. + +## Security Considerations + +- **Arbitrary code execution by design.** Buildpacks execute scripts contained in the user's repository (e.g., `pip install`, `go build`, `npm install` post-install hooks). This is identical to the legacy CI/CD template's threat model: the build runs inside the user's leased provider workload, not in any Console-operated environment. Console publishes no API surface that executes user code; the `cnb-runtime` image runs only inside a user-paid lease. +- **Provider runtime compatibility.** `pack build` traditionally requires a Docker daemon. 
The runner image relies on Paketo's daemonless `lifecycle` binaries (or Heroku's equivalent), which work in a standard container without `--privileged`. **A spike in week 1 must verify this on the Akash provider runtime before any Phase 1 work commits.** If daemonless lifecycle is incompatible, the fallback is to constrain the AEP to providers that support `--privileged` deployments and document the limitation. Because this AEP replaces an already-broken feature, even that fallback is no regression from today's baseline, and there is no obligation to keep the previous JavaScript flow alive. +- **OAuth token handling.** The replacement reuses the existing token storage path (Jotai `atomWithStorage` → SDL env var via the protected-env mechanism). No new token surface is introduced. Tokens already flow client-side directly to GitHub/GitLab/Bitbucket APIs without Console proxying; the buildpacks template inherits the same behavior. +- **Builder image supply chain.** `paketobuildpacks/builder-jammy-base` and `heroku/builder:24` are vendor-published, signed images from the Paketo Buildpacks project (Cloud Foundry Foundation) and Heroku respectively. The `cnb-runtime` wrapper image is published from the `awesome-akash` repository under the `akash-network` org's GHCR namespace; provenance attestation is enabled via GitHub's built-in `attest-build-provenance` action. +- **Builder image cold-start cost.** The Paketo Jammy base image is approximately 1.5 GB. First pull on a previously unseen Akash provider adds 30–90 seconds before `pack build` begins. This is a UX cost, not a security cost, but must be documented in the user-facing copy to set expectations. Phase 2 caching reduces this on subsequent rebuilds within a lease. +- **Resource exhaustion.** Buildpacks builds can be memory-hungry (especially Java). The new template's default resource profile is generous (4 GiB RAM, 2 CPU), and its README explicitly advises raising it for JVM / Gradle workloads. No Console-operated resources are at risk.
+- **Multi-language repository ambiguity.** A repository containing both `package.json` and `requirements.txt` is real (e.g., Next.js + Python serverless functions). Detection picks first-match-by-priority and surfaces the choice; the user can override via the `BP_LANGUAGE` field. No silent behavior. + +## Copyright + +All content herein is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0). diff --git a/src/content/aeps/aep-86/IMPLEMENTATION.md b/src/content/aeps/aep-86/IMPLEMENTATION.md new file mode 100644 index 000000000..63e017f71 --- /dev/null +++ b/src/content/aeps/aep-86/IMPLEMENTATION.md @@ -0,0 +1,1437 @@ +# AEP-86 Implementation Guide + +> Companion to [AEP-86: Provider Verification Tiers](./README.md) (the authoritative specification). +> +> This document contains **implementation-specific details only**: protobuf definitions, Go interfaces, +> codebase-specific notes, and error codes. It does not repeat specification-level content. +> For the design, behavior, rules, and rationale, see [README.md](./README.md). + +--- + +## 1. Codebase-Specific Implementation Notes + +These notes address concerns specific to the existing Akash codebase that are not covered in the specification. + +### 1.1 Sybil Prevention Note + +[Attestation Submission](./README.md#attestation-submission) step 2 validates `auditor != provider`. However, +an operator could control multiple addresses. Full Sybil prevention beyond address-level checks is handled by +governance approval of auditors -- only governance-approved auditors can submit attestations, and the governance +process is expected to vet auditor independence. + +### 1.2 x/provider Enhancement: Registration Timestamp + +[On-chain prerequisite enforcement](./README.md#on-chain-prerequisite-enforcement) requires the provider's +registration timestamp via `GetRegistrationTime`. This may require a minor addition to `x/provider` if the +timestamp is not currently stored. 
+ +If `x/provider` only stores the provider record without a timestamp, the migration that introduces +`x/verification` must backfill registration times: +- For new providers: store `block_time` of the `MsgCreateProvider` transaction +- For existing providers at migration: set `registered_at` to the upgrade block time (conservative default + that does not grandfather providers into time-based prerequisites they may not have actually met) + +### 1.3 x/market Enhancement: Lease Close Reasons + +The [completion rate calculation](./README.md#cross-module-keeper-interfaces) requires distinguishing lease +close reasons. The `x/market` module currently tracks `LeaseClosedReasonOwner` (in `server.go`). This must +be expanded: + +- `LeaseClosedReasonTenant` -- tenant initiated the close (counts as "completed") +- `LeaseClosedReasonProvider` -- provider initiated the close (counts as "terminated") +- `LeaseClosedReasonInsufficientFunds` -- escrow depleted (counts as "completed") + +Only `LeaseClosedReasonProvider` counts against the provider's completion rate. + +### 1.4 Bond Calculation Unit Conversions + +The [bond calculation formula](./README.md#bond-calculation) uses `ResourceSummary` fields that are in MB, +while governance parameters are per-GB and per-TB. The implementation must apply unit conversions: + +``` +required_bond(tier) = ( + resource_summary.total_gpus * bond_per_gpu[tier] + + resource_summary.total_vcpus * bond_per_vcpu[tier] + + resource_summary.total_memory_mb * bond_per_memory_gb[tier] / 1024 + + resource_summary.total_storage_mb * bond_per_storage_tb[tier] / 1048576 +) +``` + +### 1.5 Bond Posting Without Snapshot Hash + +A provider without a snapshot hash can still post a bond via `MsgPostProviderBond` (the module accepts any +amount). 
However, auditors cannot submit L2+ attestations until the snapshot hash exists, because +[on-chain prerequisite enforcement](./README.md#on-chain-prerequisite-enforcement) for L2+ requires snapshot +compliance, which requires a snapshot hash record to exist. + +### 1.6 Feature Flag: `verification_module_active` + +During the [migration period](./README.md#migration-from-aep-9), a governance parameter +`verification_module_active` (default: `false`) controls bid-matching behavior in `x/market`: + +- `false`: market module uses `x/audit` for bid matching (current behavior). `VerificationRequirement` + fields in SDL are accepted but not enforced. +- `true`: market module uses `x/verification` for bid matching. `x/audit` matching is retained only + for existing deployments created before activation. + +--- + +## 2. Store Key Design Rationale + +Supplements [Store Layout and Indexing](./README.md#store-layout-and-indexing) with design rationale. + +### 2.1 Key Design Choices + +- **Attestation primary key** is `(provider, auditor)` because the most common query pattern is "get all + attestations for a provider" (prefix scan on `0x02 | provider_addr`). +- **Auditor secondary index** (`0x10`) enables the reverse lookup without duplicating attestation data. +- **Time queues** use big-endian timestamps so that lexicographic ordering equals chronological ordering, + allowing efficient `KVStorePrefixIterator` up to the current block time. +- **Unbonding sequence** (`seq` in `0x23`) allows multiple concurrent unbonding entries for the same provider + (from multiple partial withdrawals). 
+ +### 2.2 Queue Lifecycle + +When inserting a queue entry: +- Attestation submitted -> add to `0x20` with `expires_at` from TTL +- Auditor registered/renewed -> add to `0x21` with `renewal_deadline` +- Snapshot hash posted -> remove old `0x22` entry (if any), add new with `compliance_deadline` +- Bond withdrawal initiated -> add to `0x23` with `completion_time` +- Auditor resignation/lapse -> add to `0x24` with `completion_time` + +When processing a queue entry ([EndBlocker](./README.md#endblocker-design)): +- Process the state transition +- Delete the queue entry +- If the state transition creates a new deadline (e.g., attestation replacement), insert a new queue entry + +### 2.3 Gas Considerations + +Per-block caps prevent unbounded gas usage: +- If a queue has more entries than the cap, remaining entries are processed in subsequent blocks +- Entries are always processed in chronological order (oldest first) +- The caps are governance parameters, adjustable without chain upgrades +- Under normal conditions, most blocks will process zero or very few entries + +--- + +## 3. Go Interface Definitions + +> **Note**: This section is the authoritative reference for code interfaces. The [specification](./README.md) +> contains language-neutral pseudocode for the same interfaces. Where the two diverge, this document takes +> precedence for implementation. 
+ +### 3.1 Cross-Module Keeper Interfaces + +The `x/verification` module requires read-only access to other modules: + +```go +// ProviderKeeper -- from x/provider +type ProviderKeeper interface { + Get(ctx sdk.Context, id sdk.Address) (ptypes.Provider, bool) + GetRegistrationTime(ctx sdk.Context, id sdk.Address) (time.Time, bool) +} + +// MarketKeeper -- from x/market +type MarketKeeper interface { + GetProviderLeaseStats(ctx sdk.Context, provider sdk.Address, since time.Time) (LeaseStats, error) +} + +// LeaseStats aggregates lease outcomes for a provider +type LeaseStats struct { + TotalLeases uint64 // all leases in the window + CompletedByTenant uint64 // leases closed by tenant or ran to completion + TerminatedByProvider uint64 // leases closed by provider prematurely +} +``` + +### 3.2 Tier Comparison Convention + +> **WARNING: Inverted numeric ordering**. The `VerificationTier` enum uses lower numeric values for higher trust: +> `TierTrusted=1` (L0, highest trust) through `TierIdentified=4` (L3, lowest trust). This means `tier <= threshold` +> checks "is this tier at least as good as the threshold", which reads counter-intuitively. All tier comparisons +> MUST use the helper functions below to prevent off-by-one errors. + +```go +// TierAtLeast returns true if 'have' is at least as trusted as 'need'. +// Both must be valid tiers (not TierUnspecified). +// Example: TierAtLeast(TierVerified, TierEstablished) returns false (L2 < L1) +// Example: TierAtLeast(TierTrusted, TierVerified) returns true (L0 >= L2) +func TierAtLeast(have, need VerificationTier) bool { + return have != TierUnspecified && have <= need +} + +// TierBetter returns true if 'a' is strictly more trusted than 'b'. 
+func TierBetter(a, b VerificationTier) bool { + return a != TierUnspecified && a < b +} +``` + +### 3.3 VerificationKeeper Interface (consumed by x/market) + +```go +// VerificationKeeper -- from x/verification, used by x/market for bid filtering +type VerificationKeeper interface { + // GetProviderValidAttestations returns all Valid, non-expired attestations for a provider. + // Does NOT filter by snapshot compliance -- caller must check separately. + GetProviderValidAttestations(ctx sdk.Context, provider sdk.Address) ([]AttestationRecord, bool) + + // IsProviderSnapshotCompliant returns true if the provider is NOT snapshot-suspended. + // Returns true if the provider has no snapshot record (L4 providers don't need one). + IsProviderSnapshotCompliant(ctx sdk.Context, provider sdk.Address) bool + + // GetProviderBestTier returns the best (lowest numeric enum value = highest trust) tier + // from valid attestations. Returns TierUnspecified if no valid attestations exist. + GetProviderBestTier(ctx sdk.Context, provider sdk.Address) VerificationTier + + // ProviderHasCapability returns true if any valid attestation for this provider + // includes the given capability flag. + ProviderHasCapability(ctx sdk.Context, provider sdk.Address, cap CapabilityFlag) bool +} +``` + +### 3.4 Market Module Bid Filtering (x/market/handler/server.go) + +Replaces the current audit-based matching in `CreateBid`: + +```go +// CURRENT (to be replaced): +// provAttr, _ := ms.keepers.Audit.GetProviderAttributes(ctx, provider) +// provAttr = append([]atypes.AuditedProvider{{...}}, provAttr...) 
+// if !order.MatchRequirements(provAttr) { return nil, mv1.ErrAttributeMismatch } + +// NEW: +verReq := order.VerificationRequirement() + +if verReq.MinTier != TierUnspecified { + bestTier := ms.keepers.Verification.GetProviderBestTier(ctx, provider) + if !TierAtLeast(bestTier, verReq.MinTier) { + return nil, mv1.ErrInsufficientVerificationTier + } + + // L2+ providers must be snapshot-compliant for bid eligibility + if TierAtLeast(bestTier, TierVerified) { + if !ms.keepers.Verification.IsProviderSnapshotCompliant(ctx, provider) { + return nil, mv1.ErrProviderSnapshotSuspended + } + } +} + +for _, requiredCap := range verReq.RequiredCapabilities { + if !ms.keepers.Verification.ProviderHasCapability(ctx, provider, requiredCap) { + return nil, mv1.ErrMissingCapability + } +} + +if len(verReq.RequiredAuditors) > 0 { + attestations, _ := ms.keepers.Verification.GetProviderValidAttestations(ctx, provider) + if !hasAttestationFromAny(attestations, verReq.RequiredAuditors) { + return nil, mv1.ErrRequiredAuditorNotFound + } +} +``` + +### 3.5 VerificationRequirement (x/deployment proto addition) + +The `PlacementRequirements` in the deployment module gains a new field: + +```protobuf +// In deployment proto -- PlacementRequirements +message PlacementRequirements { + // ... existing attribute fields ... + + // Verification requirements for providers bidding on this group + VerificationRequirement verification = N; // new field, nullable (omit = no requirement) +} + +// Defined in akash.verification.v1 and imported by deployment +message VerificationRequirement { + VerificationTier min_tier = 1; // 0 = no requirement + repeated CapabilityFlag required_capabilities = 2; + repeated string required_auditors = 3; // specific auditor addresses (optional) +} +``` + +--- + +## 4. Protobuf Definitions + +Complete proto file specifications for `akash.verification.v1`. All files are placed under +`proto/node/akash/verification/v1/`. 
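The bid filter in section 3.4 calls `hasAttestationFromAny` but does not define it. Below is one possible definition, sketched under a few assumptions:

- It only needs to match each attestation's auditor address against `RequiredAuditors`.
- It does no status check, because `GetProviderValidAttestations` is specified to return only Valid, non-expired records.
- `AttestationRecord` is reduced to three fields here.
- The tier constants and `TierAtLeast` are copied from sections 3.2 and 4.1 so the example compiles on its own.
- The addresses are placeholders, not valid bech32.

```go
package main

import "fmt"

// VerificationTier mirrors the proto enum: lower value = higher trust.
type VerificationTier int32

const (
	TierUnspecified VerificationTier = 0
	TierTrusted     VerificationTier = 1 // L0
	TierEstablished VerificationTier = 2 // L1
	TierVerified    VerificationTier = 3 // L2
	TierIdentified  VerificationTier = 4 // L3
)

// AttestationRecord is a simplified stand-in for the generated type.
type AttestationRecord struct {
	Provider string
	Auditor  string
	Tier     VerificationTier
}

// TierAtLeast is copied from section 3.2.
func TierAtLeast(have, need VerificationTier) bool {
	return have != TierUnspecified && have <= need
}

// hasAttestationFromAny reports whether any attestation was issued by one of
// requiredAuditors. Hypothetical helper; an empty auditor list never matches.
func hasAttestationFromAny(attestations []AttestationRecord, requiredAuditors []string) bool {
	required := make(map[string]struct{}, len(requiredAuditors))
	for _, a := range requiredAuditors {
		required[a] = struct{}{}
	}
	for _, att := range attestations {
		if _, ok := required[att.Auditor]; ok {
			return true
		}
	}
	return false
}

func main() {
	atts := []AttestationRecord{
		{Provider: "akash1provider", Auditor: "akash1auditorA", Tier: TierVerified},
		{Provider: "akash1provider", Auditor: "akash1auditorB", Tier: TierEstablished},
	}
	fmt.Println(hasAttestationFromAny(atts, []string{"akash1auditorB"})) // true
	fmt.Println(hasAttestationFromAny(atts, []string{"akash1auditorC"})) // false
	fmt.Println(TierAtLeast(TierVerified, TierEstablished))             // false: L2 is less trusted than L1
	fmt.Println(TierAtLeast(TierTrusted, TierVerified))                 // true: L0 is more trusted than L2
}
```

The last two lines reproduce the examples from the `TierAtLeast` doc comment. That is where the inverted numeric ordering is easiest to get wrong.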
+ +### 4.1 types.proto + +```protobuf +syntax = "proto3"; +package akash.verification.v1; + +import "gogoproto/gogo.proto"; + +option go_package = "pkg.akt.dev/go/node/verification/v1"; + +// VerificationTier represents provider verification levels. +// Lower numeric value = higher trust. L4 (Permissionless) has no attestation. +enum VerificationTier { + option (gogoproto.goproto_enum_prefix) = false; + + verification_tier_unspecified = 0 + [(gogoproto.enumvalue_customname) = "TierUnspecified"]; + verification_tier_trusted = 1 + [(gogoproto.enumvalue_customname) = "TierTrusted"]; // L0 + verification_tier_established = 2 + [(gogoproto.enumvalue_customname) = "TierEstablished"]; // L1 + verification_tier_verified = 3 + [(gogoproto.enumvalue_customname) = "TierVerified"]; // L2 + verification_tier_identified = 4 + [(gogoproto.enumvalue_customname) = "TierIdentified"]; // L3 +} + +enum AuditorStatus { + option (gogoproto.goproto_enum_prefix) = false; + + auditor_status_unspecified = 0 + [(gogoproto.enumvalue_customname) = "AuditorStatusUnspecified"]; + auditor_status_active = 1 + [(gogoproto.enumvalue_customname) = "AuditorStatusActive"]; + auditor_status_frozen = 2 + [(gogoproto.enumvalue_customname) = "AuditorStatusFrozen"]; + auditor_status_lapsed = 3 + [(gogoproto.enumvalue_customname) = "AuditorStatusLapsed"]; + auditor_status_resigned = 4 + [(gogoproto.enumvalue_customname) = "AuditorStatusResigned"]; + auditor_status_removed = 5 + [(gogoproto.enumvalue_customname) = "AuditorStatusRemoved"]; +} + +enum BondStatus { + option (gogoproto.goproto_enum_prefix) = false; + + bond_status_unspecified = 0 + [(gogoproto.enumvalue_customname) = "BondStatusUnspecified"]; + bond_status_bonded = 1 + [(gogoproto.enumvalue_customname) = "BondStatusBonded"]; + bond_status_frozen = 2 + [(gogoproto.enumvalue_customname) = "BondStatusFrozen"]; + bond_status_unbonding = 3 + [(gogoproto.enumvalue_customname) = "BondStatusUnbonding"]; +} + +enum AttestationStatus { + option 
(gogoproto.goproto_enum_prefix) = false; + + attestation_status_unspecified = 0 + [(gogoproto.enumvalue_customname) = "AttestationStatusUnspecified"]; + attestation_status_valid = 1 + [(gogoproto.enumvalue_customname) = "AttestationStatusValid"]; + attestation_status_voided = 2 + [(gogoproto.enumvalue_customname) = "AttestationStatusVoided"]; + attestation_status_expired = 3 + [(gogoproto.enumvalue_customname) = "AttestationStatusExpired"]; + attestation_status_revoked = 4 + [(gogoproto.enumvalue_customname) = "AttestationStatusRevoked"]; + attestation_status_removed = 5 + [(gogoproto.enumvalue_customname) = "AttestationStatusRemoved"]; +} + +enum VoidedReason { + option (gogoproto.goproto_enum_prefix) = false; + + voided_reason_unspecified = 0 + [(gogoproto.enumvalue_customname) = "VoidedReasonUnspecified"]; + voided_reason_discrepancy = 1 + [(gogoproto.enumvalue_customname) = "VoidedReasonDiscrepancy"]; + voided_reason_governance = 2 + [(gogoproto.enumvalue_customname) = "VoidedReasonGovernance"]; + voided_reason_bond_withdrawn = 3 + [(gogoproto.enumvalue_customname) = "VoidedReasonBondWithdrawn"]; + voided_reason_bond_slashed = 4 + [(gogoproto.enumvalue_customname) = "VoidedReasonBondSlashed"]; +} + +enum FeeStatus { + option (gogoproto.goproto_enum_prefix) = false; + + fee_status_unspecified = 0 + [(gogoproto.enumvalue_customname) = "FeeStatusUnspecified"]; + fee_status_escrowed = 1 + [(gogoproto.enumvalue_customname) = "FeeStatusEscrowed"]; + fee_status_released_to_auditor = 2 + [(gogoproto.enumvalue_customname) = "FeeStatusReleasedToAuditor"]; + fee_status_returned_to_provider = 3 + [(gogoproto.enumvalue_customname) = "FeeStatusReturnedToProvider"]; +} + +enum DiscrepancyStatus { + option (gogoproto.goproto_enum_prefix) = false; + + discrepancy_status_unspecified = 0 + [(gogoproto.enumvalue_customname) = "DiscrepancyStatusUnspecified"]; + discrepancy_status_pending = 1 + [(gogoproto.enumvalue_customname) = "DiscrepancyStatusPending"]; + 
discrepancy_status_resolved = 2 + [(gogoproto.enumvalue_customname) = "DiscrepancyStatusResolved"]; + discrepancy_status_timed_out = 3 + [(gogoproto.enumvalue_customname) = "DiscrepancyStatusTimedOut"]; +} + +enum CapabilityFlag { + option (gogoproto.goproto_enum_prefix) = false; + + capability_unspecified = 0 + [(gogoproto.enumvalue_customname) = "CapabilityUnspecified"]; + capability_tee_hardware_attestation = 1 + [(gogoproto.enumvalue_customname) = "CapabilityTEEHardwareAttestation"]; + capability_confidential_computing = 2 + [(gogoproto.enumvalue_customname) = "CapabilityConfidentialComputing"]; + capability_persistent_storage = 3 + [(gogoproto.enumvalue_customname) = "CapabilityPersistentStorage"]; + capability_bare_metal = 4 + [(gogoproto.enumvalue_customname) = "CapabilityBareMetal"]; +} +``` + +### 4.2 state.proto + +```protobuf +syntax = "proto3"; +package akash.verification.v1; + +import "gogoproto/gogo.proto"; +import "cosmos_proto/cosmos.proto"; +import "cosmos/base/v1beta1/coin.proto"; +import "google/protobuf/timestamp.proto"; + +import "akash/verification/v1/types.proto"; + +option go_package = "pkg.akt.dev/go/node/verification/v1"; + +message AuditorRecord { + string address = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + AuditorStatus status = 2; + VerificationTier max_attestation_tier = 3; + cosmos.base.v1beta1.Coin bond_amount = 4 [(gogoproto.nullable) = false]; + BondStatus bond_status = 5; + bytes metadata_hash = 6; + google.protobuf.Timestamp registered_at = 7 + [(gogoproto.nullable) = false, (gogoproto.stdtime) = true]; + google.protobuf.Timestamp renewal_deadline = 8 + [(gogoproto.nullable) = false, (gogoproto.stdtime) = true]; + uint64 discrepancy_count = 9; +} + +message AttestationRecord { + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string auditor = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + VerificationTier tier = 3; + repeated CapabilityFlag capabilities = 4; + bytes evidence_hash = 5; + 
cosmos.base.v1beta1.Coin fee = 6 [(gogoproto.nullable) = false]; + FeeStatus fee_status = 7; + google.protobuf.Timestamp created_at = 8 + [(gogoproto.nullable) = false, (gogoproto.stdtime) = true]; + google.protobuf.Timestamp expires_at = 9 + [(gogoproto.nullable) = false, (gogoproto.stdtime) = true]; + AttestationStatus status = 10; + VoidedReason voided_reason = 11; + cosmos.base.v1beta1.Coin deposit = 12 [(gogoproto.nullable) = false]; // anti-griefing deposit +} + +message DiscrepancyEvent { + uint64 id = 1 [(gogoproto.customname) = "ID"]; + string provider = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string auditor_a = 3 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + VerificationTier auditor_a_tier = 4; + string auditor_b = 5 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + VerificationTier auditor_b_tier = 6; + google.protobuf.Timestamp timestamp = 7 + [(gogoproto.nullable) = false, (gogoproto.stdtime) = true]; + DiscrepancyStatus resolution_status = 8; + uint64 resolution_proposal_id = 9; +} + +message ProviderBondRecord { + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + cosmos.base.v1beta1.Coin bonded_amount = 2 [(gogoproto.nullable) = false]; + repeated UnbondingEntry unbonding_entries = 3 [(gogoproto.nullable) = false]; + bool slashed = 4; + google.protobuf.Timestamp last_slash_time = 5 + [(gogoproto.nullable) = true, (gogoproto.stdtime) = true]; +} + +message UnbondingEntry { + cosmos.base.v1beta1.Coin amount = 1 [(gogoproto.nullable) = false]; + google.protobuf.Timestamp completion_time = 2 + [(gogoproto.nullable) = false, (gogoproto.stdtime) = true]; +} + +message ResourceSummary { + uint32 total_gpus = 1; + uint32 total_vcpus = 2; + uint64 total_memory_mb = 3; + uint64 total_storage_mb = 4; + uint32 active_leases = 5; + string software_version = 6; + bytes software_signature = 7; +} + +message ProviderSnapshotRecord { + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + bytes 
snapshot_hash = 2; + ResourceSummary resource_summary = 3 [(gogoproto.nullable) = false]; + google.protobuf.Timestamp posted_at = 4 + [(gogoproto.nullable) = false, (gogoproto.stdtime) = true]; + google.protobuf.Timestamp snapshot_timestamp = 5 + [(gogoproto.nullable) = false, (gogoproto.stdtime) = true]; + google.protobuf.Timestamp compliance_deadline = 6 + [(gogoproto.nullable) = false, (gogoproto.stdtime) = true]; + bool suspended = 7; +} +``` + +### 4.3 params.proto + +```protobuf +syntax = "proto3"; +package akash.verification.v1; + +import "gogoproto/gogo.proto"; +import "amino/amino.proto"; +import "cosmos/base/v1beta1/coin.proto"; +import "google/protobuf/duration.proto"; + +option go_package = "pkg.akt.dev/go/node/verification/v1"; + +message Params { + cosmos.base.v1beta1.Coin bond_l3 = 1 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin bond_l2 = 2 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin bond_l1 = 3 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin bond_l0 = 4 [(gogoproto.nullable) = false]; + + google.protobuf.Duration ttl_l3 = 5 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + google.protobuf.Duration ttl_l2 = 6 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + google.protobuf.Duration ttl_l1 = 7 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + google.protobuf.Duration ttl_l0 = 8 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + + cosmos.base.v1beta1.Coin min_fee_l3 = 9 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin min_fee_l2 = 10 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin min_fee_l1 = 11 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin min_fee_l0 = 12 [(gogoproto.nullable) = false]; + + uint32 discrepancy_threshold = 13; + + google.protobuf.Duration auditor_unbonding_period = 14 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + + google.protobuf.Duration renewal_period_l3 = 15 + 
[(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + google.protobuf.Duration renewal_period_l2 = 16 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + google.protobuf.Duration renewal_period_l1 = 17 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + google.protobuf.Duration renewal_period_l0 = 18 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + + google.protobuf.Duration snapshot_hash_interval = 19 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + google.protobuf.Duration max_snapshot_age = 20 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + + cosmos.base.v1beta1.Coin bond_gpu_l2 = 21 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin bond_gpu_l1 = 22 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin bond_gpu_l0 = 23 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin bond_vcpu_l2 = 24 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin bond_vcpu_l1 = 25 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin bond_vcpu_l0 = 26 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin bond_mem_gb_l2 = 27 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin bond_mem_gb_l1 = 28 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin bond_mem_gb_l0 = 29 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin bond_storage_tb_l2 = 30 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin bond_storage_tb_l1 = 31 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin bond_storage_tb_l0 = 32 [(gogoproto.nullable) = false]; + + google.protobuf.Duration provider_bond_unbonding_period = 33 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + + google.protobuf.Duration min_age_l2 = 34 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + google.protobuf.Duration min_age_l1 = 35 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + google.protobuf.Duration min_age_l0 = 36 + 
[(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + uint32 min_lease_completion_bps_l1 = 37; + uint32 min_lease_completion_bps_l0 = 38; + google.protobuf.Duration clean_history_window_l1 = 39 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + google.protobuf.Duration clean_history_window_l0 = 40 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + google.protobuf.Duration min_l1_duration_for_l0 = 41 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + uint32 min_leases_for_completion_rate = 42; + + uint32 max_endblocker_attestation_expiries = 43; + uint32 max_endblocker_snapshot_suspensions = 44; + uint32 max_endblocker_unbonding_completions = 45; + uint32 max_endblocker_discrepancy_timeouts = 46; + + google.protobuf.Duration discrepancy_resolution_timeout = 47 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + cosmos.base.v1beta1.Coin attestation_deposit = 48 [(gogoproto.nullable) = false]; + + bool verification_module_active = 49; + + google.protobuf.Duration contact_response_critical_l3 = 50 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + google.protobuf.Duration contact_response_critical_l2 = 51 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + google.protobuf.Duration contact_response_critical_l1 = 52 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + google.protobuf.Duration contact_response_critical_l0 = 53 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + google.protobuf.Duration contact_response_standard_l3 = 54 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + google.protobuf.Duration contact_response_standard_l2 = 55 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + google.protobuf.Duration contact_response_standard_l1 = 56 + [(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; + google.protobuf.Duration contact_response_standard_l0 = 57 + 
[(gogoproto.nullable) = false, (gogoproto.stdduration) = true]; +} +``` + +### 4.4 msg.proto + +```protobuf +syntax = "proto3"; +package akash.verification.v1; + +import "gogoproto/gogo.proto"; +import "cosmos_proto/cosmos.proto"; +import "cosmos/msg/v1/msg.proto"; +import "cosmos/base/v1beta1/coin.proto"; +import "amino/amino.proto"; +import "google/protobuf/timestamp.proto"; + +import "akash/verification/v1/types.proto"; +import "akash/verification/v1/state.proto"; +import "akash/verification/v1/params.proto"; + +option go_package = "pkg.akt.dev/go/node/verification/v1"; + +message MsgPostAuditorBond { + option (cosmos.msg.v1.signer) = "auditor"; + option (amino.name) = "akash/verification/v1/MsgPostAuditorBond"; + string auditor = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + cosmos.base.v1beta1.Coin amount = 2 [(gogoproto.nullable) = false]; +} +message MsgPostAuditorBondResponse {} + +message MsgSubmitAttestation { + option (cosmos.msg.v1.signer) = "auditor"; + option (amino.name) = "akash/verification/v1/MsgSubmitAttestation"; + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string auditor = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + VerificationTier tier = 3; + repeated CapabilityFlag capabilities = 4; + bytes evidence_hash = 5; + cosmos.base.v1beta1.Coin fee = 6 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin deposit = 7 [(gogoproto.nullable) = false]; // anti-griefing deposit +} +message MsgSubmitAttestationResponse {} + +message MsgRevokeAttestation { + option (cosmos.msg.v1.signer) = "auditor"; + option (amino.name) = "akash/verification/v1/MsgRevokeAttestation"; + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string auditor = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; +} +message MsgRevokeAttestationResponse {} + +message MsgRemoveAttestation { + option (cosmos.msg.v1.signer) = "provider"; + option (amino.name) = "akash/verification/v1/MsgRemoveAttestation"; + 
string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string auditor = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; +} +message MsgRemoveAttestationResponse {} + +message MsgResignAuditor { + option (cosmos.msg.v1.signer) = "auditor"; + option (amino.name) = "akash/verification/v1/MsgResignAuditor"; + string auditor = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; +} +message MsgResignAuditorResponse {} + +message MsgPostProviderBond { + option (cosmos.msg.v1.signer) = "provider"; + option (amino.name) = "akash/verification/v1/MsgPostProviderBond"; + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + cosmos.base.v1beta1.Coin amount = 2 [(gogoproto.nullable) = false]; +} +message MsgPostProviderBondResponse {} + +message MsgWithdrawProviderBond { + option (cosmos.msg.v1.signer) = "provider"; + option (amino.name) = "akash/verification/v1/MsgWithdrawProviderBond"; + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + cosmos.base.v1beta1.Coin amount = 2 [(gogoproto.nullable) = false]; +} +message MsgWithdrawProviderBondResponse {} + +message MsgPostSnapshotHash { + option (cosmos.msg.v1.signer) = "provider"; + option (amino.name) = "akash/verification/v1/MsgPostSnapshotHash"; + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + bytes snapshot_hash = 2; + ResourceSummary resource_summary = 3 [(gogoproto.nullable) = false]; + google.protobuf.Timestamp snapshot_timestamp = 4 + [(gogoproto.nullable) = false, (gogoproto.stdtime) = true]; +} +message MsgPostSnapshotHashResponse {} + +message MsgRegisterAuditor { + option (cosmos.msg.v1.signer) = "authority"; + option (amino.name) = "akash/verification/v1/MsgRegisterAuditor"; + string authority = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string auditor = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + VerificationTier max_attestation_tier = 3; + bytes metadata_hash = 4; + cosmos.base.v1beta1.Coin required_bond = 5 
[(gogoproto.nullable) = false]; +} +message MsgRegisterAuditorResponse {} + +message MsgRenewAuditor { + option (cosmos.msg.v1.signer) = "authority"; + option (amino.name) = "akash/verification/v1/MsgRenewAuditor"; + string authority = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string auditor = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; +} +message MsgRenewAuditorResponse {} + +message MsgRemoveAuditor { + option (cosmos.msg.v1.signer) = "authority"; + option (amino.name) = "akash/verification/v1/MsgRemoveAuditor"; + string authority = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string auditor = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; +} +message MsgRemoveAuditorResponse {} + +message MsgRevokeProviderAttestation { + option (cosmos.msg.v1.signer) = "authority"; + option (amino.name) = "akash/verification/v1/MsgRevokeProviderAttestation"; + string authority = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string provider = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string auditor = 3 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string reason = 4; +} +message MsgRevokeProviderAttestationResponse {} + +message MsgRevokeAllProviderAttestations { + option (cosmos.msg.v1.signer) = "authority"; + option (amino.name) = "akash/verification/v1/MsgRevokeAllProviderAttestations"; + string authority = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string provider = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string reason = 3; +} +message MsgRevokeAllProviderAttestationsResponse {} + +message MsgRevokeAuditorAttestations { + option (cosmos.msg.v1.signer) = "authority"; + option (amino.name) = "akash/verification/v1/MsgRevokeAuditorAttestations"; + string authority = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string auditor = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string reason = 3; +} +message MsgRevokeAuditorAttestationsResponse {} + +message MsgResolveDiscrepancy { + 
option (cosmos.msg.v1.signer) = "authority"; + option (amino.name) = "akash/verification/v1/MsgResolveDiscrepancy"; + string authority = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + uint64 discrepancy_id = 2; + string vindicated_auditor = 3 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + bool slash_auditor_a = 4; + bool slash_auditor_b = 5; + string reason = 6; +} +message MsgResolveDiscrepancyResponse {} + +message MsgSlashProviderBond { + option (cosmos.msg.v1.signer) = "authority"; + option (amino.name) = "akash/verification/v1/MsgSlashProviderBond"; + string authority = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string provider = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string slash_fraction = 3 [ + (cosmos_proto.scalar) = "cosmos.Dec", + (gogoproto.customtype) = "cosmossdk.io/math.LegacyDec", + (gogoproto.nullable) = false + ]; + string reason = 4; +} +message MsgSlashProviderBondResponse {} + +message MsgUpdateParams { + option (cosmos.msg.v1.signer) = "authority"; + option (amino.name) = "akash/verification/v1/MsgUpdateParams"; + string authority = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + Params params = 2 [(gogoproto.nullable) = false]; +} +message MsgUpdateParamsResponse {} +``` + +### 4.5 service.proto + +```protobuf +syntax = "proto3"; +package akash.verification.v1; + +import "cosmos/msg/v1/msg.proto"; +import "akash/verification/v1/msg.proto"; + +option go_package = "pkg.akt.dev/go/node/verification/v1"; + +service Msg { + option (cosmos.msg.v1.service) = true; + + rpc PostAuditorBond(MsgPostAuditorBond) returns (MsgPostAuditorBondResponse); + rpc SubmitAttestation(MsgSubmitAttestation) returns (MsgSubmitAttestationResponse); + rpc RevokeAttestation(MsgRevokeAttestation) returns (MsgRevokeAttestationResponse); + rpc RemoveAttestation(MsgRemoveAttestation) returns (MsgRemoveAttestationResponse); + rpc ResignAuditor(MsgResignAuditor) returns (MsgResignAuditorResponse); + rpc 
PostProviderBond(MsgPostProviderBond) returns (MsgPostProviderBondResponse); + rpc WithdrawProviderBond(MsgWithdrawProviderBond) returns (MsgWithdrawProviderBondResponse); + rpc PostSnapshotHash(MsgPostSnapshotHash) returns (MsgPostSnapshotHashResponse); + rpc RegisterAuditor(MsgRegisterAuditor) returns (MsgRegisterAuditorResponse); + rpc RenewAuditor(MsgRenewAuditor) returns (MsgRenewAuditorResponse); + rpc RemoveAuditor(MsgRemoveAuditor) returns (MsgRemoveAuditorResponse); + rpc RevokeProviderAttestation(MsgRevokeProviderAttestation) returns (MsgRevokeProviderAttestationResponse); + rpc RevokeAllProviderAttestations(MsgRevokeAllProviderAttestations) returns (MsgRevokeAllProviderAttestationsResponse); + rpc RevokeAuditorAttestations(MsgRevokeAuditorAttestations) returns (MsgRevokeAuditorAttestationsResponse); + rpc ResolveDiscrepancy(MsgResolveDiscrepancy) returns (MsgResolveDiscrepancyResponse); + rpc SlashProviderBond(MsgSlashProviderBond) returns (MsgSlashProviderBondResponse); + rpc UpdateParams(MsgUpdateParams) returns (MsgUpdateParamsResponse); +} +``` + +### 4.6 query.proto + +```protobuf +syntax = "proto3"; +package akash.verification.v1; + +import "gogoproto/gogo.proto"; +import "cosmos_proto/cosmos.proto"; +import "cosmos/base/v1beta1/coin.proto"; +import "cosmos/base/query/v1beta1/pagination.proto"; +import "google/api/annotations.proto"; + +import "akash/verification/v1/types.proto"; +import "akash/verification/v1/state.proto"; +import "akash/verification/v1/params.proto"; + +option go_package = "pkg.akt.dev/go/node/verification/v1"; + +service Query { + rpc Auditor(QueryAuditorRequest) returns (QueryAuditorResponse) { + option (google.api.http).get = "/akash/verification/v1/auditors/{auditor}"; + } + rpc Auditors(QueryAuditorsRequest) returns (QueryAuditorsResponse) { + option (google.api.http).get = "/akash/verification/v1/auditors"; + } + rpc Attestation(QueryAttestationRequest) returns (QueryAttestationResponse) { + option (google.api.http).get = 
"/akash/verification/v1/attestations/{provider}/{auditor}"; + } + rpc ProviderAttestations(QueryProviderAttestationsRequest) returns (QueryProviderAttestationsResponse) { + option (google.api.http).get = "/akash/verification/v1/providers/{provider}/attestations"; + } + rpc AuditorAttestations(QueryAuditorAttestationsRequest) returns (QueryAuditorAttestationsResponse) { + option (google.api.http).get = "/akash/verification/v1/auditors/{auditor}/attestations"; + } + rpc Discrepancy(QueryDiscrepancyRequest) returns (QueryDiscrepancyResponse) { + option (google.api.http).get = "/akash/verification/v1/discrepancies/{id}"; + } + rpc Discrepancies(QueryDiscrepanciesRequest) returns (QueryDiscrepanciesResponse) { + option (google.api.http).get = "/akash/verification/v1/discrepancies"; + } + rpc ProviderBond(QueryProviderBondRequest) returns (QueryProviderBondResponse) { + option (google.api.http).get = "/akash/verification/v1/providers/{provider}/bond"; + } + rpc ProviderSnapshot(QueryProviderSnapshotRequest) returns (QueryProviderSnapshotResponse) { + option (google.api.http).get = "/akash/verification/v1/providers/{provider}/snapshot"; + } + rpc Params(QueryParamsRequest) returns (QueryParamsResponse) { + option (google.api.http).get = "/akash/verification/v1/params"; + } +} + +message QueryAuditorRequest { + string auditor = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; +} +message QueryAuditorResponse { + AuditorRecord auditor = 1 [(gogoproto.nullable) = false]; +} + +message QueryAuditorsRequest { + AuditorStatus status_filter = 1; + cosmos.base.query.v1beta1.PageRequest pagination = 2; +} +message QueryAuditorsResponse { + repeated AuditorRecord auditors = 1 [(gogoproto.nullable) = false]; + cosmos.base.query.v1beta1.PageResponse pagination = 2; +} + +message QueryAttestationRequest { + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string auditor = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; +} +message QueryAttestationResponse 
{ + AttestationRecord attestation = 1 [(gogoproto.nullable) = false]; +} + +message QueryProviderAttestationsRequest { + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + AttestationStatus status_filter = 2; + cosmos.base.query.v1beta1.PageRequest pagination = 3; +} +message QueryProviderAttestationsResponse { + repeated AttestationRecord attestations = 1 [(gogoproto.nullable) = false]; + cosmos.base.query.v1beta1.PageResponse pagination = 2; +} + +message QueryAuditorAttestationsRequest { + string auditor = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + cosmos.base.query.v1beta1.PageRequest pagination = 2; +} +message QueryAuditorAttestationsResponse { + repeated AttestationRecord attestations = 1 [(gogoproto.nullable) = false]; + cosmos.base.query.v1beta1.PageResponse pagination = 2; +} + +message QueryDiscrepancyRequest { + uint64 id = 1; +} +message QueryDiscrepancyResponse { + DiscrepancyEvent discrepancy = 1 [(gogoproto.nullable) = false]; +} + +message QueryDiscrepanciesRequest { + DiscrepancyStatus status_filter = 1; + cosmos.base.query.v1beta1.PageRequest pagination = 2; +} +message QueryDiscrepanciesResponse { + repeated DiscrepancyEvent discrepancies = 1 [(gogoproto.nullable) = false]; + cosmos.base.query.v1beta1.PageResponse pagination = 2; +} + +message QueryProviderBondRequest { + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; +} +message QueryProviderBondResponse { + ProviderBondRecord bond = 1 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin required_for_current_tier = 2 [(gogoproto.nullable) = false]; +} + +message QueryProviderSnapshotRequest { + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; +} +message QueryProviderSnapshotResponse { + ProviderSnapshotRecord snapshot = 1 [(gogoproto.nullable) = false]; +} + +message QueryParamsRequest {} +message QueryParamsResponse { + Params params = 1 [(gogoproto.nullable) = false]; +} +``` + +### 4.7 events.proto + 
+```protobuf +syntax = "proto3"; +package akash.verification.v1; + +import "gogoproto/gogo.proto"; +import "cosmos_proto/cosmos.proto"; +import "cosmos/base/v1beta1/coin.proto"; +import "google/protobuf/timestamp.proto"; + +import "akash/verification/v1/types.proto"; + +option go_package = "pkg.akt.dev/go/node/verification/v1"; + +message EventAuditorRegistered { + string auditor = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + VerificationTier max_attestation_tier = 2; +} + +message EventAuditorBondPosted { + string auditor = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + cosmos.base.v1beta1.Coin amount = 2 [(gogoproto.nullable) = false]; +} + +message EventAuditorFrozen { + string auditor = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + uint64 discrepancy_id = 2; +} + +message EventAuditorLapsed { + string auditor = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; +} + +message EventAuditorResigned { + string auditor = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; +} + +message EventAuditorRemoved { + string auditor = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; +} + +message EventAuditorRenewed { + string auditor = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + google.protobuf.Timestamp new_deadline = 2 + [(gogoproto.nullable) = false, (gogoproto.stdtime) = true]; +} + +message EventAttestationSubmitted { + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string auditor = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + VerificationTier tier = 3; + repeated CapabilityFlag capabilities = 4; + google.protobuf.Timestamp expires_at = 5 + [(gogoproto.nullable) = false, (gogoproto.stdtime) = true]; +} + +message EventAttestationExpired { + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string auditor = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + VerificationTier tier = 3; +} + +message EventAttestationRevoked { + string provider = 1 [(cosmos_proto.scalar) = 
"cosmos.AddressString"]; + string auditor = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string initiator = 3; +} + +message EventAttestationVoided { + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string auditor = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + VoidedReason reason = 3; +} + +message EventDiscrepancyDetected { + uint64 discrepancy_id = 1; + string provider = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string auditor_a = 3 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + VerificationTier tier_a = 4; + string auditor_b = 5 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + VerificationTier tier_b = 6; +} + +message EventDiscrepancyResolved { + uint64 discrepancy_id = 1; + string vindicated_auditor = 2; +} + +message EventDiscrepancyTimedOut { + uint64 discrepancy_id = 1; + string auditor_a = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string auditor_b = 3 [(cosmos_proto.scalar) = "cosmos.AddressString"]; +} + +message EventProviderBondPosted { + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + cosmos.base.v1beta1.Coin amount = 2 [(gogoproto.nullable) = false]; + cosmos.base.v1beta1.Coin total_bonded = 3 [(gogoproto.nullable) = false]; +} + +message EventProviderBondSlashed { + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + cosmos.base.v1beta1.Coin slashed_amount = 2 [(gogoproto.nullable) = false]; + string reason = 3; +} + +message EventProviderBondWithdrawalInitiated { + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + cosmos.base.v1beta1.Coin amount = 2 [(gogoproto.nullable) = false]; + google.protobuf.Timestamp completion_time = 3 + [(gogoproto.nullable) = false, (gogoproto.stdtime) = true]; +} + +message EventProviderBondWithdrawalCompleted { + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + cosmos.base.v1beta1.Coin amount = 2 [(gogoproto.nullable) = false]; +} + +message 
EventSnapshotHashPosted { + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + bytes snapshot_hash = 2; + google.protobuf.Timestamp compliance_deadline = 3 + [(gogoproto.nullable) = false, (gogoproto.stdtime) = true]; +} + +message EventSnapshotSuspended { + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; +} + +message EventSnapshotResumed { + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; +} + +message EventFeeEscrowed { + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + string auditor = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + cosmos.base.v1beta1.Coin amount = 3 [(gogoproto.nullable) = false]; +} + +message EventFeeReleasedToAuditor { + string auditor = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + cosmos.base.v1beta1.Coin amount = 2 [(gogoproto.nullable) = false]; +} + +message EventFeeReturnedToProvider { + string provider = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"]; + cosmos.base.v1beta1.Coin amount = 2 [(gogoproto.nullable) = false]; +} +``` + +### 4.8 genesis.proto + +```protobuf +syntax = "proto3"; +package akash.verification.v1; + +import "gogoproto/gogo.proto"; + +import "akash/verification/v1/state.proto"; +import "akash/verification/v1/params.proto"; + +option go_package = "pkg.akt.dev/go/node/verification/v1"; + +message GenesisState { + Params params = 1 [(gogoproto.nullable) = false]; + repeated AuditorRecord auditors = 2 [(gogoproto.nullable) = false]; + repeated AttestationRecord attestations = 3 [(gogoproto.nullable) = false]; + repeated DiscrepancyEvent discrepancies = 4 [(gogoproto.nullable) = false]; + repeated ProviderBondRecord provider_bonds = 5 [(gogoproto.nullable) = false]; + repeated ProviderSnapshotRecord provider_snapshots = 6 [(gogoproto.nullable) = false]; + uint64 next_discrepancy_id = 7; +} +``` + +--- + +## Appendix A: Error Codes + +| Error | Code | gRPC Status | Description | 
+|--------------------------------------|------|--------------------------|------------------------------------------------------------| +| `ErrAuditorNotFound` | 1 | `NOT_FOUND` | Auditor address not in registered set | +| `ErrAuditorNotActive` | 2 | `FAILED_PRECONDITION` | Auditor status is not Active | +| `ErrAuditorFrozen` | 3 | `FAILED_PRECONDITION` | Auditor bond is frozen (pending discrepancy resolution) | +| `ErrAuditorUnauthorizedTier` | 4 | `PERMISSION_DENIED` | Attested tier exceeds auditor's max attestation authority | +| `ErrSelfAttestation` | 5 | `INVALID_ARGUMENT` | Auditor and provider addresses are the same | +| `ErrInsufficientAuditFee` | 6 | `INVALID_ARGUMENT` | Fee below governance minimum for this tier | +| `ErrProviderNotRegistered` | 7 | `NOT_FOUND` | Provider not found in x/provider | +| `ErrInsufficientProviderBond` | 8 | `FAILED_PRECONDITION` | Provider bond below required minimum for attested tier | +| `ErrSnapshotNonCompliant` | 9 | `FAILED_PRECONDITION` | Provider snapshot is suspended or missing | +| `ErrInsufficientProviderAge` | 10 | `FAILED_PRECONDITION` | Provider registration too recent for attested tier | +| `ErrInsufficientLeaseCompletionRate` | 11 | `FAILED_PRECONDITION` | Lease completion rate below threshold | +| `ErrSlashingHistoryViolation` | 12 | `FAILED_PRECONDITION` | Provider has slashing events within lookback window | +| `ErrInsufficientL1History` | 13 | `FAILED_PRECONDITION` | Provider lacks required continuous L1+ attestation for L0 | +| `ErrAttestationNotFound` | 14 | `NOT_FOUND` | Attestation (provider, auditor) not found | +| `ErrDiscrepancyNotFound` | 15 | `NOT_FOUND` | Discrepancy ID not found | +| `ErrBondWithdrawalExceedsMinimum` | 16 | `FAILED_PRECONDITION` | Withdrawal would leave bond below minimum for active tiers | +| `ErrSnapshotTooOld` | 17 | `INVALID_ARGUMENT` | Snapshot timestamp exceeds max_snapshot_age | +| `ErrInsufficientAuditorBond` | 18 | `FAILED_PRECONDITION` | Auditor bond below required 
amount | +| `ErrInsufficientVerificationTier` | 19 | `FAILED_PRECONDITION` | Provider's tier does not meet deployment requirement | +| `ErrProviderSnapshotSuspended` | 20 | `FAILED_PRECONDITION` | Provider snapshot suspended, cannot bid on L2+ orders | +| `ErrMissingCapability` | 21 | `FAILED_PRECONDITION` | Provider lacks required capability flag | +| `ErrRequiredAuditorNotFound` | 22 | `NOT_FOUND` | No attestation from a required auditor | +| `ErrInsufficientDeposit` | 23 | `INVALID_ARGUMENT` | Attestation deposit below governance minimum | + +--- + +## Appendix B: Module Dependencies + +``` +x/verification + reads from: + x/provider -- provider registration, registration timestamp + x/market -- lease completion stats (provider lease history) + x/bank -- token transfers (bond deposits, fee escrow, withdrawals) + x/gov -- authority validation for governance messages + + read by: + x/market -- bid filtering (VerificationKeeper interface) + x/incentive -- incentive eligibility (AEP-53, future) +``` + +``` +Module initialization order: x/provider -> x/market -> x/verification +(x/verification depends on x/provider and x/market keepers) +``` + +--- + +## 5. Inventory Service Extensions + +The Inventory Service already exists in the provider codebase. This specification requires the following extensions: + +### 5.1 Required Extensions + +- **Multi-source data collection**: For each hardware class (CPU, GPU, memory, storage, network), report + properties from multiple independent data sources as described in + [Multi-Source Data Collection](./README.md#multi-source-data-collection). Flag discrepancies between sources. +- **Challenge-response support**: Accept a 32-byte cryptographically random nonce in the query request. + Include the nonce in the signed response payload. +- **Response signing**: Every response must be signed by the provider's on-chain key. The signature covers + the entire payload including nonce, timestamp, and all hardware data. 
+- **Virtualization detection**: Implement the detection methods described in + [Virtualization Detection](./README.md#virtualization-detection) and include the result in every snapshot. +- **Snapshot hash generation**: Compute SHA-256 of the full snapshot payload for posting via `MsgPostSnapshotHash`. +- **`ResourceSummary` generation**: Extract the on-chain summary fields (`total_gpus`, `total_vcpus`, + `total_memory_mb`, `total_storage_mb`, `active_leases`, `software_version`, `software_signature`) from + the full snapshot for inclusion in `MsgPostSnapshotHash`. + +### 5.2 Vendor Library Handling + +The binary is statically linked for all system libraries. For vendor hardware management libraries (NVIDIA NVML, +AMD ROCm SMI), the binary may dynamically load them at runtime: + +1. Verify the vendor library's cryptographic signature before loading +2. Treat vendor library output as one data source among multiple +3. Cross-validate against direct hardware reads (PCIe configuration space, CPUID) +4. Flag discrepancies in the snapshot + +### 5.3 gRPC Service + +The Inventory Service is exposed as a gRPC service on the existing provider daemon endpoint: + +```protobuf +service InventoryService { + rpc GetInventorySnapshot(GetInventorySnapshotRequest) returns (GetInventorySnapshotResponse); +} + +message GetInventorySnapshotRequest { + bytes nonce = 1; // 32 bytes, cryptographically random. Optional. +} + +message GetInventorySnapshotResponse { + bytes snapshot_payload = 1; // full machine-generated snapshot + bytes signature = 2; // provider on-chain key signature over snapshot_payload + string provider = 3; // provider address for key lookup +} +``` + +The snapshot payload format is implementation-defined but must include all fields described in +[Snapshot Contents](./README.md#snapshot-contents). + +--- + +## 6. 
Module Registration and Genesis + +### 6.1 Module Registration + +The `x/verification` module is registered in `app.go`: + +- Add `x/verification` to the module manager and begin/end blocker registration +- Register the module's store key +- Create the module account for escrow (bond and fee escrow) +- Wire keeper dependencies: `ProviderKeeper`, `MarketKeeper`, `BankKeeper`, `AccountKeeper` (used to create the escrow module account), governance `authority` + +### 6.2 InitGenesis + +```go +func (k Keeper) InitGenesis(ctx sdk.Context, state GenesisState) { + // 1. Validate and set params + if err := state.Params.Validate(); err != nil { + panic(err) + } + k.SetParams(ctx, state.Params) + + // 2. Create module account for escrow (bonds + fees) + // The module account must exist before any token transfers + k.accountKeeper.GetModuleAccount(ctx, ModuleName) + + // 3. Import auditor records + for _, auditor := range state.Auditors { + k.SetAuditorRecord(ctx, auditor) + // Re-create renewal deadline queue entries + k.insertRenewalQueue(ctx, auditor.Address, auditor.RenewalDeadline) + } + + // 4. Import attestation records + for _, att := range state.Attestations { + k.SetAttestationRecord(ctx, att) + // Re-create secondary index and expiry queue for Valid attestations + if att.Status == AttestationStatusValid { + k.setAuditorAttestationIndex(ctx, att.Auditor, att.Provider) + k.insertExpiryQueue(ctx, att.Provider, att.Auditor, att.ExpiresAt) + } + } + + // 5. Import discrepancy events + for _, disc := range state.Discrepancies { + k.SetDiscrepancyEvent(ctx, disc) + // Re-create timeout queue for Pending discrepancies + if disc.ResolutionStatus == DiscrepancyStatusPending { + timeout := disc.Timestamp.Add(k.GetParams(ctx).DiscrepancyResolutionTimeout) + k.insertDiscrepancyTimeoutQueue(ctx, disc.ID, timeout) + } + } + + // 6.
Import provider bonds and snapshots + for _, bond := range state.ProviderBonds { + k.SetProviderBondRecord(ctx, bond) + // Re-create unbonding queue entries + for _, entry := range bond.UnbondingEntries { + k.insertProviderUnbondingQueue(ctx, bond.Provider, entry.CompletionTime) + } + } + for _, snap := range state.ProviderSnapshots { + k.SetProviderSnapshotRecord(ctx, snap) + // Re-create snapshot compliance queue if not suspended + if !snap.Suspended { + k.insertSnapshotComplianceQueue(ctx, snap.Provider, snap.ComplianceDeadline) + } + } + + // 7. Set next discrepancy ID + k.SetNextDiscrepancyID(ctx, state.NextDiscrepancyId) +} +``` + +### 6.3 ExportGenesis + +```go +func (k Keeper) ExportGenesis(ctx sdk.Context) GenesisState { + return GenesisState{ + Params: k.GetParams(ctx), + Auditors: k.GetAllAuditorRecords(ctx), + Attestations: k.GetAllAttestationRecords(ctx), + Discrepancies: k.GetAllDiscrepancyEvents(ctx), + ProviderBonds: k.GetAllProviderBondRecords(ctx), + ProviderSnapshots: k.GetAllProviderSnapshotRecords(ctx), + NextDiscrepancyId: k.GetNextDiscrepancyID(ctx), + } +} +``` + +--- + +## 7. Module Invariants + +The following invariants must hold at all times. These are suitable for registration via the Cosmos SDK +`InvariantRegistry` and should be checked in integration tests. + +### 7.1 Escrow Balance Invariant + +The module account balance must equal the sum of all escrowed funds: + +``` +module_balance = sum(att.fee for att in attestations where att.fee_status == Escrowed) + + sum(att.deposit for att in attestations where att.status == Valid) + + sum(auditor.bond_amount for auditor in auditors where auditor.bond_status == Bonded or Frozen) + + sum(bond.bonded_amount for bond in provider_bonds) + + sum(entry.amount for entry in all unbonding_entries across all provider_bonds) +``` + +### 7.2 Attestation Validity Invariant + +No attestation with status `Valid` has `expires_at` in the past (relative to the last processed block time). 
+ +### 7.3 Frozen Auditor Invariant + +Every auditor with `bond_status == Frozen` has at least one `DiscrepancyEvent` in `Pending` status where they +are either `auditor_a` or `auditor_b`. + +### 7.4 Discrepancy Consistency Invariant + +For every `DiscrepancyEvent` in `Pending` status: +- Both `auditor_a` and `auditor_b` have `bond_status == Frozen` +- The attestation records referenced by the discrepancy have `status == Voided` and + `voided_reason == Discrepancy` + +### 7.5 Snapshot Compliance Invariant + +No provider with `ProviderSnapshotRecord.suspended == true` has a valid attestation at Level 2 or above that is +active for bid-matching purposes. + +### 7.6 Provider Bond Invariant + +For every provider with a valid attestation at Level 2 or above, the provider's `bonded_amount` is at least the +minimum required for the attested tier (calculated from the provider's `ResourceSummary` and the tier's +bond-per-resource governance parameters). + +### 7.7 Self-Attestation Invariant + +No attestation record exists where `provider == auditor`. + +### 7.8 Auditor Authority Invariant + +No valid attestation exists where the attested tier exceeds the auditor's `max_attestation_tier`. + +--- + +## 8. CLI Commands + +Key commands for common operations. Full command set is auto-generated from proto service definitions. + +### 8.1 Transaction Commands + +```bash +# Auditor: post bond after governance approval +akash tx verification post-auditor-bond [amount] --from [auditor-key] + +# Auditor: submit attestation for a provider +akash tx verification submit-attestation \ + --provider [provider-addr] \ + --tier [identified|verified|established|trusted] \ + --capabilities [tee_hardware_attestation,confidential_computing,...] 
\ + --evidence-hash [hex-encoded-hash] \ + --fee [amount] \ + --deposit [amount] \ + --from [auditor-key] + +# Auditor: revoke own attestation +akash tx verification revoke-attestation --provider [provider-addr] --from [auditor-key] + +# Provider: post economic bond +akash tx verification post-provider-bond [amount] --from [provider-key] + +# Provider: post inventory snapshot hash +akash tx verification post-snapshot-hash \ + --snapshot-hash [hex-encoded-sha256] \ + --resource-summary [json-file-or-inline] \ + --snapshot-timestamp [RFC3339] \ + --from [provider-key] + +# Provider: remove an attestation on self +akash tx verification remove-attestation --auditor [auditor-addr] --from [provider-key] +``` + +### 8.2 Query Commands + +```bash +# Query all attestations for a provider +akash query verification provider-attestations [provider-addr] + +# Query a specific attestation +akash query verification attestation [provider-addr] [auditor-addr] + +# Query provider's bond status +akash query verification provider-bond [provider-addr] + +# Query provider's snapshot record +akash query verification provider-snapshot [provider-addr] + +# Query auditor details +akash query verification auditor [auditor-addr] + +# Query module parameters +akash query verification params + +# Query pending discrepancies +akash query verification discrepancies --status pending +``` + + diff --git a/src/content/aeps/aep-86/README.md b/src/content/aeps/aep-86/README.md new file mode 100644 index 000000000..51ac946e0 --- /dev/null +++ b/src/content/aeps/aep-86/README.md @@ -0,0 +1,1677 @@ +--- +aep: 86 +title: "Provider Verifications" +author: Artur Troian (@troian) +status: Draft +type: Standard +category: Core +created: 2026-04-04 +supersedes: 9, 40 +--- + +## Motivation + +The current provider trust model on Akash (AEP-9) relies on manually assigned audited attributes using a Web of Trust. 
+This approach is binary (audited or not), does not scale, depends on human accreditors as bottlenecks, and provides no +graduated signal to tenants about the depth of verification a provider has undergone. + +Tenants deploying workloads on a decentralized compute marketplace need confidence that: +- The hardware is genuine (not spoofed) +- Provisioned resources match what was bid +- The provider is reliably available +- The infrastructure is where it claims to be +- The operator is accountable + +Traditional cloud providers (AWS, GCP, Azure) guarantee these properties implicitly. On Akash, each must be independently +verified. This AEP introduces a graduated, auditor-attested verification tier system that replaces the current audited +attributes model. + +## Summary + +This AEP defines: + +1. **[Verification Tiers](./README.md#verification-tiers)** -- Five levels (Level 0 through Level 4) representing + progressively deeper verification of a provider's infrastructure, performance, reliability, and operational maturity. + Level 0 is the highest trust; Level 4 is permissionless (unverified). + +2. **[Auditor Role](./README.md#auditor-role)** -- Governance-approved entities that evaluate providers and submit + on-chain attestations. Auditors post bonds that scale with the highest tier they are authorized to attest and charge + providers on-chain fees for audits. + +3. **[Cross-Validation](./README.md#cross-validation)** -- A mechanism where multiple independent auditor attestations + for the same provider are compared. Discrepancies exceeding one tier level trigger automatic voiding of conflicting + attestations and bond freezing. + +4. **[Attestation Lifecycle](./README.md#attestation-submission)** -- Per-tier TTLs, fee escrow, revocation, and expiry + mechanics. + +5. 
**[On-Chain Prerequisite Enforcement](./README.md#on-chain-prerequisite-enforcement)** -- Machine-verifiable tier + prerequisites (provider registration age, bond posting, lease completion rate, snapshot hash compliance) are enforced + by the chain when attestations are submitted. + +6. **[Provider Economic Bond](./README.md#provider-economic-bond)** -- Providers post AKT collateral scaled to their + declared resource capacity. The bond is slashable on proven resource misrepresentation and is required for Level 2 + and above. + +7. **[Snapshot Hash Enforcement](./README.md#snapshot-hash-enforcement)** -- Providers must periodically post inventory + snapshot hashes on-chain. Non-compliance triggers automatic attestation suspension for bid-matching purposes. + +8. **[New `x/verification` Module](./README.md#module-identity)** -- A new Cosmos SDK module (`akash.verification.v1`) + replaces the existing `x/audit` module with a + [phased migration plan](./README.md#migration-from-aep-9). + +This AEP supersedes AEP-9 (Trusted Providers) and AEP-40 (Continuous Provider Audits), +unifying their concerns into a single framework. + +## Specification + +### Verification Tiers + +Tiers represent the depth of verification a provider has undergone. The chain stores individual auditor attestations per +provider; it does not compute a single effective tier. Interpretation of attestations is the responsibility of the +application layer (Console, SDL matching, incentive modules). + +#### Level 4 -- Permissionless + +Trust statement: _"This provider is registered on the network."_ + +No verification has been performed. The provider has registered on-chain and its daemon is reachable. All provider +attributes are self-reported and unverified. 
+ +Requirements: +- Valid on-chain provider registration (`MsgCreateProvider`) +- Provider daemon is reachable (status endpoint responds) + +#### Level 3 -- Identified + +Trust statement: _"The provider runs verified software, and we know who operates it."_ + +The provider is running cryptographically signed provider software and the operator behind it is known. This is +the first meaningful trust boundary. + +Requirements: +- All Level 4 requirements +- Signed code -- the provider must be running an officially signed release of the provider software. The software + signature is verified on-chain or by the auditor during attestation. The provider must continue running signed + software to retain this verification level; running unsigned or modified builds is grounds for attestation revocation. +- Operator identity -- linked to a verifiable entity. The acceptable identity types vary by tier: + + | Tier | Acceptable Identity Types | + |------|---------------------------| + | L3 (Identified) | Business registration, domain ownership, or DID-based credential | + | L2 (Verified) | Business registration or DID-based credential (domain ownership alone is insufficient) | + | L1 (Established) | Business registration or DID-based credential | + | L0 (Trusted) | Business registration required | + + The auditor records the verified identity type in their off-chain evidence (referenced by `evidence_hash`). + A signed identity attestation is stored on-chain. +- Contact channel -- verified communication endpoint for incident response. Provider must respond to + critical incidents within `contact_response_critical_l3` and standard inquiries within + `contact_response_standard_l3` (governance parameters). See + [Contact Responsiveness](./README.md#contact-responsiveness).
+- Stated location -- claimed geographic region/jurisdiction (self-reported; verified at higher tiers) + +#### Level 2 -- Verified + +Trust statement: _"The infrastructure delivers what it promises, and it is where it claims to be."_ + +Self-reported attributes have been independently validated through automated testing. The provider's claimed resources, +network quality, and physical location have been verified. + +Requirements: +- All Level 3 requirements +- Resource delivery accuracy -- provisioned CPU, RAM, GPU memory, and storage match bid quantities. Validated via + automated resource audit (deploy a test workload and measure actual vs. claimed resources). +- Network quality baseline -- bandwidth and latency meet minimum thresholds. Validated via automated benchmarks from + multiple vantage points. +- Physical location verification -- hardware is in the claimed region. Validated via IP geolocation and network path + analysis (traceroute hops, RTT triangulation). +- Inventory consistency -- on-chain attributes match the provider's + [Inventory Service](./README.md#inventory-service) snapshots. Auditors reconcile snapshot data against on-chain + attributes; any discrepancy is grounds for a tier downgrade. +- Contact responsiveness -- stricter response windows than Level 3. Provider must respond to critical incidents + within `contact_response_critical_l2` and standard inquiries within `contact_response_standard_l2`. See + [Contact Responsiveness](./README.md#contact-responsiveness). +- Minimum economic bond -- AKT locked as provider collateral, scaled to provider capacity. Slashable on proven resource + misrepresentation. See [Provider Economic Bond](./README.md#provider-economic-bond). +- Minimum uptime -- provider available for a governance-configurable minimum period (e.g., 30 days).
Uptime is + measured on-chain via [snapshot hash posting](./README.md#snapshot-hash-enforcement) frequency: a provider that + maintains continuous snapshot compliance (posting within every `snapshot_hash_interval` window) for the required + period demonstrates sustained availability. The `min_age_l2` on-chain prerequisite enforces the minimum + registration age, while snapshot compliance history provides the availability proof. +- Snapshot hash compliance -- provider must be actively posting inventory snapshot hashes on-chain and must not be + in suspended state. See [Snapshot Hash Enforcement](./README.md#snapshot-hash-enforcement). + +#### Level 1 -- Established + +Trust statement: _"This provider is reliably available and consistently performs."_ + +Time-proven reliability separates serious operators from casual participants. The provider has demonstrated sustained +performance over an extended period. + +Requirements: +- All Level 2 requirements +- Extended availability -- minimum governance-configurable rolling uptime above threshold (e.g., 90 days at 99%+). +- Lease completion rate -- above governance-configurable threshold (e.g., 98%). Ratio of provider-completed vs. + provider-terminated leases over lookback window. +- Continuous performance audits -- periodic automated benchmarks (CPU/GPU, memory bandwidth, storage IOPS, network + throughput) pass consistently against tier-specific thresholds. +- Resource delivery under load -- audit probes run while provider has active leases to detect overcommitment. +- Data isolation verification -- automated cross-tenant isolation tests (network namespace, filesystem, process isolation). +- Contact responsiveness -- stricter response windows than Level 2. Provider must respond to critical incidents + within `contact_response_critical_l1` and standard inquiries within `contact_response_standard_l1`. See + [Contact Responsiveness](./README.md#contact-responsiveness). 
+- Higher economic bond -- increased AKT collateral scaled to capacity; longer unbonding period. +- Clean history -- no slashing events or unresolved disputes in lookback window. + +**Application-layer policy**: Level 1 should require 2+ independent auditor attestations at Level 1 or better. +Tenants can enforce multi-auditor requirements by specifying multiple required auditor addresses in the SDL/manifest +`verification.auditors` list (see [SDL Syntax](./README.md#sdl-syntax)). The existing `required_auditors` field +in `VerificationRequirement` supports this -- a tenant listing two auditor addresses requires attestations from +both. + +#### Level 0 -- Trusted + +Trust statement: _"This provider is externally audited and offers maximum assurance."_ + +The highest level of provider assurance. Requires everything from Level 1 plus independent third-party validation of +physical infrastructure and operational practices. + +Requirements: +- All Level 1 requirements +- Third-party infrastructure audit -- independent verification of physical data center, power redundancy, cooling, and + physical security. Audit by governance-approved auditor; audit report hash stored on-chain. Periodic re-audit required. +- Extended compliance history -- minimum governance-configurable continuous Level 1 compliance (e.g., 180 days). +- Maximum economic bond -- highest AKT collateral tier, scaled to capacity. +- SLA commitment -- provider publishes availability and performance SLA parameters. The SLA document hash is + included in the auditor's off-chain evidence (referenced by `evidence_hash`). The auditor verifies that the + provider has maintained compliance with their published SLA during the evaluation period. SLA breaches are + enforced via the existing [`SlashProviderBond`](./README.md#slashproviderbond) governance action when reported + by auditors or tenants. +- Contact responsiveness -- strictest response windows. 
Provider must respond to critical incidents within + `contact_response_critical_l0` and standard inquiries within `contact_response_standard_l0`. See + [Contact Responsiveness](./README.md#contact-responsiveness). +- Governance endorsement (optional, supplementary) -- positive attestation from established network participants. + +**Application-layer policy**: Level 0 should require 2+ independent auditor attestations at Level 0. +This can be enforced via the SDL `verification.auditors` list as described in +[Level 1](./README.md#level-1----established). + +### Verification Tier Summary + +| Dimension | L4 (Permissionless) | L3 (Identified) | L2 (Verified) | L1 (Established) | L0 (Trusted) | +|-------------------------------|:-------------------:|:----------------------------:|:----------------------------------:|:-----------------------------------:|:-----------------------------------:| +| Signed code | -- | Required | Required | Required | Required | +| Operator identity | -- | Required | Required | Required | Required | +| Resource delivery accuracy | -- | -- | Automated test | Under-load audit | Under-load audit | +| Network quality | -- | -- | Baseline check | Continuous benchmark | Continuous benchmark | +| Physical location | -- | Self-reported | Verified (geo + network) | Verified | Audited on-site | +| Inventory consistency | -- | -- | Machine-reconciled | Machine-reconciled | Machine-reconciled | +| Economic bond | -- | -- | Minimum | Higher | Maximum | +| Uptime track record | -- | -- | 30-day minimum | 90-day, 99%+ | 180-day continuous | +| Lease completion rate | -- | -- | -- | 98%+ | 98%+ | +| Data isolation | -- | -- | -- | Automated test | Automated test | +| Continuous audits | -- | -- | -- | Periodic pass | Periodic pass | +| Contact responsiveness | -- | 72 hours critical / 7 days standard | 24 hours critical / 72 hours standard | 4 hours critical / 24 hours standard | 1 hour critical / 4 hours standard | +| Third-party physical audit | -- | -- | 
-- | -- | Required | +| On-chain SLA | -- | -- | -- | -- | Required | +| Snapshot hash compliance | -- | -- | Required | Required | Required | + +### Contact Responsiveness + +Providers must maintain verified communication channels and respond within tier-specific time windows. Response +time requirements escalate with verification tier, reflecting the operational maturity expected at each level. + +#### Incident Severity Levels + +| Severity | Definition | Examples | +|------------|---------------------------------------------------------------------------------|--------------------------------------------------------------------| +| **Critical** | Service-affecting incident impacting active tenant workloads or network security | Node outage, data loss, security breach, network partition | +| **Standard** | Non-urgent operational inquiry or minor issue | Configuration question, billing inquiry, planned maintenance coordination | + +#### Response Time Requirements by Tier + +| Tier | Critical Incident | Standard Inquiry | +|-------------------|:------------------------------:|:-------------------------------:| +| L3 (Identified) | `contact_response_critical_l3` | `contact_response_standard_l3` | +| L2 (Verified) | `contact_response_critical_l2` | `contact_response_standard_l2` | +| L1 (Established) | `contact_response_critical_l1` | `contact_response_standard_l1` | +| L0 (Trusted) | `contact_response_critical_l0` | `contact_response_standard_l0` | + +All response time thresholds are [governance parameters](./README.md#governance-parameters). See +[Initial Governance Parameter Values](./README.md#initial-governance-parameter-values) for suggested defaults. + +Response time is measured from the moment the incident is reported through the provider's verified contact channel +to the first meaningful acknowledgement from the provider (not an automated reply). "Meaningful acknowledgement" +means a human response that demonstrates awareness of the reported issue. 
+ +#### Auditor Verification + +Auditors verify contact responsiveness as part of their off-chain evaluation during attestation: + +1. **Probe test** -- the auditor sends a test message (clearly identified as an audit probe) to the provider's + verified contact channel and measures the time to first meaningful response. The probe should be sent without + prior notice during the attestation window. +2. **Historical review** -- the auditor reviews any available incident response history (e.g., from tenant + reports, public incident logs, or prior audit records). +3. **Attestation evidence** -- response time measurements are included in the off-chain audit report + (referenced by `evidence_hash` in the attestation). + +Non-response or exceeding the tier-specific response window is grounds for attesting at a lower tier or refusing +attestation. Repeated non-response during an attestation's TTL is grounds for auditor-initiated revocation via +`MsgRevokeAttestation`. + +### Signed Code Verification + +Provider software releases must be cryptographically signed. The signing scheme uses a governance-managed key set +stored on-chain. + +#### Signing Key Set + +The `x/verification` module maintains a `SigningKeySet` -- a list of public keys authorized to sign provider +software releases. Keys are managed via governance: + +- **Key registration**: A governance proposal adds a new signing public key with an activation timestamp and an + optional expiry timestamp. +- **Key rotation**: A new key is registered via governance before the old key's expiry. Multiple keys may be valid + simultaneously during rotation windows. +- **Key revocation**: A governance proposal sets a key's expiry to the current time, immediately invalidating it. + +#### Verification Flow + +1. The provider software binary includes an embedded signature over its content, signed by one of the authorized + signing keys. +2. 
During attestation, the auditor verifies the provider's running binary signature against the on-chain +   `SigningKeySet`: the auditor queries the set and confirms that at least one valid (activated, not expired, not +   revoked) key matches the binary's signature. +3. The `software_signature` field in [`ResourceSummary`](./README.md#provider-snapshot-record) records the +   signature of the running binary. The `software_version` field records the version string. +4. Running unsigned or modified builds, or builds signed with a revoked/expired key, is grounds for refusing +   attestation or revoking an existing attestation. + +#### Key Set State + +``` +SigningKey { +  public_key: bytes +  activated_at: Timestamp +  expires_at: Timestamp (nullable -- null means no expiry) +  revoked: bool +} +``` + +The key set is stored in the `x/verification` module state and is queryable by any participant. + +### Capability Flags + +Certain provider capabilities are orthogonal to the verification tier and are tracked as on-chain flags embedded within +auditor attestations. The presence or absence of a capability flag does not affect a provider's ability to reach any +verification tier. + +Capability flags are included as part of +[`MsgSubmitAttestation`](./README.md#attestation-submission). When an auditor attests a provider, they include both +a verification tier and a set of capability flags that the auditor has verified. This means a single attestation carries +the complete auditor assessment: tier plus capabilities. + +Defined capability flags: + +- **Hardware Attestation (TEE)** -- CPU/GPU identity cryptographically verified via TEE attestation (Intel TDX, +  AMD SEV-SNP, NVIDIA NVTrust) per AEP-29. Proves the hardware is genuine. Providers with this flag offer +  cryptographic proof of their hardware identity, but hardware attestation is not required for any verification tier.
+- **Confidential Computing** -- TEE available for tenant workloads (encrypted execution environment the provider + operator cannot inspect). Verified via TEE enclave provisioning test per AEP-65. +- **Persistent Storage** -- Provider supports persistent storage volumes that survive lease restarts. +- **Bare Metal** -- Provider offers bare-metal (non-virtualized) compute. Verified by the auditor via inventory + snapshot [virtualization detection](./README.md#virtualization-detection) showing no hypervisor present. + +Additional capability flags may be added via governance parameter updates without requiring a chain upgrade, as the +on-chain representation uses an extensible enum. + +[Cross-validation](./README.md#cross-validation) (discrepancy detection) applies only to the tier component of +attestations, not to capability flags. Two auditors may attest the same provider at the same tier but with different +capability flags without triggering a discrepancy. Tenants who require specific capabilities should prefer providers +with multiple independent auditor attestations confirming that capability. + +### Inventory Service + +Providers expose an Inventory Service endpoint that reports machine-generated snapshots of the provider's capabilities +and resources. The Inventory Service is the primary mechanism by which auditors verify provider claims, both during +initial attestation and on an ongoing basis. The endpoint is open to any network participant (auditors, tenants, +tools) for read-only queries. + +> **Implementation note**: The Inventory Service already exists in the provider codebase and must be significantly +> extended to meet this specification. The extensions include: multi-source data collection for all hardware classes, +> challenge-response protocol support, virtualization detection, snapshot signing with the provider's on-chain key, +> and on-chain snapshot hash posting via `MsgPostSnapshotHash`. 
See +> [Implementation Guide](./IMPLEMENTATION.md#5-inventory-service-extensions) for codebase-specific details. + +The Inventory Service absorbs the scope of AEP-41 (Standard Provider Attributes) by providing a machine-authoritative +source of provider capabilities, replacing the previous model of self-reported attributes. + +> **AEP-41 deprecation**: AEP-41 provider attributes are superseded by inventory snapshots. There is no automatic +> migration or attribute-to-snapshot mapping -- the systems are fundamentally different (key-value self-reported +> attributes vs. structured machine-generated hardware inventory). During the +> [migration period](./README.md#migration-from-aep-9) (Phases 1-3), existing attributes remain queryable via +> `x/audit`. After Phase 4, they are removed. Provider capabilities are determined solely by inventory snapshots +> and auditor attestations. + +#### Threat Model + +The fundamental challenge of inventory reporting is that the provider controls the machine generating the report. +Even with signed code, the operating system, hypervisor, and firmware can lie about the hardware present. The +Inventory Service is designed around the assumption that any single data source can be compromised, and that +defense requires multiple independent layers. + +##### Attack Vectors + +The following interception points exist between the inventory binary and the actual hardware, ordered from easiest +to hardest to exploit: + +1. **Dynamic linker interception (LD_PRELOAD)** -- The provider sets environment variables to inject a shared library + that intercepts libc calls (`open`, `read`, `ioctl`) before they reach the kernel. The signed binary calls + `fopen("/proc/cpuinfo")` and receives fabricated data. + +2. **Fake sysfs/procfs** -- The provider mounts overlayfs on `/proc` or `/sys` with modified files, or uses bind + mounts to replace specific entries. OS-level tools report whatever the provider has placed in these filesystems. + +3. 
**Kernel module syscall hooking** -- The provider loads a kernel module that intercepts syscalls at the kernel + level. Even if the binary bypasses libc, the kernel itself returns fake data. + +4. **Hypervisor spoofing** -- The provider runs the signed software inside a virtual machine. The hypervisor + intercepts privileged instructions (including CPUID) and presents a fake hardware topology. The signed binary + has no way to distinguish VM-presented hardware from physical hardware. + +5. **Firmware/BIOS modification** -- The provider modifies SMBIOS/DMI tables in the BIOS to report different + hardware identifiers. Tools like `dmidecode` return whatever the firmware claims. + +6. **Resource overcommitment** -- The provider has real hardware but overcommits it -- advertising 8 GPUs as + available when 4 are already allocated to other tenants. Not a hardware spoofing attack, but an inventory + accuracy attack. + +No single defense mechanism defeats all of these vectors. The Inventory Service uses a layered defense strategy +where each layer addresses specific attack vectors, and the combination raises the cost of successful cheating +beyond the economic benefit. + +##### Defense Layers + +| Layer | Mechanism | Vectors Defeated | +|----------------------------------------------|------------------------------------------------------------------------|-----------------------------------------------------------------------------| +| 1. Static binary | Statically compiled, no dynamic library loading | LD_PRELOAD, dynamic linker injection | +| 2. Direct hardware reads | Read hardware registers directly, bypassing OS filesystem abstractions | Fake sysfs/procfs, some kernel module attacks | +| 3. Multi-source cross-validation | Report the same property from multiple independent data sources | Single-source spoofing (faking one source but not all) | +| 4. Virtualization detection | Active probing for hypervisor presence | Naive VM spoofing | +| 5. 
Performance benchmarking (auditor-driven) | Deploy real workloads and measure actual throughput | All software-level spoofing (fake hardware cannot deliver real performance) | +| 6. TEE attestation (when available) | Hardware-rooted chain of trust via enclave execution | Everything including VM spoofing and kernel-level attacks | + +Layer 5 (performance benchmarking) is the ultimate backstop: a provider claiming 8x A100 GPUs must deliver 8x A100 +performance. There is no way to fake this without actually having the hardware. This is already part of the +[Level 2 (Verified)](./README.md#level-2----verified) tier requirements. + +Layer 6 (TEE attestation) is the gold standard but cannot be required for all providers. Providers with the Hardware +Attestation [capability flag](./README.md#capability-flags) offer the strongest guarantee. Tenants who need maximum +hardware assurance can require it. + +#### Binary Requirements + +The Inventory Service binary is distributed as part of the signed provider software (the same signed code required +for Level 3+). It must meet the following requirements: + +- **Statically linked for system libraries** -- all system dependencies are compiled into the binary (no dynamically +  linked libc), and system calls go directly to the kernel without passing through any loadable library. This eliminates LD_PRELOAD and dynamic linker +  interception for system-level calls. **Exception for vendor hardware management libraries**: the binary may +  dynamically load vendor-provided management libraries (e.g., NVIDIA NVML, AMD ROCm SMI) at runtime, provided +  the library's cryptographic signature is verified against the governance-managed +  [signing key set](./README.md#signed-code-verification) before loading. Vendor library data is treated as one +  data source among multiple; discrepancies between vendor library output and direct hardware reads (e.g., PCIe +  configuration space) are flagged in the snapshot.
+- **Signed and versioned** -- the binary's cryptographic signature is verifiable against the governance-managed + [signing key set](./README.md#signed-code-verification). The software version and signature are included in + every inventory snapshot. +- **Direct hardware register reads** -- in addition to reading OS interfaces (`/proc`, `/sys`), the binary reads + hardware properties directly from hardware registers where possible (see + [Multi-Source Data Collection](./README.md#multi-source-data-collection) below). Both values are reported in the + snapshot. For GPU hardware, direct PCIe configuration space reads provide vendor ID, device ID, and BAR + configuration independently of vendor management libraries. + +Whether the Inventory Service is a separate signed binary or part of the main provider daemon binary is an +implementation detail. + +#### Transport and Access + +The Inventory Service is exposed as a gRPC service on the existing provider daemon endpoint. It reuses the +provider daemon's TLS configuration and requires no separate authentication -- it is a public, read-only endpoint +analogous to the existing provider status endpoint. + +- **Protocol**: gRPC over TLS (reuses existing provider daemon TLS certificate) +- **Port**: same as the provider daemon (no separate port) +- **Authentication**: none required (public read-only) +- **Response signing**: every response MUST be signed by the provider's on-chain key, regardless of whether a + challenge nonce was included. The signature covers the entire response payload. +- **Rate limiting**: providers SHOULD implement rate limiting on the Inventory Service endpoint. Recommended + default: 60 requests per minute per source IP. The rate limit is provider-configurable and is not enforced + on-chain. 
+ +#### Snapshot Contents + +An inventory snapshot includes: + +- **Hardware capabilities** -- CPU models and core counts, GPU models and memory, total physical memory, storage + devices and capacities, network interface specifications. Each property is reported from multiple independent + data sources (see [Multi-Source Data Collection](./README.md#multi-source-data-collection)). +- **Resource allocation** -- currently allocated resources (CPU, memory, GPU, storage) across active leases. +- **Available resources** -- unallocated resources available for new workloads. +- **Software version** -- provider software version and its cryptographic signature. +- **Virtualization status** -- whether the binary detected it is running inside a virtual machine, and the + detection method and hypervisor identity if detected (see + [Virtualization Detection](./README.md#virtualization-detection)). +- **Challenge nonce** -- if the query included a nonce, the nonce is included in the signed payload + (see [Challenge-Response Protocol](./README.md#challenge-response-protocol)). +- **Timestamp** -- snapshot generation time. +- **Source consistency** -- for each hardware property, whether all data sources agree. Discrepancies are flagged + with the specific values from each source. + +Each snapshot is signed by the provider's on-chain key, binding the machine-reported data to the provider's identity. +The signature covers the entire snapshot payload including the nonce and timestamp, preventing modification or replay. + +#### Multi-Source Data Collection + +The Inventory Service must report hardware properties from multiple independent data sources. This is a hard +requirement, not a recommendation. Snapshots that include only a single source for a property when multiple sources +are available should be treated as lower confidence by auditors. 
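Below is a minimal sketch of how a per-property source-consistency entry could be computed. It assumes each source's reading has already been normalized to a comparable string. `Reading`, `PropertyReport`, and `Assess` are hypothetical names; this section does not define the real snapshot schema. A property backed by fewer than two sources is marked low confidence, as the requirement above prescribes.

```go
package main

import (
	"fmt"
	"sort"
)

// Reading is one data source's value for a hardware property.
type Reading struct {
	Source string // e.g. "cpuid", "/proc/cpuinfo", "dmi"
	Value  string
}

// PropertyReport is a hypothetical per-property source-consistency entry.
type PropertyReport struct {
	Property      string
	Consistent    bool              // all sources reported the same value
	LowConfidence bool              // fewer than two independent sources
	Values        map[string]string // source -> value, kept even when sources agree
}

// Assess compares readings from independent sources for one property.
func Assess(property string, readings []Reading) PropertyReport {
	r := PropertyReport{Property: property, Values: map[string]string{}}
	distinct := map[string]bool{}
	for _, rd := range readings {
		r.Values[rd.Source] = rd.Value
		distinct[rd.Value] = true
	}
	r.Consistent = len(distinct) <= 1
	r.LowConfidence = len(r.Values) < 2
	return r
}

func main() {
	rep := Assess("cpu.model", []Reading{
		{"cpuid", "AMD EPYC 7763"},
		{"/proc/cpuinfo", "AMD EPYC 9654"}, // e.g. a procfs overlay
		{"dmi", "AMD EPYC 7763"},
	})
	fmt.Println(rep.Consistent, rep.LowConfidence) // false false
	srcs := make([]string, 0, len(rep.Values))
	for s := range rep.Values {
		srcs = append(srcs, s)
	}
	sort.Strings(srcs) // deterministic output order
	for _, s := range srcs {
		fmt.Printf("  %s = %s\n", s, rep.Values[s])
	}
}
```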
+ +For each hardware class, the binary reads from both OS-level interfaces and direct hardware register reads where +possible: + +**CPU:** + +| Source | Method | Notes | +|--------|--------|-------| +| CPUID instruction | Direct CPU execution (ring 3) | Reports model, family, stepping, feature flags, topology. On bare metal, the OS kernel can intercept it only on CPUs that support CPUID faulting. In a VM, the hypervisor can intercept. | +| `/proc/cpuinfo` | OS filesystem | Kernel-reported view. Fakeable via procfs overlay or kernel module. | +| DMI/SMBIOS tables | Firmware-provided (`/sys/firmware/dmi/`) | BIOS-reported CPU socket information. Fakeable via firmware modification. | +| Topology enumeration | CPUID leaf 0x0B / 0x1F | Direct enumeration of cores, threads, packages. | + +**GPU:** + +| Source | Method | Notes | +|--------|--------|-------| +| PCIe configuration space | Direct read via sysfs PCI or `/dev/mem` | Reads PCI vendor ID, device ID, subsystem ID from hardware. Harder to fake than sysfs device entries. | +| Vendor management library (e.g., NVML) | Vendor-provided API | Reports model, serial number, memory, temperature. For NVIDIA: NVML. For AMD: ROCm SMI. | +| `/sys/class/drm/` | OS filesystem | Kernel DRM subsystem view. Fakeable via sysfs overlay. | + +**Memory:** + +| Source | Method | Notes | +|--------|--------|-------| +| SPD data via SMBus/I2C | Direct read from DIMM EEPROM | Reports actual physical DIMM capacity, speed, manufacturer from the memory module itself. Requires SMBus access. | +| `/proc/meminfo` | OS filesystem | Kernel-reported total and available memory. Fakeable. | +| DMI/SMBIOS tables | Firmware-provided | BIOS-reported memory array information. | +| EDAC sysfs | `/sys/devices/system/edac/mc/` | Error detection and correction subsystem memory controller view.
| + +**Storage:** + +| Source | Method | Notes | +|--------|--------|-------| +| ATA IDENTIFY and SMART data | ATA/NVMe passthrough commands | Reads drive model, capacity, and serial number (IDENTIFY) and health (SMART) directly from the drive. | +| NVMe admin commands | Direct NVMe identify controller/namespace | For NVMe drives, reads capacity and capabilities from the drive firmware. | +| `/sys/block/` | OS filesystem | Kernel block device view. Fakeable. | + +**Network:** + +| Source | Method | Notes | +|--------|--------|-------| +| Ethtool ioctl | Direct NIC query | Reads link speed, driver, firmware version from the network interface controller. | +| `/sys/class/net/` | OS filesystem | Kernel network device view. Fakeable. | +| PCIe configuration space | Direct read | NIC vendor ID, device ID, BAR configuration. | + +The snapshot includes the value from each source for every property. Where sources agree, this is a positive trust +signal. Where they disagree, the discrepancy is explicitly flagged with all values included, allowing auditors to +assess which source (if any) has been tampered with. + +Spoofing all sources simultaneously for a single property requires compromising independent subsystems (CPU +registers, kernel, firmware, physical hardware buses), which significantly raises the attack cost. + +#### Virtualization Detection + +The Inventory Service binary actively probes for virtualization and reports the result in every snapshot. Running +in a VM does not disqualify a provider or prevent any verification tier, but it is a fact that auditors and tenants +can see. A provider claiming bare-metal hardware while detected as running in a VM is a red flag for auditor +investigation. + +Detection methods: + +- **CPUID hypervisor present bit** -- CPUID leaf 0x1, ECX bit 31 is reserved for hypervisor use and is set by +  mainstream hypervisors by convention. A hostile hypervisor can clear it, so this check alone catches only naive VM setups.
+- **Hypervisor vendor string** -- CPUID leaves 0x40000000-0x400000FF return the hypervisor identity string + (e.g., "KVMKVMKVM", "VMwareVMware", "Microsoft Hv", "XenVMMXenVMM"). +- **Timing analysis** -- certain instruction sequences (e.g., CPUID, RDTSC) have measurably different latencies + in VMs vs. bare metal due to VM exit overhead. +- **Hardware fingerprints** -- presence of VM-specific PCI device IDs (e.g., virtio devices, VMware SVGA, Hyper-V + synthetic devices) in the PCIe configuration space. +- **DMI/SMBIOS indicators** -- VM platforms often set specific manufacturer/product strings in SMBIOS tables. + +The snapshot reports: detected/not-detected, detection method(s) that triggered, and hypervisor identity if +available. + +#### Challenge-Response Protocol + +All inventory queries support a challenge-response mechanism to prevent snapshot replay attacks. + +1. The querier (auditor, tenant, or tool) sends a request that includes a random **nonce** (32 bytes, + cryptographically random). The nonce MUST be generated using a cryptographically secure random number generator. +2. The Inventory Service generates a fresh snapshot, includes the nonce in the payload, and signs the entire + payload (including nonce and timestamp) with the provider's on-chain key. +3. The querier verifies the signature against the provider's known on-chain key and confirms the nonce matches. + +This ensures: +- The snapshot was generated after the query was issued (prevents caching old snapshots) +- The snapshot has not been modified in transit (signature verification) +- The snapshot was generated by the claimed provider (key binding) + +Queries without a nonce are valid but produce snapshots that could be replayed. Auditors should always include a +nonce. Casual queries (e.g., tenant browsing providers in Console) may omit the nonce. + +#### On-Chain Snapshot Hashes + +Providers periodically post the hash of their latest inventory snapshot on-chain via `MsgPostSnapshotHash`. 
This +creates an immutable, timestamped audit trail of the provider's self-reported state. + +##### Posting Mechanism + +`MsgPostSnapshotHash` includes: + +- `provider`: provider address (signer) +- `snapshot_hash`: SHA-256 hash of the full snapshot payload +- `resource_summary`: on-chain summary of key capacity metrics (see `ResourceSummary` in + [State Records](./README.md#provider-snapshot-record)) +- `snapshot_timestamp`: snapshot generation time (must be within `max_snapshot_age` of block time) + +On submission: +1. Validate the provider is registered on-chain +2. Validate `snapshot_timestamp` is within `[block_time - max_snapshot_age, block_time]` (prevents stale snapshots) +3. Store or replace the `ProviderSnapshotRecord` +4. If the provider was snapshot-suspended, clear the suspension and emit `EventSnapshotResumed` +5. Update the snapshot compliance expiry queue: set next deadline to `block_time + snapshot_hash_interval` +6. Emit `EventSnapshotHashPosted` + +##### Snapshot Hash Enforcement + +When a provider's `compliance_deadline` passes without a new `MsgPostSnapshotHash`, the following automatic +consequences apply: + +1. The provider's `ProviderSnapshotRecord.suspended` is set to `true` by the + [EndBlocker](./README.md#endblocker-design) +2. An `EventSnapshotSuspended` event is emitted +3. **Effect on attestations**: Suspended providers' attestations at Level 2 and above are treated as **inactive for + bid matching purposes**. The attestations themselves are NOT voided -- they remain valid in state and their TTLs + continue running. However, the `x/market` module checks snapshot compliance as an additional filter and rejects + bids from suspended providers for orders that require Level 2+ verification (see + [Market Module Integration](./README.md#market-module-integration)). +4. 
**Effect on new attestations**: `MsgSubmitAttestation` at Level 2 or above is rejected for snapshot-suspended + providers ([on-chain prerequisite enforcement](./README.md#on-chain-prerequisite-enforcement)). +5. **Recovery**: When the provider posts a new valid snapshot hash via `MsgPostSnapshotHash`, the suspension is + immediately cleared. No governance action is required. + +This design avoids permanent damage for temporary outages (attestations are not voided, TTLs keep running) while +creating a strong economic incentive to maintain snapshot freshness (suspended providers cannot win bids at L2+). + +##### Auditor Use of Snapshot Hashes + +- Auditors can request the full off-chain snapshot corresponding to any on-chain hash and verify the match. +- If a provider's snapshot changes significantly between two on-chain hashes without a corresponding on-chain + event (e.g., lease creation/termination), this is a signal for auditor investigation. +- On-chain hashes also allow retroactive verification: if a provider is later found to be cheating, the chain + of snapshot hashes shows when the misrepresentation began. +- Auditors verify that the off-chain snapshot they receive matches the most recent on-chain snapshot hash. A + mismatch indicates the provider is serving different data to different queriers. + +#### Auditor Use + +Auditors query the Inventory Service as part of their evaluation: + +- **During initial attestation** -- the auditor queries the Inventory Service with a nonce, compares the snapshot + against the provider's on-chain attributes, verifies multi-source consistency, checks virtualization status, and + cross-references against on-chain lease records. Discrepancies between the snapshot and on-chain claims are grounds + for refusing attestation or attesting at a lower tier. 
+- **Periodic checks** -- auditors may call the Inventory Service at any time during the attestation's TTL to verify + that the provider continues to meet the requirements of their attested tier. If a periodic check reveals that the + provider no longer qualifies (e.g., resource capacity has decreased below claimed levels, signed code is no longer + running, new multi-source discrepancies have appeared), the auditor should revoke their attestation via + `MsgRevokeAttestation`. +- **Cross-referencing over time** -- auditors compare inventory snapshots taken at different times, correlated with + on-chain events (lease creation, termination). Available resources should decrease when new leases are created and + increase when leases end. Inconsistencies indicate overcommitment or manipulation. + +### Auditor Role + +#### Overview + +An auditor is a governance-approved on-chain role authorized to evaluate providers and submit tier attestations. Auditors +are economically accountable through bonding and are the human-judgment layer that complements machine-verifiable checks. + +The chain stores auditor registrations, bonds, and attestations as facts. It does not compute provider trust scores or +effective tiers. Application-layer consumers (Console, SDL matching, incentive modules) interpret raw attestation data +according to their own policies. + +The chain enforces a subset of tier prerequisites that are objectively verifiable from on-chain state (see +[On-Chain Prerequisite Enforcement](./README.md#on-chain-prerequisite-enforcement)). Auditors are responsible for all +off-chain verification (resource delivery, identity, network quality, physical location, etc.). If an auditor attests +at a tier that passes on-chain checks but the off-chain evaluation was negligent, the +[cross-validation](./README.md#cross-validation) mechanism and governance review catch it. 
+ +#### Auditor Registration + +Auditors are added exclusively via [governance proposals](./README.md#governance-actions). A proposal to register an +auditor includes: + +- Auditor address +- Maximum attestation tier -- the highest tier this auditor is authorized to attest. An auditor approved for Level 1 + can attest Levels 3, 2, and 1, but not Level 0. +- Organization name and credentials (metadata hash referencing off-chain documentation) +- Required bond amount (determined by max attestation tier) + +On governance approval, the auditor must post their bond via `MsgPostAuditorBond` to become Active. + +To upgrade an auditor's maximum attestation tier (e.g., from Level 2 to Level 0), a new governance proposal is required. +The auditor must post additional bond to cover the difference. + +#### Auditor Bond + +The bond is held in an escrow account and scales with the auditor's maximum attestation authority: + +| Max Attestation Authority | Bond Requirement | +|---------------------------|-------------------------------------------| +| Level 3 (Identified) | `bond_l3` (governance parameter, lowest) | +| Level 2 (Verified) | `bond_l2` > `bond_l3` | +| Level 1 (Established) | `bond_l1` > `bond_l2` | +| Level 0 (Trusted) | `bond_l0` (governance parameter, highest) | + +Bond states: `Bonded`, `Frozen`, `Unbonding`. + +#### Audit Fee + +Providers pay auditors an on-chain fee for performing an audit. The fee is transferred atomically as part of the +[attestation submission](./README.md#attestation-submission) transaction. + +- Governance sets a **minimum fee** per tier level (governance parameter). This prevents race-to-bottom pricing that + could incentivize cursory evaluations. +- Above the minimum, fees are market-determined. Auditors set their own rates; providers choose their auditor. +- Fees are **escrowed** until the attestation reaches its natural TTL expiry. On natural expiry, the escrowed fee is + released to the auditor. 
If the attestation is voided (by discrepancy or governance action), the fee is returned to + the provider. + +#### Attestation Submission + +An auditor submits an attestation via `MsgSubmitAttestation`: + +- `provider`: provider address being attested +- `auditor`: auditor address (must be Active, authorized for the attested tier) +- `tier`: the verification tier being attested (must be <= auditor's max attestation tier) +- `capabilities`: list of [capability flags](./README.md#capability-flags) the auditor has verified for this provider +- `evidence_hash`: hash of off-chain audit report / evidence documentation +- `fee`: audit fee amount (must be >= governance minimum for this tier) + +On submission, the module: +1. Validates the auditor is Active and authorized for the attested tier +2. Validates the auditor address differs from the provider address (self-audit prevention) +3. Validates the fee meets the governance minimum for this tier +4. Validates [on-chain prerequisites](./README.md#on-chain-prerequisite-enforcement) for the attested tier +5. Transfers the fee from provider to escrow +6. If the same auditor has an existing attestation for this provider, the old attestation is replaced and the old + escrowed fee is released to the auditor +7. Stores the new attestation with a TTL based on tier, including the attested capability flags +8. Adds an entry to the attestation expiry queue for [EndBlocker](./README.md#endblocker-design) processing +9. Checks for discrepancy: scans all valid attestations for this provider from different auditors. If any existing + attestation differs by more than the `discrepancy_threshold` (default: 1 tier level) from the new attestation, + triggers [cross-validation](./README.md#cross-validation). Discrepancy detection applies only to the tier + component, not capability flags. + +### On-Chain Prerequisite Enforcement + +The chain enforces a subset of tier prerequisites that are objectively verifiable from on-chain state. 
This +prevents obviously invalid attestations from being stored (e.g., a Level 1 attestation for a provider registered +yesterday) and reduces the burden on the [cross-validation](./README.md#cross-validation) mechanism for catching +clear violations. + +Auditors remain responsible for all off-chain verification (resource delivery accuracy, identity verification, +network quality, physical location, performance benchmarks, data isolation, etc.). + +#### Enforceable Prerequisites by Tier + +| Prerequisite | Data Source | L3 | L2 | L1 | L0 | +|----------------------------------------|----------------------------------------------------------------------------|:--------:|:-------------:|:-----------------------------:|:--------------------------------------:| +| Provider registered on-chain | `x/provider` store | Required | Required | Required | Required | +| Provider bond posted at tier-minimum | `x/verification` [provider bond](./README.md#provider-economic-bond) store | -- | Required | Required | Required | +| Provider registration age >= threshold | `x/provider` registration timestamp | -- | `min_age_l2` | `min_age_l1` | `min_age_l0` | +| Lease completion rate >= threshold | `x/market` lease state counters | -- | -- | `min_lease_completion_bps_l1` | `min_lease_completion_bps_l0` | +| No active slashing events in window | `x/verification` provider bond store | -- | -- | `clean_history_window_l1` | `clean_history_window_l0` | +| Snapshot hash compliance | `x/verification` [snapshot](./README.md#snapshot-hash-enforcement) store | -- | Not suspended | Not suspended | Not suspended | +| Continuous prior-tier attestation | `x/verification` attestation history | -- | -- | -- | Valid L1+ for `min_l1_duration_for_l0` | + +#### Validation Flow + +When [`MsgSubmitAttestation`](./README.md#attestation-submission) is processed, step 4 performs the following checks +based on the attested tier: + +**For all tiers (L3 through L0):** +- Query `x/provider` to confirm the provider 
is registered on-chain. Reject with `ErrProviderNotRegistered` if not found. + +**For Level 2 and above (L2, L1, L0):** +- Query the [provider bond](./README.md#provider-economic-bond) record. Reject with `ErrInsufficientProviderBond` if + the bonded amount is less than the tier-required minimum (calculated from the provider's `ResourceSummary` and the + tier's bond-per-resource [governance parameters](./README.md#governance-parameters)). +- Query the [provider snapshot](./README.md#on-chain-snapshot-hashes) record. Reject with + `ErrProviderSnapshotSuspended` if the provider is snapshot-suspended. +- Compute the provider's registration age as `block_time - registration_time`. Reject with `ErrProviderTooNew` if + the age is less than the tier's `min_age` parameter. + +**For Level 1 and above (L1, L0):** +- Query `x/market` for the provider's lease completion stats over the lookback window. If the provider has at least + `min_leases_for_completion_rate` total leases in the window, compute the completion rate from the `LeaseStats` fields as + `(total_leases - terminated_by_provider) * 10000 / total_leases` (basis points). Reject with `ErrInsufficientLeaseCompletion` if below + the tier's `min_lease_completion_bps` threshold. If the provider has fewer than `min_leases_for_completion_rate` leases, the check is skipped + (new providers are not penalized for low volume). +- Check the provider bond record for slashing events. Reject with `ErrSlashingHistory` if `last_slash_time` is within + the tier's `clean_history_window`. + +**For Level 0 only:** +- Scan the provider's attestation history to verify that the provider has had a continuous valid attestation at Level 1 + or better for at least `min_l1_duration_for_l0`. "Continuous" means there exists at least one valid L1+ attestation + at all times during the lookback period, considering attestation creation and expiry timestamps. Reject with + `ErrInsufficientL1History` if not met.
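The tier-dependent checks above can be summarized in a minimal Python sketch. It is illustrative only, not the module's implementation: `Params`, `ProviderState`, `continuously_covered`, and `check_prerequisites` are hypothetical names, tiers are encoded as integers 0 (Level 0) through 3 (Level 3) so that "Level 2 and above" becomes `tier <= 2`, and the Level 0 continuity rule is modeled as the union of `(start, end)` validity intervals of past L1+ attestations having to cover the whole lookback period.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Params:
    # Hypothetical container; dict keys are tier numbers (0 = Level 0 ... 3 = Level 3).
    min_age: Dict[int, int]                    # seconds, keys 2/1/0
    min_lease_completion_bps: Dict[int, int]   # keys 1/0
    clean_history_window: Dict[int, int]       # seconds, keys 1/0
    min_l1_duration_for_l0: int                # seconds
    min_leases_for_completion_rate: int

@dataclass
class ProviderState:
    # Hypothetical view of the on-chain data each check reads.
    registered: bool
    registration_time: int                     # Unix seconds
    bonded_amount: int
    required_bond: Dict[int, int]              # tier -> minimum derived from ResourceSummary
    snapshot_suspended: bool = False
    total_leases: int = 0                      # LeaseStats over the lookback window
    terminated_by_provider: int = 0
    last_slash_time: Optional[int] = None
    l1_plus_intervals: List[Tuple[int, int]] = field(default_factory=list)  # (start, end) of valid L1+ attestations

def continuously_covered(intervals: List[Tuple[int, int]], start: int, end: int) -> bool:
    """True if the union of [s, e) intervals covers [start, end) without gaps."""
    cursor = start
    for s, e in sorted(intervals):
        if s > cursor:
            break                              # gap before this interval begins
        cursor = max(cursor, e)
        if cursor >= end:
            return True
    return cursor >= end

def check_prerequisites(tier: int, p: ProviderState, params: Params, block_time: int) -> Optional[str]:
    """Return None if the attested tier's on-chain prerequisites pass, else the error name."""
    if not p.registered:
        return "ErrProviderNotRegistered"
    if tier <= 2:  # Level 2 and above
        if p.bonded_amount < p.required_bond[tier]:
            return "ErrInsufficientProviderBond"
        if p.snapshot_suspended:
            return "ErrProviderSnapshotSuspended"
        if block_time - p.registration_time < params.min_age[tier]:
            return "ErrProviderTooNew"
    if tier <= 1:  # Level 1 and above
        if p.total_leases and p.total_leases >= params.min_leases_for_completion_rate:
            # Multiply before integer division so the basis-point rate is not truncated.
            rate_bps = (p.total_leases - p.terminated_by_provider) * 10000 // p.total_leases
            if rate_bps < params.min_lease_completion_bps[tier]:
                return "ErrInsufficientLeaseCompletion"
        if p.last_slash_time is not None and block_time - p.last_slash_time < params.clean_history_window[tier]:
            return "ErrSlashingHistory"
    if tier == 0:
        lookback_start = block_time - params.min_l1_duration_for_l0
        if not continuously_covered(p.l1_plus_intervals, lookback_start, block_time):
            return "ErrInsufficientL1History"
    return None
```

Returning the first failing error mirrors the reject-on-failure wording of the validation flow; a provider with fewer than `min_leases_for_completion_rate` leases skips the completion check entirely.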
+ +#### Cross-Module Keeper Interfaces + +The `x/verification` module requires read-only access to: + + ``` +ProviderKeeper (from x/provider): + Get(ctx, address) -> (Provider, bool) + GetRegistrationTime(ctx, address) -> (Timestamp, bool) + +MarketKeeper (from x/market): + GetProviderLeaseStats(ctx, provider, since) -> LeaseStats + ``` + + `LeaseStats`: + ``` +LeaseStats { + total_leases: uint64 + completed_by_tenant: uint64 // Tenant-closed or naturally completed + terminated_by_provider: uint64 // Provider-closed prematurely +} + ``` + +The `x/market` module must distinguish lease close reasons. The existing `LeaseClosedReasonOwner` (tenant-initiated) +is already tracked. A `LeaseClosedReasonProvider` reason must also be tracked so that provider-initiated closures are +counted separately from tenant-initiated ones. The completion rate, in basis points, is calculated as: +`(total_leases - terminated_by_provider) * 10000 / total_leases`. + +#### Attestation TTL + +Attestations expire automatically. Higher tiers require re-audit at least as often as lower tiers: + +| Tier | TTL | +|---------|-----------------------------------------------------| +| Level 3 | Longest (governance parameter, e.g., 365 days) | +| Level 2 | Shorter (governance parameter, e.g., 180 days) | +| Level 1 | Shorter still (governance parameter, e.g., 90 days) | +| Level 0 | Shortest; may equal Level 1 (governance parameter, e.g., 90 days) | + +On expiry, the attestation status becomes `Expired`, the escrowed fee is released to the auditor, and the attestation +is no longer considered valid. Expiry is processed by the [EndBlocker](./README.md#endblocker-design) via the +attestation expiry queue. + +#### Attestation Revocation + +An auditor may revoke their own attestation at any time via `MsgRevokeAttestation`. This immediately ends the +attestation (status: `Revoked`). 
The escrowed fee is released to the auditor (the auditor performed the work; revocation is an act of +diligence, not a failure). + +A provider may remove any attestation on themselves via `MsgRemoveAttestation`.
The escrowed fee is released to the +auditor (the auditor performed the work; the provider chose to remove the attestation). + +#### Fee Disposition Summary + +The audit fee is released to the auditor in all cases where the attestation reaches a terminal state normally. It +is returned to the provider only when the attestation is voided, indicating the attestation itself was invalid. +The auditor performed evaluation work regardless of who initiates termination; returning the fee on provider-initiated +removal would create a perverse incentive (providers could obtain free audits by removing attestations before TTL +expiry). + +| Terminal Status | Triggered By | Fee Goes To | Rationale | +|---|---|---|---| +| `Expired` | TTL reached (EndBlocker) | Auditor | Auditor performed work; attestation ran its full course | +| `Revoked` | Auditor via `MsgRevokeAttestation` | Auditor | Auditor performed diligent monitoring | +| `Removed` | Provider via `MsgRemoveAttestation` | Auditor | Auditor performed work; provider voluntarily removed | +| `Voided` (Discrepancy) | Cross-validation trigger | Provider | Auditor judgment was disputed | +| `Voided` (Governance) | Governance action | Provider | Governance determined the attestation was invalid | +| `Voided` (BondWithdrawn) | Provider bond withdrawal | Provider | Provider chose to withdraw bond support | +| `Voided` (BondSlashed) | Provider bond slashed | Provider | Provider's misrepresentation caused the void | +| Replaced | Same auditor submits new attestation | Auditor (old fee) | Auditor performed work on prior evaluation | + +### Cross-Validation + +When [`MsgSubmitAttestation`](./README.md#attestation-submission) creates a situation where two valid attestations for +the same provider (from different auditors) differ by more than the `discrepancy_threshold` (default: 1 tier level), +the cross-validation rule triggers: + +1. **Both conflicting attestations are voided** (status: `Voided`, reason: `Discrepancy`) +2. 
**Both auditors' bonds are frozen** (cannot unbond, cannot issue new attestations until resolved) +3. **Escrowed fees for both voided attestations are returned to the provider** +4. **Provider falls back** to any remaining valid attestations from other auditors. If none remain, the provider is + effectively Level 4 (Permissionless). +5. A `DiscrepancyEvent` is emitted on-chain recording both auditor addresses, both tier claims, the provider, and + a timestamp. + +Resolution requires a [governance proposal](./README.md#resolvediscrepancy) that determines: +- Which auditor (if either) was correct +- Whether to slash either or both auditor bonds +- Whether to remove either or both from the approved auditor set +- Which bonds to unfreeze (those of the vindicated auditor(s)) + +The discrepancy rule always triggers regardless of the temporal distance between the two attestations. If an old +attestation is stale, it should have expired via its TTL. + +**Multi-auditor discrepancy scenarios**: When the new attestation is checked against all existing valid attestations, +it may conflict with multiple auditors simultaneously. In that case, a separate `DiscrepancyEvent` is created for +each conflicting pair (new auditor vs. each conflicting existing auditor). The new auditor's bond is frozen once +(not per-discrepancy). Each existing conflicting auditor's bond is also frozen. All conflicting attestations +(including the new one) are voided. Each discrepancy must be resolved independently via separate governance proposals. + +**Frozen auditor edge cases**: If an auditor is already frozen from a previous unresolved discrepancy and a new +attestation from a different auditor conflicts with one of the frozen auditor's remaining valid attestations, the +new discrepancy is created normally. The already-frozen auditor gains an additional discrepancy count. Both +discrepancies must be resolved before the frozen auditor can become Active again.
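A minimal Python sketch of the multi-auditor rule, under stated assumptions: the `Attestation` and `DiscrepancyEvent` classes and the `cross_validate` and `is_frozen` functions are hypothetical, tiers are integers 0 through 3, placing the new auditor in `auditor_a` is a choice made only for the sketch, and freezing is modeled as a per-auditor count of pending discrepancies (an auditor is frozen while its count is above zero), which matches the requirement that every discrepancy be resolved before the auditor becomes Active again.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

@dataclass
class Attestation:
    provider: str
    auditor: str
    tier: int                  # 0 = Level 0 (Trusted) ... 3 = Level 3 (Identified)
    status: str = "Valid"
    voided_reason: str = ""

@dataclass
class DiscrepancyEvent:
    id: int
    provider: str
    auditor_a: str             # sketch choice: the auditor submitting the new attestation
    auditor_a_tier: int
    auditor_b: str             # an existing auditor whose attestation conflicts
    auditor_b_tier: int
    resolution_status: str = "Pending"

def cross_validate(new: Attestation, existing: List[Attestation], threshold: int,
                   pending: Counter, next_id: int) -> List[DiscrepancyEvent]:
    """Void every conflicting attestation and emit one DiscrepancyEvent per conflicting pair."""
    conflicts = [
        a for a in existing
        if a.status == "Valid"
        and a.provider == new.provider
        and a.auditor != new.auditor
        and abs(a.tier - new.tier) > threshold   # "differ by more than" the threshold
    ]
    events = []
    for old in conflicts:
        old.status, old.voided_reason = "Voided", "Discrepancy"
        events.append(DiscrepancyEvent(next_id + len(events), new.provider,
                                       new.auditor, new.tier, old.auditor, old.tier))
        pending[old.auditor] += 1   # an already-frozen auditor simply gains another count
        pending[new.auditor] += 1   # frozen once, but each pair is resolved separately
    if conflicts:
        new.status, new.voided_reason = "Voided", "Discrepancy"
    return events

def is_frozen(auditor: str, pending: Counter) -> bool:
    return pending[auditor] > 0
```

With the default threshold of 1, a new Level 0 attestation conflicts with existing Level 2 and Level 3 attestations but not with a Level 1 one.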
+ +#### Discrepancy Auto-Resolution Timeout + +If a discrepancy event remains in `Pending` status for longer than `discrepancy_resolution_timeout` (governance +parameter), the [EndBlocker](./README.md#endblocker-design) automatically resolves it: + +1. Discrepancy status becomes `TimedOut` +2. Both attestations remain voided (they are not reinstated) +3. Both auditors' bonds are unfrozen (no slash applied), unless another discrepancy involving that auditor is still pending +4. An `EventDiscrepancyTimedOut` event is emitted + +This prevents indefinite bond freezing when governance is slow to act. The voided attestations are not reinstated -- +both auditors must re-evaluate the provider and submit new attestations if they wish. + +#### Attestation Deposit (Anti-Griefing) + +To raise the economic cost of triggering frivolous discrepancies, auditors must post a small deposit alongside +each attestation submission. The deposit is separate from the audit fee. + +- **Deposit amount**: `attestation_deposit` (governance parameter), applied uniformly across all tiers. +- **Normal flow**: the deposit is returned to the auditor when the attestation reaches a normal terminal state + (Expired, Revoked, Removed, or replaced by a new attestation from the same auditor). +- **Discrepancy flow**: when a discrepancy is triggered, both auditors' deposits for the conflicting attestations + are locked. On resolution, the non-vindicated auditor's deposit is slashed (sent to the community pool). The + vindicated auditor's deposit is returned. On auto-resolution timeout, both deposits are returned. +- The deposit is included in `MsgSubmitAttestation` as an additional `deposit` field. The module validates that + the deposit meets the governance minimum.
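The timeout and deposit rules can be combined into one minimal Python sketch. `Discrepancy`, `maybe_time_out`, and `deposit_destinations` are hypothetical names; `pending` is a per-auditor counter of unresolved discrepancies used to model freezing; and treating a resolution with no vindicated auditor as forfeiting both deposits is an assumption, since the rules above only name the vindicated and non-vindicated cases.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Discrepancy:
    auditor_a: str
    auditor_b: str
    timestamp: int                     # Unix seconds when the discrepancy was recorded
    resolution_status: str = "Pending"

def deposit_destinations(d: Discrepancy, vindicated: Optional[str] = None) -> Dict[str, str]:
    """Where each auditor's locked attestation_deposit goes once the discrepancy ends."""
    if d.resolution_status == "TimedOut":
        return {d.auditor_a: "auditor", d.auditor_b: "auditor"}
    if d.resolution_status == "Resolved":
        # Assumption: any auditor not named as vindicated forfeits to the community pool.
        return {a: ("auditor" if a == vindicated else "community_pool")
                for a in (d.auditor_a, d.auditor_b)}
    raise ValueError("deposits stay locked while the discrepancy is Pending")

def maybe_time_out(d: Discrepancy, block_time: int, timeout: int,
                   pending: Counter) -> Optional[Dict[str, str]]:
    """Auto-resolve a Pending discrepancy once it has been pending for longer than `timeout`."""
    if d.resolution_status != "Pending" or block_time - d.timestamp <= timeout:
        return None
    d.resolution_status = "TimedOut"   # both attestations stay voided; nothing is reinstated
    for a in (d.auditor_a, d.auditor_b):
        pending[a] -= 1                # the bond unfreezes once no other discrepancy is pending
    return deposit_destinations(d)
```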
+ +### Auditor Lifecycle + +#### Active Operation + +An Active auditor may: +- Submit attestations for any provider (within their max attestation tier) +- Revoke their own attestations +- Receive escrowed fees on attestation expiry + +#### Frozen + +An auditor whose bond is frozen (due to a discrepancy event) may not: +- Submit new attestations +- Unbond + +Existing non-conflicting attestations from a frozen auditor remain valid. A frozen auditor cannot replace their +voided attestation until governance resolves the discrepancy and unfreezes their bond. + +#### Resignation + +An auditor may voluntarily resign via `MsgResignAuditor`: +- Status becomes `Resigned` -- cannot issue new attestations +- Existing attestations remain valid until their natural TTL expiry +- Bond enters unbonding after all outstanding attestation escrows have resolved (expired, voided, or superseded) + +#### Renewal + +Governance must periodically re-approve every auditor. This is the same process as initial registration -- a +[governance proposal](./README.md#renewauditor) is submitted and voted on. The renewal period scales with the +auditor's maximum attestation authority: higher authority requires more frequent re-validation. + +| Max Attestation Authority | Renewal Period | +|---------------------------|----------------------------------------------------------------------| +| Level 3 (Identified) | `renewal_period_l3` (governance parameter, longest, e.g., 24 months) | +| Level 2 (Verified) | `renewal_period_l2` (governance parameter, e.g., 18 months) | +| Level 1 (Established) | `renewal_period_l1` (governance parameter, e.g., 12 months) | +| Level 0 (Trusted) | `renewal_period_l0` (governance parameter, shortest, e.g., 6 months) | + +On successful renewal (governance proposal passes), the auditor's `renewal_deadline` is reset to the current time plus +the applicable renewal period. 
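A minimal Python sketch of the deadline reset, assuming hypothetical `Auditor`, `renewal_period`, `apply_renewal`, and `renewal_periods_ordered` names and durations stored as integer seconds under the `renewal_period_l{N}` parameter names. Because the new deadline is computed from the current block time, renewing early does not carry over the unused part of the previous period. The ordering check is an illustrative sanity check for the table above, not a documented validation rule.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Auditor:
    address: str
    max_attestation_tier: int          # 0 = Level 0 (Trusted) ... 3 = Level 3 (Identified)
    status: str = "Active"
    renewal_deadline: int = 0          # Unix seconds

def renewal_period(params: Dict[str, int], max_tier: int) -> int:
    """Look up renewal_period_l{N} for the auditor's maximum attestation authority."""
    return params[f"renewal_period_l{max_tier}"]

def apply_renewal(auditor: Auditor, params: Dict[str, int], block_time: int) -> None:
    """Effect of a passed RenewAuditor proposal: the deadline restarts from the current time."""
    auditor.renewal_deadline = block_time + renewal_period(params, auditor.max_attestation_tier)

def renewal_periods_ordered(params: Dict[str, int]) -> bool:
    """Sanity check: higher authority (lower level number) renews at least as often."""
    periods = [params[f"renewal_period_l{t}"] for t in range(4)]
    return all(periods[i] <= periods[i + 1] for i in range(3))
```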
+ +If the renewal deadline passes without a successful governance proposal: +- Auditor status becomes `Lapsed` -- cannot issue new attestations +- Existing attestations remain valid until their natural TTL expiry (providers are not disrupted) +- Bond enters unbonding +- The auditor must go through the full governance registration process to reactivate (treated as a new registration) + +There is no grace period. The renewal deadline is known well in advance; the auditor and community have the entire +renewal period to prepare the governance proposal. + +#### Lapsed + +An auditor whose renewal deadline has passed without re-approval: +- Cannot issue new attestations +- Existing non-expired attestations remain valid until their TTL expiry +- Bond enters unbonding +- Must re-register via full governance proposal to become Active again + +#### Removal by Governance + +Governance may remove an auditor via a [`RemoveAuditor`](./README.md#removeauditor) proposal: +- Status becomes `Removed` -- cannot issue new attestations +- Existing attestations remain valid until their natural TTL expiry (providers are not immediately disrupted) +- Bond enters unbonding (governance-configurable duration) +- The auditor cannot re-register without a new governance proposal + +### Governance Actions + +All governance proposal types related to the verification tier system are defined here. All governance messages use +the `authority` field set to the governance module address and are submitted via the standard `x/gov` proposal flow. + +#### RegisterAuditor + +Adds a new auditor to the approved set. Includes auditor address, max attestation tier, organization metadata, and +required bond amount. On approval, the auditor must post their [bond](./README.md#auditor-bond) to become Active. +Sets the initial `renewal_deadline` to the current time plus the applicable +[renewal period](./README.md#renewal). 
+ +#### RenewAuditor + +Re-approves an existing auditor for a new [renewal period](./README.md#renewal). Same proposal and voting process +as initial registration. On approval, resets the auditor's `renewal_deadline`. The auditor's bond must remain posted. + +#### RemoveAuditor + +Removes an auditor from the approved set. Auditor status becomes `Removed`. Existing attestations remain valid until +their natural TTL expiry. Bond enters unbonding. + +#### RevokeProviderAttestation + +Surgically revokes a single attestation on a specific provider. + +``` +RevokeProviderAttestation { + authority: AccAddress + provider: AccAddress + auditor: AccAddress + reason: string +} +``` + +Effect: +- The specific attestation (provider + auditor pair) is voided (status: `Voided`, reason: `GovernanceAction`) +- Escrowed fee returned to provider +- No automatic action against the auditor + +#### RevokeAllProviderAttestations + +Revokes all valid attestations on a specific provider. This is the nuclear option for a provider discovered to be +fraudulent or compromised. + +``` +RevokeAllProviderAttestations { + authority: AccAddress + provider: AccAddress + reason: string +} +``` + +Effect: +- All valid attestations on this provider are voided (status: `Voided`, reason: `GovernanceAction`) +- All escrowed fees for those attestations returned to provider +- Provider is effectively Level 4 (Permissionless) +- No automatic action against any of the auditors involved + +#### RevokeAuditorAttestations + +Revokes all outstanding attestations issued by a specific auditor across all providers. Used when an auditor is found +to have been issuing fraudulent or negligent attestations. 
+ + ``` +RevokeAuditorAttestations { + authority: AccAddress + auditor: AccAddress + reason: string +} + ``` + +Effect: +- All valid attestations issued by this auditor (across all providers) are voided (status: `Voided`, reason: `GovernanceAction`) +- All escrowed fees for those attestations returned to the respective providers +- **Auditor bond is fully slashed** +- Auditor status is not automatically changed to `Removed` -- this proposal can be combined with a separate + [`RemoveAuditor`](./README.md#removeauditor) proposal if governance also wants to formally remove them. However, + with a fully slashed bond the auditor is effectively unable to operate until they re-post their bond. + +This proposal is independent of `RemoveAuditor`. It allows governance to invalidate an auditor's past work without +necessarily preventing future work (though the full bond slash makes continuation impractical without re-bonding). + +#### ResolveDiscrepancy + +Resolves a pending [discrepancy event](./README.md#cross-validation) triggered by the cross-validation rule. + + ``` +ResolveDiscrepancy { + authority: AccAddress + discrepancy_id: uint64 + vindicated_auditor: AccAddress (empty if neither vindicated) + slash_auditor_a: bool + slash_auditor_b: bool + reason: string +} + ``` + +Effect: +- Discrepancy event status becomes `Resolved` +- Vindicated auditor's bond is unfrozen (only if all their discrepancies are resolved) +- A non-vindicated auditor's bond is fully slashed if the corresponding flag (`slash_auditor_a` or `slash_auditor_b`) is set +- Auditor statuses are updated accordingly: the vindicated auditor is unfrozen, while a non-vindicated auditor is either left unchanged or removed + +#### SlashProviderBond + +Slashes a provider's [economic bond](./README.md#provider-economic-bond). Used when a provider is found to have + misrepresented resources.
+ +``` +SlashProviderBond { + authority: AccAddress + provider: AccAddress + slash_fraction: Dec (0.0 to 1.0) + reason: string +} +``` + +Effect: +- Provider's `bonded_amount` is reduced by `bonded_amount * slash_fraction` +- Slashed tokens are sent to the community pool +- Provider's `slashed` flag is set, `last_slash_time` is recorded +- If remaining bond falls below the minimum required by any active attestation tier, those attestations are voided + (status: `Voided`, reason: `BondSlashed`) +- An `EventProviderBondSlashed` event is emitted + +#### UpdateParams + +Updates [governance parameters](./README.md#governance-parameters) for the `x/verification` module. + +``` +UpdateParams { + authority: AccAddress + params: Params +} +``` + +#### Governance Proposal Summary + +| Proposal | Target | Attestation Effect | Bond Effect | Auditor Status Effect | +|------------------------------------------------------------------------------|--------------------|----------------------------------|--------------------------|--------------------------------------| +| [`RegisterAuditor`](./README.md#registerauditor) | Auditor | -- | Requires posting | Creates `Active` auditor | +| [`RenewAuditor`](./README.md#renewauditor) | Auditor | -- | Must remain posted | Resets renewal deadline | +| [`RemoveAuditor`](./README.md#removeauditor) | Auditor | Existing valid until TTL | Enters unbonding | `Removed` | +| [`RevokeAuditorAttestations`](./README.md#revokeauditorattestations) | Auditor | All voided immediately | Fully slashed | Unchanged (but effectively inactive) | +| [`RevokeProviderAttestation`](./README.md#revokeproviderattestation) | Provider + Auditor | Single attestation voided | No effect | No effect | +| [`RevokeAllProviderAttestations`](./README.md#revokeallproviderattestations) | Provider | All attestations voided | No effect | No effect | +| [`ResolveDiscrepancy`](./README.md#resolvediscrepancy) | Discrepancy event | -- | Slash/unfreeze per flags | Unfreeze vindicated | 
+| [`SlashProviderBond`](./README.md#slashproviderbond) | Provider | Void if bond drops below minimum | Provider bond slashed | No effect | +| [`UpdateParams`](./README.md#updateparams) | Module | -- | -- | -- | + +### Provider Economic Bond + +Separate from the [auditor bond](./README.md#auditor-bond), providers must post an AKT bond to qualify for Level 2 +and above. The bond serves as economic collateral against resource misrepresentation and is slashable via +[governance action](./README.md#slashproviderbond). + +#### Bond Calculation + +The required bond is calculated from the provider's declared resource capacity (from their most recent on-chain +`ResourceSummary` submitted via [`MsgPostSnapshotHash`](./README.md#posting-mechanism)): + + ``` +required_bond = sum( + gpu_count * bond_per_gpu[tier], + vcpu_count * bond_per_vcpu[tier], + memory_gb * bond_per_memory_gb[tier], + storage_tb * bond_per_storage_tb[tier] +) + ``` + +Here `gpu_count` and `vcpu_count` are the summary's `total_gpus` and `total_vcpus`, and `memory_gb` and `storage_tb` are +`total_memory_mb` and `total_storage_mb` converted to GB and TB. Each `bond_per_*[tier]` term is the tier-specific +[governance parameter](./README.md#governance-parameters) for that resource (e.g., `bond_per_gpu[L2]` is `bond_gpu_l2`). Higher tiers require higher +per-unit bonds: + +| Resource | L2 (Verified) | L1 (Established) | L0 (Trusted) | +|---|---|---|---| +| Per GPU | `bond_gpu_l2` | `bond_gpu_l1` (>= 2x L2) | `bond_gpu_l0` (>= 4x L2) | +| Per vCPU | `bond_vcpu_l2` | `bond_vcpu_l1` | `bond_vcpu_l0` | +| Per GB memory | `bond_mem_gb_l2` | `bond_mem_gb_l1` | `bond_mem_gb_l0` | +| Per TB storage | `bond_storage_tb_l2` | `bond_storage_tb_l1` | `bond_storage_tb_l0` | + +#### Bond Messages + +**`MsgPostProviderBond`** -- Provider deposits AKT into the verification module's bond escrow. + + ``` +MsgPostProviderBond { + provider: AccAddress (signer) + amount: Coin (AKT) +} + ``` + +On submission: +1. Validate the provider is registered on-chain +2. 
Transfer AKT from provider to the module's bond escrow account +3. Update the provider's [`ProviderBondRecord`](./README.md#provider-bond-record) (add to `bonded_amount`) +4.
If the provider already has a bond, the new amount is added to the existing bond +5. Emit `EventProviderBondPosted` + +**`MsgWithdrawProviderBond`** -- Provider initiates unbonding of part or all of their bond. + +``` +MsgWithdrawProviderBond { + provider: AccAddress (signer) + amount: Coin (AKT amount to withdraw) +} +``` + +On submission: +1. Validate the requested amount does not exceed the current bonded amount +2. Calculate the remaining bond after withdrawal: `remaining = bonded_amount - amount` +3. If the remaining bond is less than the minimum required by the provider's highest active attestation tier, + void all attestations at tiers that can no longer be supported (status: `Voided`, reason: `BondWithdrawn`), + and return their escrowed fees to the provider +4. Reduce `bonded_amount` by the withdrawal amount +5. Add an `UnbondingEntry` with `completion_time = block_time + provider_bond_unbonding_period` +6. Add an entry to the provider bond unbonding queue for [EndBlocker](./README.md#endblocker-design) processing +7. After unbonding completes (processed by EndBlocker), transfer AKT back to provider + +#### Resource Declaration + +When posting a bond, the provider's resource capacity is determined from the latest on-chain `ResourceSummary` +(submitted via [`MsgPostSnapshotHash`](./README.md#posting-mechanism)). If no snapshot hash exists, the provider must +submit one first. This creates a natural dependency chain: +`MsgPostSnapshotHash` -> `MsgPostProviderBond` -> `MsgSubmitAttestation` at L2+. + +### On-Chain State + +#### Module Identity + +This specification introduces a new Cosmos SDK module: `x/verification` with proto package `akash.verification.v1`. +This module is separate from the existing `x/audit` module and runs alongside it during the +[migration period](./README.md#migration-from-aep-9). 
+ +#### Auditor Record + + ``` +AuditorRecord { + address: AccAddress + status: Active | Frozen | Lapsed | Resigned | Removed + max_attestation_tier: VerificationTier (TierTrusted through TierIdentified) + bond_amount: Coin + bond_status: Bonded | Frozen | Unbonding + metadata_hash: bytes + registered_at: Timestamp + renewal_deadline: Timestamp + discrepancy_count: uint +} + ``` + +#### Attestation Record + +> **Note**: Attestation records exist only for tiers L0-L3 (TierTrusted through TierIdentified). Level 4 +> (Permissionless) is the implicit state of any provider without a valid attestation -- there is no L4 attestation +> record. In query responses, `TierUnspecified` (enum value 0) represents "no attestation" (effectively L4). + + ``` +AttestationRecord { + provider: AccAddress + auditor: AccAddress + tier: VerificationTier (TierTrusted through TierIdentified) + capabilities: []CapabilityFlag + evidence_hash: bytes + fee: Coin + fee_status: Escrowed | ReleasedToAuditor | ReturnedToProvider + created_at: Timestamp + expires_at: Timestamp + status: Valid | Voided | Expired | Revoked | Removed + voided_reason: null | Discrepancy | GovernanceAction | BondWithdrawn | BondSlashed +} + ``` + +#### Discrepancy Event + + ``` +DiscrepancyEvent { + id: uint64 + provider: AccAddress + auditor_a: AccAddress + auditor_a_tier: uint + auditor_b: AccAddress + auditor_b_tier: uint + timestamp: Timestamp + resolution_status: Pending | Resolved | TimedOut + resolution_proposal_id: uint64 (nullable) +} + ``` + +#### Provider Bond Record + + ``` +ProviderBondRecord { + provider: AccAddress + bonded_amount: Coin + unbonding_entries: []UnbondingEntry + slashed: bool + last_slash_time: Timestamp (nullable) +} + +UnbondingEntry { + amount: Coin + completion_time: Timestamp +} + ``` + +#### Provider Snapshot Record + + ``` +ProviderSnapshotRecord { + provider: AccAddress + snapshot_hash: bytes + resource_summary: ResourceSummary + 
posted_at: Timestamp + snapshot_timestamp: Timestamp + compliance_deadline:
Timestamp + suspended: bool +} + +ResourceSummary { + total_gpus: uint32 + total_vcpus: uint32 + total_memory_mb: uint64 + total_storage_mb: uint64 + active_leases: uint32 + software_version: string + software_signature: bytes +} +``` + +### Governance Parameters + +See [Initial Governance Parameter Values](./README.md#initial-governance-parameter-values) for suggested genesis values. + +| Parameter | Description | +|--------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------| +| `bond_l3` | Minimum auditor bond for Level 3 attestation authority | +| `bond_l2` | Minimum auditor bond for Level 2 attestation authority | +| `bond_l1` | Minimum auditor bond for Level 1 attestation authority | +| `bond_l0` | Minimum auditor bond for Level 0 attestation authority | +| `ttl_l3` | Attestation TTL for Level 3 | +| `ttl_l2` | Attestation TTL for Level 2 | +| `ttl_l1` | Attestation TTL for Level 1 | +| `ttl_l0` | Attestation TTL for Level 0 | +| `min_fee_l3` | Minimum audit fee for Level 3 attestation | +| `min_fee_l2` | Minimum audit fee for Level 2 attestation | +| `min_fee_l1` | Minimum audit fee for Level 1 attestation | +| `min_fee_l0` | Minimum audit fee for Level 0 attestation | +| `discrepancy_threshold` | Maximum tier level difference before [cross-validation](./README.md#cross-validation) triggers (default: 1) | +| `auditor_unbonding_period` | Duration of auditor bond unbonding | +| `renewal_period_l3` | Auditor [renewal](./README.md#renewal) period for Level 3 max authority | +| `renewal_period_l2` | Auditor renewal period for Level 2 max authority | +| `renewal_period_l1` | Auditor renewal period for Level 1 max authority | +| `renewal_period_l0` | Auditor renewal period for Level 0 max authority | +| `snapshot_hash_interval` | Maximum time between required [snapshot hash](./README.md#on-chain-snapshot-hashes) postings | +| 
`max_snapshot_age` | Maximum age of snapshot timestamp relative to block time | +| `bond_gpu_l2` / `bond_gpu_l1` / `bond_gpu_l0` | [Provider bond](./README.md#provider-economic-bond) per GPU at each tier | +| `bond_vcpu_l2` / `bond_vcpu_l1` / `bond_vcpu_l0` | Provider bond per vCPU at each tier | +| `bond_mem_gb_l2` / `bond_mem_gb_l1` / `bond_mem_gb_l0` | Provider bond per GB memory at each tier | +| `bond_storage_tb_l2` / `bond_storage_tb_l1` / `bond_storage_tb_l0` | Provider bond per TB storage at each tier | +| `provider_bond_unbonding_period` | Duration of provider bond unbonding | +| `min_age_l2` | Minimum provider registration age for Level 2 [on-chain enforcement](./README.md#on-chain-prerequisite-enforcement) | +| `min_age_l1` | Minimum provider registration age for Level 1 | +| `min_age_l0` | Minimum provider registration age for Level 0 | +| `min_lease_completion_bps_l1` | Minimum lease completion rate (basis points) for Level 1 | +| `min_lease_completion_bps_l0` | Minimum lease completion rate (basis points) for Level 0 | +| `clean_history_window_l1` | Lookback window for clean slashing history (Level 1) | +| `clean_history_window_l0` | Lookback window for clean slashing history (Level 0) | +| `min_l1_duration_for_l0` | Minimum continuous Level 1+ attestation duration before Level 0 | +| `min_leases_for_completion_rate` | Minimum lease count before completion rate is enforced | +| `contact_response_critical_l3` | Maximum response time for critical incidents at Level 3 | +| `contact_response_critical_l2` | Maximum response time for critical incidents at Level 2 | +| `contact_response_critical_l1` | Maximum response time for critical incidents at Level 1 | +| `contact_response_critical_l0` | Maximum response time for critical incidents at Level 0 | +| `contact_response_standard_l3` | Maximum response time for standard inquiries at Level 3 | +| `contact_response_standard_l2` | Maximum response time for standard inquiries at Level 2 | +| 
`contact_response_standard_l1` | Maximum response time for standard inquiries at Level 1 | +| `contact_response_standard_l0` | Maximum response time for standard inquiries at Level 0 | +| `max_endblocker_attestation_expiries` | Maximum attestation expiries processed per block by [EndBlocker](./README.md#endblocker-design) | +| `max_endblocker_snapshot_suspensions` | Maximum snapshot suspensions processed per block | +| `max_endblocker_unbonding_completions` | Maximum bond unbonding completions processed per block | +| `max_endblocker_discrepancy_timeouts` | Maximum discrepancy auto-resolutions processed per block | +| `discrepancy_resolution_timeout` | Duration before unresolved [discrepancies](./README.md#discrepancy-auto-resolution-timeout) auto-resolve | +| `attestation_deposit` | Required [deposit](./README.md#attestation-deposit-anti-griefing) per attestation submission (anti-griefing) | +| `verification_module_active` | Feature flag for [migration](./README.md#migration-from-aep-9): when `true`, market uses `x/verification` for bids | + +### Initial Governance Parameter Values + +| Parameter | Value | Rationale | +|---|---|---| +| `bond_l3` | 1,000 AKT | Low barrier for L3 auditors | +| `bond_l2` | 5,000 AKT | Moderate for L2 | +| `bond_l1` | 25,000 AKT | Significant for L1 | +| `bond_l0` | 100,000 AKT | Highest assurance | +| `ttl_l3` | 365 days | Annual re-audit | +| `ttl_l2` | 180 days | Semi-annual | +| `ttl_l1` | 90 days | Quarterly | +| `ttl_l0` | 90 days | Quarterly | +| `min_fee_l3` | 10 AKT | Nominal | +| `min_fee_l2` | 50 AKT | Covers automated audit cost | +| `min_fee_l1` | 200 AKT | Covers deeper evaluation | +| `min_fee_l0` | 1,000 AKT | Covers on-site audit | +| `discrepancy_threshold` | 1 | Trigger on >1 tier difference | +| `auditor_unbonding_period` | 21 days | Standard Cosmos unbonding | +| `renewal_period_l3` | 24 months | Longest renewal cycle | +| `renewal_period_l2` | 18 months | | +| `renewal_period_l1` | 12 months | | +| 
`renewal_period_l0` | 6 months | Most frequent re-approval | +| `snapshot_hash_interval` | 24 hours | Daily freshness | +| `max_snapshot_age` | 1 hour | Snapshot must be recent | +| `bond_gpu_l2` / `l1` / `l0` | 50 / 100 / 200 AKT | Per GPU | +| `bond_vcpu_l2` / `l1` / `l0` | 0.5 / 1 / 2 AKT | Per vCPU | +| `bond_mem_gb_l2` / `l1` / `l0` | 0.25 / 0.5 / 1 AKT | Per GB memory | +| `bond_storage_tb_l2` / `l1` / `l0` | 2 / 4 / 8 AKT | Per TB storage | +| `provider_bond_unbonding_period` | 21 days | Standard unbonding | +| `min_age_l2` | 30 days | | +| `min_age_l1` | 120 days | | +| `min_age_l0` | 300 days | | +| `min_lease_completion_bps_l1` / `l0` | 9800 (98%) | High completion threshold | +| `clean_history_window_l1` | 90 days | | +| `clean_history_window_l0` | 180 days | | +| `min_l1_duration_for_l0` | 180 days | 6 months at L1 before L0 | +| `min_leases_for_completion_rate` | 10 | Avoids 1/1 = 100% gaming | +| `contact_response_critical_l3` | 72 hours | Lenient; proves channel is monitored | +| `contact_response_critical_l2` | 24 hours | Business-day response | +| `contact_response_critical_l1` | 4 hours | Near-real-time for established operators | +| `contact_response_critical_l0` | 1 hour | Highest-trust providers must be highly responsive | +| `contact_response_standard_l3` | 7 days | Generous window for non-urgent inquiries | +| `contact_response_standard_l2` | 72 hours | | +| `contact_response_standard_l1` | 24 hours | | +| `contact_response_standard_l0` | 4 hours | | +| `max_endblocker_attestation_expiries` | 100 | Per-block processing cap | +| `max_endblocker_snapshot_suspensions` | 50 | Per-block processing cap | +| `max_endblocker_unbonding_completions` | 50 | Per-block processing cap | +| `max_endblocker_discrepancy_timeouts` | 10 | Per-block processing cap | +| `discrepancy_resolution_timeout` | 90 days | Prevents indefinite bond freezing | +| `attestation_deposit` | 100 AKT | Anti-griefing; returned on normal expiry, slashed on losing discrepancy | +| 
`verification_module_active` | `false` | Off during Phase 1; governance sets to `true` for Phase 2 | + +### Store Layout and Indexing + +The `x/verification` module uses the following KV store key layout. All keys use a single-byte prefix for namespace +separation. Composite keys use fixed-width components (20-byte addresses, 8-byte big-endian timestamps/IDs) to +enable efficient prefix iteration and range scans. + +#### Primary Records + +``` +0x01 | auditor_addr (20 bytes) -> AuditorRecord +0x02 | provider_addr (20 bytes) | auditor_addr (20 bytes) -> AttestationRecord +0x03 | discrepancy_id (8 bytes BE) -> DiscrepancyEvent +0x04 | provider_addr (20 bytes) -> ProviderBondRecord +0x05 | provider_addr (20 bytes) -> ProviderSnapshotRecord +0x06 -> Params +0x07 -> next_discrepancy_id (uint64) +``` + +#### Secondary Indexes + +``` +0x10 | auditor_addr (20 bytes) | provider_addr (20 bytes) -> [] (empty value; existence index) +``` + +This index enables efficient iteration of all attestations issued by a specific auditor, which is needed for +[`RevokeAuditorAttestations`](./README.md#revokeauditorattestations) and `QueryAuditorAttestations`. + +#### Time-Indexed Queues + +All queues use big-endian Unix timestamp (8 bytes) as the leading key component after the prefix, enabling +efficient range iteration from the start of the queue up to the current block time. 
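The byte ordering can be made concrete with a minimal Python sketch of the attestation expiry queue key (prefix `0x20`) and of collecting the entries that are due. It assumes a sorted Python list stands in for the KV store's ordered iterator and uses dummy 20-byte addresses. `expiry_queue_key`, `due_entries`, and the `limit` argument (playing the role of `max_endblocker_attestation_expiries`) are hypothetical names. Big-endian encoding matters because byte-wise comparison then matches numeric order. With little-endian encoding, the bytes for 256 would begin with `0x00` and sort before the bytes for 1, which begin with `0x01`.

```python
import bisect
import struct

# Prefix of the attestation expiry queue (0x20) from the store layout.
ATTESTATION_EXPIRY_PREFIX = b"\x20"


def expiry_queue_key(expires_at: int, provider: bytes, auditor: bytes) -> bytes:
    """0x20 | expires_at (8-byte big-endian) | provider_addr (20B) | auditor_addr (20B)."""
    if len(provider) != 20 or len(auditor) != 20:
        raise ValueError("addresses must be 20 bytes")
    return ATTESTATION_EXPIRY_PREFIX + struct.pack(">Q", expires_at) + provider + auditor


def due_entries(sorted_keys: list, block_time: int, limit: int) -> list:
    """Keys whose timestamp component is <= block_time, capped at `limit`.

    `sorted_keys` stands in for the KV store's ordered iterator. The exclusive
    upper bound prefix | (block_time + 1) admits every key stamped exactly
    block_time, whatever address suffix follows the timestamp.
    """
    start = bisect.bisect_left(sorted_keys, ATTESTATION_EXPIRY_PREFIX)
    end = bisect.bisect_left(
        sorted_keys, ATTESTATION_EXPIRY_PREFIX + struct.pack(">Q", block_time + 1)
    )
    return sorted_keys[start:min(end, start + limit)]


# Usage with dummy 20-byte addresses.
provider, auditor = bytes([1]) * 20, bytes([2]) * 20
keys = sorted(expiry_queue_key(t, provider, auditor) for t in (300, 100, 200))
due = due_entries(keys, block_time=200, limit=10)
print([struct.unpack(">Q", k[1:9])[0] for k in due])  # [100, 200]
```

Using `prefix | (block_time + 1)` as an exclusive upper bound is one way to include entries stamped exactly at `block_time` even though an address suffix follows the timestamp; a real implementation might compute the bound differently.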
+ + ``` +0x20 | expires_at (8B BE) | provider_addr | auditor_addr -> [] (attestation expiry queue) +0x21 | deadline (8B BE) | auditor_addr -> [] (auditor renewal deadline queue) +0x22 | deadline (8B BE) | provider_addr -> [] (snapshot compliance queue) +0x23 | completion (8B BE) | provider_addr | seq (4B) -> [] (provider bond unbonding queue) +0x24 | completion (8B BE) | auditor_addr -> [] (auditor bond unbonding queue) +0x25 | timeout (8B BE) | discrepancy_id (8B BE) -> [] (discrepancy auto-resolution queue) +``` + +#### Queue Processing Pattern + +[EndBlocker](./README.md#endblocker-design) iterates each queue from `prefix` through the last entry whose timestamp +component is `<= block_time` (i.e., `prefix | block_time_bytes` inclusive of any address or ID suffix), processing and +deleting entries in key order. Per-block processing caps +([governance parameters](./README.md#governance-parameters)) prevent unbounded gas usage. If a cap is reached, +the remaining entries are processed in subsequent blocks. + +### EndBlocker Design + +The `x/verification` module's EndBlocker processes time-dependent state transitions every block. Queues are processed +in the following order to ensure correct state dependencies. + +#### 1. Attestation Expiry Queue (prefix `0x20`) + +``` +for each entry where expires_at <= block_time (up to max_endblocker_attestation_expiries): + 1. Load AttestationRecord by (provider, auditor) from prefix 0x02 + 2. If status == Valid: + a. Set status = Expired + b. Set fee_status = ReleasedToAuditor + c. Transfer escrowed fee from module account to auditor + d. Emit EventAttestationExpired + e. Emit EventFeeReleasedToAuditor + 3. Remove the secondary index entry (prefix 0x10) + 4. Delete queue entry +``` + +#### 2. Auditor Renewal Queue (prefix `0x21`) + +``` +for each entry where deadline <= block_time: + 1. Load AuditorRecord from prefix 0x01 + 2. If status == Active: + a. Set status = Lapsed + b. Set bond_status = Unbonding + c. Add entry to auditor bond unbonding queue (prefix 0x24) + with completion_time = block_time + auditor_unbonding_period + d.
Emit EventAuditorLapsed + 3. Delete queue entry +``` + +#### 3. Snapshot Compliance Queue (prefix `0x22`) + +``` +for each entry where deadline <= block_time (up to max_endblocker_snapshot_suspensions): + 1. Load ProviderSnapshotRecord from prefix 0x05 + 2. If not already suspended: + a. Set suspended = true + b. Emit EventSnapshotSuspended + 3. Delete queue entry + // No new queue entry is created. Provider must post a new MsgPostSnapshotHash + // to re-enter the compliance cycle. +``` + +#### 4. Provider Bond Unbonding Queue (prefix `0x23`) + +``` +for each entry where completion_time <= block_time (up to max_endblocker_unbonding_completions): + 1. Load ProviderBondRecord from prefix 0x04 + 2. Find matching UnbondingEntry by completion_time, remove it from the entries list + 3. Transfer the unbonded amount from module escrow to provider + 4. Store updated ProviderBondRecord + 5. Delete queue entry +``` + +#### 5. Auditor Bond Unbonding Queue (prefix `0x24`) + +``` +for each entry where completion_time <= block_time: + 1. Load AuditorRecord from prefix 0x01 + 2. Transfer bond from module escrow to auditor + 3. Set bond_amount = 0, bond_status = Unbonding (completed) + 4. Store updated AuditorRecord + 5. Delete queue entry +``` + +#### 6. Discrepancy Auto-Resolution Queue (prefix `0x25`) + +``` +for each entry where timeout <= block_time (up to max_endblocker_discrepancy_timeouts): + 1. Load DiscrepancyEvent from prefix 0x03 + 2. If resolution_status == Pending: + a. Set resolution_status = TimedOut + b. Unfreeze auditor_a's bond (if no other pending discrepancies remain) + c. Unfreeze auditor_b's bond (if no other pending discrepancies remain) + d. Return attestation deposits for both voided attestations to respective auditors + e. Emit EventDiscrepancyTimedOut + 3. 
Delete queue entry +``` + +### Protobuf Definitions + +Complete protobuf definitions for the `akash.verification.v1` package (8 proto files: types, state, params, +msg, service, query, events, genesis) are provided in the +[Implementation Guide](./IMPLEMENTATION.md#4-protobuf-definitions). + +### Market Module Integration + +Tenants specify minimum verification requirements in their deployment SDL. The `x/market` module enforces these +requirements during bid creation by querying the `x/verification` module. + +#### VerificationKeeper Interface + +The `x/market` module depends on a new `VerificationKeeper` interface, which replaces its existing `AuditKeeper` +dependency once the [migration](./README.md#migration-from-aep-9) completes: + +``` +VerificationKeeper: + GetProviderValidAttestations(ctx, provider) -> ([]AttestationRecord, bool) + IsProviderSnapshotCompliant(ctx, provider) -> bool + GetProviderBestTier(ctx, provider) -> VerificationTier + ProviderHasCapability(ctx, provider, CapabilityFlag) -> bool +``` + +- `GetProviderValidAttestations` returns all attestations with status `Valid` that have not expired. This method + does NOT filter by snapshot compliance -- the caller must check `IsProviderSnapshotCompliant` separately. This + separation ensures the verification module reports facts while the market module applies bid-time policy. +- `IsProviderSnapshotCompliant` returns `false` if the provider's `ProviderSnapshotRecord.suspended` is `true`. +- `GetProviderBestTier` returns the best (lowest numeric enum value = highest trust) tier from valid attestations. + Returns `TierUnspecified` if no valid attestations exist (effectively Level 4). +- `ProviderHasCapability` returns `true` if any valid attestation for this provider includes the given + [capability flag](./README.md#capability-flags). + +#### Bid Filtering + +The `CreateBid` handler in `x/market` is updated to check verification requirements: + +``` +1. If the order has a VerificationRequirement with min_tier set: + a.
Check snapshot compliance: if provider is snapshot-suspended and min_tier <= L2, + reject with ErrProviderSnapshotSuspended + b. Check tier: if provider's best tier > min_tier (numerically higher = lower trust; TierUnspecified + counts as Level 4), reject with ErrInsufficientVerificationTier + c. Check capabilities: for each required capability, if provider does not have it, + reject with ErrMissingCapability + d. Check specific auditors: if required_auditors is non-empty, at least one valid + attestation must be from a listed auditor, otherwise reject with + ErrRequiredAuditorNotFound +``` + +#### SDL Syntax + +Tenants specify verification requirements in the `placement` section of their SDL: + +```yaml +profiles: + placement: + dc1: + attributes: + region: us-west + verification: + min_tier: 2 + capabilities: + - tee_hardware_attestation + - confidential_computing + auditors: + - akash1auditor1... + pricing: + web: + denom: uakt + amount: 1000 +``` + +**On-chain representation**: The `PlacementRequirements` message (in the deployment module) gains a new +`VerificationRequirement` field containing `min_tier`, `required_capabilities`, and `required_auditors`. +If omitted or `min_tier` is unspecified, no verification filtering is applied (backward compatible with +existing deployments). See [Implementation Guide](./IMPLEMENTATION.md#35-verificationrequirement-xdeployment-proto-addition) +for the full protobuf definition. + +Console applies default filtering policies (e.g., showing only providers with at least one Level 2 attestation). + +### Incentive Integration + +The on-chain incentive module (AEP-53) consumes attestation data to determine provider eligibility and incentive +multipliers. AEP-53 defines the specific eligibility rules, which consume the raw attestation records stored by +this module via the [VerificationKeeper interface](./README.md#verificationkeeper-interface).
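The Bid Filtering checks (steps a to d) can be sketched in Python as below. This is a minimal illustration under stated assumptions: tiers are modeled as plain level numbers 0 to 3 rather than the real `VerificationTier` enum values, "no valid attestations" is treated as Level 4 as described for `GetProviderBestTier`, `valid` is assumed to already hold only `Valid`, unexpired attestations (what `GetProviderValidAttestations` returns), and the returned strings merely echo the error names above. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional

# Per GetProviderBestTier, "no valid attestations" is effectively Level 4.
NO_ATTESTATION_LEVEL = 4


@dataclass(frozen=True)
class Attestation:
    auditor: str
    level: int  # 0 = Level 0 (highest trust) ... 3 = Level 3
    capabilities: FrozenSet[str] = frozenset()


@dataclass
class VerificationRequirement:
    min_tier: Optional[int] = None  # None: unspecified, so no filtering at all
    required_capabilities: List[str] = field(default_factory=list)
    required_auditors: List[str] = field(default_factory=list)


def best_tier(valid: List[Attestation]) -> int:
    """Lowest level number = highest trust; Level 4 if there are no attestations."""
    return min((a.level for a in valid), default=NO_ATTESTATION_LEVEL)


def check_bid(req: VerificationRequirement, valid: List[Attestation],
              snapshot_suspended: bool) -> Optional[str]:
    """Return the rejection error name, or None if the bid passes."""
    if req.min_tier is None:
        return None
    # a. Snapshot suspension only matters for orders requiring Level 2 or better.
    if snapshot_suspended and req.min_tier <= 2:
        return "ErrProviderSnapshotSuspended"
    # b. Numerically higher level = lower trust.
    if best_tier(valid) > req.min_tier:
        return "ErrInsufficientVerificationTier"
    # c. Each capability must appear on at least one valid attestation.
    for cap in req.required_capabilities:
        if not any(cap in a.capabilities for a in valid):
            return "ErrMissingCapability"
    # d. At least one valid attestation must come from a listed auditor.
    if req.required_auditors and not any(a.auditor in req.required_auditors for a in valid):
        return "ErrRequiredAuditorNotFound"
    return None


# Usage with a dummy auditor address.
att = [Attestation("akash1auditorA", 2, frozenset({"tee_hardware_attestation"}))]
req = VerificationRequirement(min_tier=2, required_capabilities=["tee_hardware_attestation"])
print(check_bid(req, att, snapshot_suspended=False))  # None
print(check_bid(req, att, snapshot_suspended=True))   # ErrProviderSnapshotSuspended
```

Checking in the order a, b, c, d means a provider failing several checks is rejected with the first applicable error; whether the real handler reports errors in this order is an assumption.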
+ +### Migration from AEP-9 + +The existing `x/audit` module (AEP-9) is deprecated and replaced by the new `x/verification` module through a +phased migration. + +#### Phase 1: Parallel Introduction (chain upgrade N) + +- Deploy the `x/verification` module alongside `x/audit` +- `x/audit` continues to operate normally for existing deployments +- New deployments can use `verification` requirements in SDL (opt-in via the new `VerificationRequirement` field) +- Market module accepts both `AttributesFilters` (legacy) and `VerificationRequirement` (new) +- A governance parameter `verification_module_active` (default: `false`) gates whether `x/verification` is + checked during bid creation. When `false`, verification requirements in SDL are accepted but not enforced, + allowing tenants and providers to test the new flow before it is enforced. +- No existing data is migrated -- providers must undergo the new verification process from scratch + +#### Phase 2: Verification Primary (chain upgrade N+1, approximately 3-6 months after Phase 1) + +- `verification_module_active` set to `true` via governance +- Console defaults to showing verification tiers +- New deployments default to verification requirements +- `x/audit` is in maintenance mode (no new features) +- Existing deployments and open orders created with `x/audit` attribute requirements continue to use `x/audit` + matching for bid filtering. The market module checks both `AttributesFilters` (legacy, for orders created before + activation) and `VerificationRequirement` (new). An order may contain either or both.
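During Phases 1 and 2 a single order may carry legacy `AttributesFilters`, a new `VerificationRequirement`, or both, and only the latter is gated by `verification_module_active`. A small hypothetical Python sketch of which checks the market module would apply, assuming an order can be reduced to two booleans (the `Order` type and `checks_for_bid` function are illustrative names, not the module's API):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Order:
    has_attributes_filters: bool        # legacy x/audit AttributesFilters
    has_verification_requirement: bool  # new VerificationRequirement with min_tier set


def checks_for_bid(order: Order, verification_module_active: bool) -> List[str]:
    """Which modules' requirements the market module enforces for this order."""
    checks = []
    if order.has_attributes_filters:
        # Legacy orders keep using x/audit matching through Phases 1 and 2.
        checks.append("x/audit")
    if order.has_verification_requirement and verification_module_active:
        # Before activation the requirement is accepted but not enforced.
        checks.append("x/verification")
    return checks


print(checks_for_bid(Order(True, True), verification_module_active=False))  # ['x/audit']
print(checks_for_bid(Order(True, True), verification_module_active=True))   # ['x/audit', 'x/verification']
```

Applying both checks when both are present reflects the statement that the market module checks both fields; in Phase 3, audit-only requirements would additionally be refused for new deployments, which this sketch does not model.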
+ +#### Phase 3: Audit Deprecation (chain upgrade N+2, approximately 6-12 months after Phase 1) + +- `x/audit` service endpoints return deprecation warnings +- `MsgSignProviderAttributes` and `MsgDeleteProviderAttributes` are disabled (they return an error) +- Existing audit-based deployments continue to match, but new deployments cannot be created with audit-only requirements +- All Console and SDK flows use `x/verification` exclusively + +#### Phase 4: Audit Removal (chain upgrade N+3) + +- `x/audit` module is removed from the app +- Audit store is pruned +- Market module removes legacy `AuditKeeper` interface +- `AttributesFilters` field in `PlacementRequirements` is deprecated + +#### State Migration + +There is no automatic state migration from `x/audit` to `x/verification`. The systems are fundamentally different (key-value +attributes vs. tiered attestations). Providers must go through the new verification process. diff --git a/src/content/aeps/aep-9/README.md b/src/content/aeps/aep-9/README.md index e1a1c5313..0d0783af6 100644 --- a/src/content/aeps/aep-9/README.md +++ b/src/content/aeps/aep-9/README.md @@ -1,7 +1,7 @@ --- aep: 9 title: Trusted Providers -author: Greg Osuri (@gosuri), Adam Bozanich (@boz) +author: Greg Osuri (@gosuri) Adam Bozanich (@boz) status: Final type: Standard category: Core