Add synchronous shell-exec endpoint for API-token access #3079

@jobordu

Why

API consumers (CI/CD pipelines, scripts, SDK users) cannot execute commands inside deployment containers via the Console API. Shell access currently requires a direct WebSocket connection through provider-proxy with provider-level credentials (mTLS/JWT). This forces automation workflows to manage WebSocket sessions and provider certificates manually.

Adding an HTTP exec endpoint to apps/api would let API-token holders run one-shot commands and receive structured output.

User story: As a DevOps engineer with a Console API key, I want to execute a command in my deployment container via a REST call, so that I can automate health checks and maintenance without managing WebSocket connections or provider credentials.

Key use case: post-deploy secret injection

A primary motivation is enabling secrets to be injected after deployment rather than baked into the SDL. Today, secrets (DB passwords, API keys, TLS certificates, service mesh tokens) must be embedded in the SDL at deploy time, which means they are written to the Akash on-chain record.

With shell-exec, containers can start in a "waiting for init" state and receive secrets only after the deployment is live:

Deploy container (no secrets in SDL)
    ↓
Container starts → waits for secrets (polling /run/secrets/ or similar)
    ↓
shell-exec pushes secrets into the container
    ↓
Container detects secrets → proceeds with full initialization

Example container entrypoint pattern:

#!/bin/sh
until [ -f /run/secrets/db_password ] && [ -f /run/secrets/api_key ]; do
  echo "Waiting for secrets..."
  sleep 2
done
exec myapp

This pattern also applies to infrastructure-level secrets (e.g., service mesh gossip keys, ACL bootstrap tokens, TLS certificates) that would otherwise require a full close + redeploy cycle to rotate. The wait loop should include a hard timeout (e.g., 120s) to surface failures cleanly.
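The hard timeout mentioned above could look roughly like the sketch below. The function name `wait_for_secrets` and its argument order are illustrative only, and the elapsed time is approximated by counting sleep intervals rather than read from a clock.

```shell
#!/bin/sh
# Sketch: wait for secret files with a hard deadline.
# wait_for_secrets is a hypothetical helper, not part of any existing image.

wait_for_secrets() {
  # $1 = secrets directory, $2 = timeout in seconds, remaining args = file names
  dir=$1
  timeout=$2
  shift 2
  waited=0
  while :; do
    # Collect the names of files that have not appeared yet.
    missing=""
    for name in "$@"; do
      [ -f "$dir/$name" ] || missing="$missing $name"
    done
    [ -z "$missing" ] && return 0
    # Give up once the (approximate) deadline has passed.
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s waiting for:$missing" >&2
      return 1
    fi
    echo "Waiting for secrets..."
    sleep 2
    waited=$((waited + 2))
  done
}

# In the entrypoint:
#   wait_for_secrets /run/secrets 120 db_password api_key || exit 1
#   exec myapp
```

Exiting non-zero on timeout lets the container fail visibly instead of hanging in the wait state indefinitely.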

What

Proposed endpoint

POST /v1/deployments/{dseq}/leases/{gseq}/{oseq}/shell-exec

Request body:

{
  "command": "string",
  "service": "string",
  "timeout": 60
}

Response body:

{
  "stdout": "string",
  "stderr": "string",
  "exitCode": 0,
  "truncated": false
}

Auth: SECURITY_BEARER_OR_API_KEY (existing pattern).
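For illustration, a client call against the proposed route might look like the sketch below. The `API_BASE_URL` and `API_KEY` variables, the `x-api-key` header name, and the example service name are assumptions, not something this issue specifies; the path and body fields follow the proposal above.

```shell
#!/bin/sh
# Sketch of a client call to the proposed endpoint.
# Assumptions (not from the issue): the API_BASE_URL and API_KEY variables
# and the x-api-key header name.

# Build the endpoint URL from the lease identifiers.
shell_exec_url() {
  # $1 = base URL, $2 = dseq, $3 = gseq, $4 = oseq
  printf '%s/v1/deployments/%s/leases/%s/%s/shell-exec' "$1" "$2" "$3" "$4"
}

# POST a JSON request body and print the JSON response on stdout.
shell_exec() {
  # $1 = dseq, $2 = gseq, $3 = oseq, $4 = JSON request body
  # --max-time is a client-side cap set slightly above the 60s default timeout.
  curl --silent --show-error \
    --max-time 70 \
    --request POST \
    --header "x-api-key: ${API_KEY}" \
    --header "Content-Type: application/json" \
    --data "$4" \
    "$(shell_exec_url "${API_BASE_URL}" "$1" "$2" "$3")"
}

# Example (not executed here):
#   shell_exec 1234567 1 1 '{"command":"cat /etc/hostname","service":"web","timeout":60}'
```

The caller would then inspect `exitCode` and `truncated` in the returned JSON before trusting `stdout`.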

Secret injection coverage

| Scenario | Without shell-exec | With shell-exec |
| --- | --- | --- |
| DB credentials / API keys | ❌ Baked into SDL (on-chain) | ✅ Injected post-deploy via file write |
| TLS certificate rotation | ❌ Requires redeploy | ✅ Write cert + trigger reload signal |
| Service mesh tokens (e.g., Consul ACL) | ❌ Requires redeploy | ✅ Init pattern: container waits, shell-exec pushes |
| Image pull credentials | ❌ Requires redeploy | ❌ Still requires redeploy |
| Initial env vars (process-level) | ❌ Deploy-time only | ❌ Cannot update live env vars |
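As a rough sketch of the "file write" rows, the helper below builds a command string that could be sent in the `command` field of the request body. The helper name `build_secret_write_cmd` is hypothetical, the directory and file name are assumed to contain no single quotes, and it assumes `base64` is available in the container. The value is base64-encoded so that arbitrary bytes survive shell quoting.

```shell
#!/bin/sh
# Sketch: build a command string that writes one secret file.
# build_secret_write_cmd is a hypothetical helper; the directory and file
# name are assumed to be shell-safe (no single quotes).

build_secret_write_cmd() {
  # $1 = target directory, $2 = file name, $3 = secret value
  dir=$1
  name=$2
  value=$3
  # Base64 output only contains [A-Za-z0-9+/=], so it is safe in single quotes.
  encoded=$(printf '%s' "$value" | base64 | tr -d '\n')
  # %%s emits a literal %s for the inner printf that runs in the container.
  printf "mkdir -p '%s' && printf '%%s' '%s' | base64 -d > '%s/%s' && chmod 600 '%s/%s'" \
    "$dir" "$encoded" "$dir" "$name" "$dir" "$name"
}

# Example (prints the command, does not run it):
#   build_secret_write_cmd /run/secrets db_password "$DB_PASSWORD"
```

Note that the secret still travels inside the request body, so any request logging between the client and the container would need to be accounted for.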

Acceptance criteria

  • Given a valid API key and owned deployment, when the endpoint is called with a command and service name, then stdout, stderr, and exitCode are returned in the response body
  • Given an unauthenticated request, the endpoint returns 401
  • Given a deployment the user does not own, the endpoint returns 403
  • Given a command that exceeds the timeout (default 60s), the connection is closed and a timeout error is returned
  • Given output exceeding 1 MB, the output is truncated and the response includes truncated: true
  • The endpoint is documented in the OpenAPI spec with request/response schemas

Implementation approach

  1. Blocking open question: Verify provider shell endpoint works with non-interactive mode (stdin=0, tty=0) and determine required JWT scope
  2. Add ShellExecService that opens a WebSocket to provider-proxy, sends the command, collects output frames, and returns structured result
  3. Extend ProviderJwtTokenService to generate tokens with shell-exec scope (if providers enforce scopes)
  4. Create POST route with Zod request/response schemas and OpenAPI documentation
  5. Add ownership validation via existing CASL ability checks on the deployment
  6. Write unit tests for ShellExecService and functional tests for the route

Size estimate

Medium: API auth, provider-proxy communication, and route patterns all exist. The novel part is the WebSocket-to-HTTP bridge in ShellExecService, but downloadFileFromShell in the frontend already demonstrates the same pattern.

Not doing (in this issue)

  • Interactive shell (WebSocket upgrade on API)
  • Streaming output (SSE)
  • File upload/download through this endpoint
  • Concurrent command execution
  • Shell session persistence
  • Provider-side changes
  • Updating live process environment variables (requires redeploy)
