Podman Service & Scheduler (PSS)

Lightweight container orchestration built on Podman. Provides Kubernetes-like abstractions — Workloads (ReplicaSets), Services, and Autoscaling — for small multi-node environments without the complexity of a full Kubernetes deployment.

Architecture

┌──────────┐       gRPC        ┌─────────────────┐       gRPC        ┌───────────┐
│  pssctl   │ ───────────────▶ │  pss-controller  │ ◀──────────────── │ pss-agent │
│  (CLI)    │                  │  (control plane)  │ ──────────────▶  │ (per node) │
└──────────┘                  └─────────────────┘                  └───────────┘
                                       │                                  │
                                       │ gRPC stream                      │ Podman socket
                                       ▼                                  ▼
                                ┌────────────┐                     ┌──────────┐
                                │  pss-proxy  │ ──── TCP forward ──▶│  Pods    │
                                │  (L4 LB)   │                     └──────────┘
                                └────────────┘

Component       Description
pss-controller  Central control plane: stores desired state in SQLite; runs the scheduler, reconciler, autoscaler, and service manager
pss-agent       Per-node daemon: manages pod lifecycle via the Podman socket; collects and reports metrics
pss-proxy       Per-service L4 load balancer: round-robin TCP forwarding to healthy backend pods
pssctl          Operator CLI: apply YAML configs, query state, scale manually

Prerequisites

  • Go 1.22+
  • Podman (rootless mode by default) installed on each agent node
  • protoc + protoc-gen-go + protoc-gen-go-grpc (only if regenerating protobuf code)
  • GCC / C compiler (required by go-sqlite3 CGo dependency)

Quick Start

1. Clone and build

git clone https://github.com/FeroVolar/pss.git
cd pss

# Enable CGo for SQLite
export CGO_ENABLED=1

# Build all binaries
go build -o pss-controller ./cmd/pss-controller
go build -o pss-agent      ./cmd/pss-agent
go build -o pss-proxy      ./cmd/pss-proxy
go build -o pssctl          ./cmd/pssctl

2. Run tests

go test ./...

All 67 tests should pass, including 27 property-based tests (100 iterations each via rapid).

3. Start the controller

export PSS_TOKEN="my-secret-token"

./pss-controller \
  --listen :50051 \
  --db pss.db \
  --token "$PSS_TOKEN"

Flag                  Env Var     Default     Description
--listen              PSS_LISTEN  :50051      gRPC listen address
--db                  PSS_DB      pss.db      SQLite database path
--token               PSS_TOKEN   (required)  Shared auth token
--reconcile-interval  -           10s         Reconciler loop interval
--metrics-interval    -           15s         Health check / metrics interval

4. Start an agent (on each worker node)

./pss-agent \
  --controller localhost:50051 \
  --token "$PSS_TOKEN" \
  --listen :50052

Flag                Env Var           Default          Description
--controller        PSS_CONTROLLER    localhost:50051  Controller gRPC address
--token             PSS_TOKEN         (required)       Shared auth token
--listen            PSS_AGENT_LISTEN  :50052           Agent gRPC listen address (controller dials back here for RunPod/StopPod)
--socket            -                 (auto-detect)    Podman socket path
--cpu               -                 NumCPU * 1000    CPU capacity in millicores
--mem               -                 4096             Memory capacity in MiB
--metrics-interval  -                 15s              Metrics reporting interval

The agent auto-detects the Podman socket in this order:

  1. --socket flag (explicit path)
  2. DOCKER_HOST env var (e.g. unix:///var/folders/.../podman.sock)
  3. CONTAINER_HOST env var
  4. $XDG_RUNTIME_DIR/podman/podman.sock
  5. /run/user/{uid}/podman/podman.sock
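
The lookup order above can be sketched in Go as follows. This is a minimal illustration, not the agent's actual code: the function name detectSocket and its injected getenv and uid parameters are hypothetical, and the sketch returns each candidate exactly as documented (a URI for the env vars, a plain path for the fallbacks) without checking that the socket exists.

```go
package main

import (
	"fmt"
	"os"
)

// detectSocket returns the first Podman socket candidate, following the
// documented order. It is a hypothetical helper: flagValue stands in for
// the --socket flag, and getenv and uid are injected so the function can
// be exercised without touching the real environment.
func detectSocket(flagValue string, getenv func(string) string, uid int) string {
	// 1. An explicit --socket flag always wins.
	if flagValue != "" {
		return flagValue
	}
	// 2. and 3. DOCKER_HOST, then CONTAINER_HOST.
	for _, key := range []string{"DOCKER_HOST", "CONTAINER_HOST"} {
		if v := getenv(key); v != "" {
			return v
		}
	}
	// 4. The rootless socket under XDG_RUNTIME_DIR.
	if dir := getenv("XDG_RUNTIME_DIR"); dir != "" {
		return dir + "/podman/podman.sock"
	}
	// 5. The per-user runtime directory as a last resort.
	return fmt.Sprintf("/run/user/%d/podman/podman.sock", uid)
}

func main() {
	fmt.Println(detectSocket("", os.Getenv, os.Getuid()))
}
```

Passing the environment lookup as a function keeps the ordering logic easy to test in isolation; normalizing the result to a single form (URI or path) is left out here.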

On macOS with Podman Machine, set DOCKER_HOST before starting the agent:

export DOCKER_HOST='unix:///var/folders/8_/.../podman-machine-default-api.sock'
./pss-agent --controller localhost:50051 --token "$PSS_TOKEN"

The agent will:

  1. Register with the controller (hostname, CPU, memory, and its own gRPC address so the controller can dial back)
  2. Start reporting metrics every 15s (the --metrics-interval default)
  3. Listen for RunPod/StopPod commands from the controller

Note: Only the agent needs access to the Podman socket. The controller and proxy do not interact with Podman directly.

5. Deploy a workload

./pssctl --token "$PSS_TOKEN" apply -f examples/my-app.yaml

See examples/my-app.yaml for the workload definition.

Workloads support port mappings via podSpec.ports. Each replica gets a dynamically allocated host port (Podman picks a free port), so multiple replicas can run on the same node without conflicts:

podSpec:
  containers:
    - name: web
      image: nginx:latest
  ports:
    - containerPort: 80   # Podman auto-assigns a random host port per replica

6. Create a service

./pssctl --token "$PSS_TOKEN" apply -f examples/my-service.yaml

See examples/my-service.yaml for the service definition.

The service manager automatically discovers each replica's dynamically mapped host port and creates endpoints accordingly. The proxy then load-balances across all healthy endpoints.

7. Start the proxy

./pss-proxy \
  --controller localhost:50051 \
  --token "$PSS_TOKEN" \
  --service my-service \
  --listen :8080

Flag          Env Var         Default          Description
--controller  PSS_CONTROLLER  localhost:50051  Controller gRPC address
--token       PSS_TOKEN       (required)       Shared auth token
--service     -               (required)       Service name to proxy
--listen      PSS_LISTEN      :8080            TCP listen address

Traffic to :8080 is now load-balanced across healthy pods of my-app.
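
To illustrate what round-robin selection over healthy backends looks like, here is a small Go sketch. The type and function names (Backend, RoundRobin, Pick) and the addresses are assumptions made for this example; the real pkg/proxy code may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Backend is a hypothetical endpoint record: an address plus the result
// of the most recent health check.
type Backend struct {
	Addr    string
	Healthy bool
}

// ErrNoHealthyBackend is returned when every backend is marked unhealthy.
var ErrNoHealthyBackend = errors.New("no healthy backend")

// RoundRobin hands out healthy backends in rotation.
type RoundRobin struct {
	mu       sync.Mutex
	backends []Backend
	next     int // index at which the next search starts
}

// Pick returns the next healthy backend address, skipping unhealthy ones.
func (r *RoundRobin) Pick() (string, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for i := 0; i < len(r.backends); i++ {
		idx := (r.next + i) % len(r.backends)
		if r.backends[idx].Healthy {
			r.next = (idx + 1) % len(r.backends)
			return r.backends[idx].Addr, nil
		}
	}
	return "", ErrNoHealthyBackend
}

func main() {
	lb := &RoundRobin{backends: []Backend{
		{"127.0.0.1:40001", true},
		{"127.0.0.1:40002", false}, // skipped while unhealthy
		{"127.0.0.1:40003", true},
	}}
	for i := 0; i < 4; i++ {
		addr, _ := lb.Pick()
		fmt.Println(addr)
	}
}
```

The mutex matters because a TCP proxy typically picks a backend from one goroutine per accepted connection.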

8. Configure autoscaling

./pssctl --token "$PSS_TOKEN" \
  autoscale workload/my-app \
  --cpu 70 --mem 80 \
  --min 2 --max 10

Or via YAML:

./pssctl --token "$PSS_TOKEN" apply -f examples/my-autoscaler.yaml

See examples/my-autoscaler.yaml for the autoscaler definition.
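
As a rough mental model of how such a policy could be evaluated, consider the Go sketch below. The min/max bounds, the CPU and memory thresholds, and the 60s cooldown come from this README; the step rule itself (add one replica when either threshold is exceeded, remove one when both metrics fall below half their thresholds) is purely an assumption for illustration, and Policy and desiredReplicas are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// cooldown is the minimum time between autoscale actions stated in this README.
const cooldown = 60 * time.Second

// Policy mirrors the pssctl autoscale flags (hypothetical type).
type Policy struct {
	CPUThreshold float64 // percent, as in --cpu
	MemThreshold float64 // percent, as in --mem
	Min, Max     int     // as in --min and --max
	Cooldown     time.Duration
}

// desiredReplicas returns the replica count the autoscaler would aim for.
// The scale-up and scale-down rules are assumptions, not the project's
// documented algorithm.
func desiredReplicas(p Policy, current int, cpu, mem float64, sinceLast time.Duration) int {
	target := current
	switch {
	case sinceLast < p.Cooldown:
		// Still cooling down: take no scaling action.
	case cpu > p.CPUThreshold || mem > p.MemThreshold:
		target = current + 1
	case cpu < p.CPUThreshold/2 && mem < p.MemThreshold/2:
		target = current - 1
	}
	// The min/max bounds are always enforced.
	if target < p.Min {
		target = p.Min
	}
	if target > p.Max {
		target = p.Max
	}
	return target
}

func main() {
	p := Policy{CPUThreshold: 70, MemThreshold: 80, Min: 2, Max: 10, Cooldown: cooldown}
	// 90% CPU exceeds the 70% threshold and the cooldown has elapsed.
	fmt.Println(desiredReplicas(p, 3, 90, 40, 2*cooldown)) // prints 4
}
```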

CLI Reference

pssctl [global flags] <command> [args...]

Global Flags:
  --controller <addr>   Controller gRPC address (default: localhost:50051, env: PSS_CONTROLLER)
  --token <token>       Shared auth token (env: PSS_TOKEN)

Commands:
  apply -f <file>                                     Apply a YAML configuration
  get <pods|services|nodes|events>                    Query cluster state
  scale workload/<name> --replicas <N>                Set desired replica count
  autoscale workload/<name> --cpu --mem --min --max   Configure autoscaling
  logs pod/<id>                                       Stream pod logs

Examples

# List all pods
./pssctl --token "$PSS_TOKEN" get pods

# List nodes and their resource usage
./pssctl --token "$PSS_TOKEN" get nodes

# Scale a workload manually
./pssctl --token "$PSS_TOKEN" scale workload/my-app --replicas 5

# View events (scheduling, scaling, errors)
./pssctl --token "$PSS_TOKEN" get events

# Stream logs from a specific pod
./pssctl --token "$PSS_TOKEN" logs pod/abc123

Multi-Node Deployment

A typical production-like setup:

┌─────────────────────────────────────────────────────┐
│  Control Node                                        │
│  ├── pss-controller --listen :50051 --db /data/pss.db│
│  └── pssctl (operator access)                        │
├─────────────────────────────────────────────────────┤
│  Worker Node 1                                       │
│  └── pss-agent --controller control-node:50051       │
├─────────────────────────────────────────────────────┤
│  Worker Node 2                                       │
│  └── pss-agent --controller control-node:50051       │
├─────────────────────────────────────────────────────┤
│  Proxy Node (or co-located)                          │
│  └── pss-proxy --controller control-node:50051       │
│       --service my-service --listen :80              │
└─────────────────────────────────────────────────────┘

Bring the cluster up in this order:

  1. Start the controller on the control node
  2. Start an agent on each worker node (pointing to the controller)
  3. Apply workloads and services via pssctl
  4. Start a proxy for each service that needs external traffic

Key Concepts

Concept               Description
Workload              Desired-state object with a PodSpec, replica count, and optional port mappings (like a Kubernetes ReplicaSet)
ServiceSpec           Maps a service port to backend pods; selects backends by workload name
AutoScalePolicy       Defines min/max replicas and CPU/memory thresholds for automatic scaling
Placement Policy      spread (default) distributes pods evenly; binpack consolidates onto fewer nodes
Reconciler            Runs every 10s by default (--reconcile-interval); compares desired vs actual pod count and creates/deletes pods to match
Cooldown              Minimum 60s between autoscale actions to prevent thrashing
Dynamic Port Mapping  Each replica gets a random host port from Podman, allowing multiple replicas per node
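
Two of these concepts, the reconciler's delta and the placement policy, can be illustrated with a short Go sketch. It is deliberately simplified and rests on assumptions: nodes are scored only by their current pod count, ties go to the first node, and the names Node, delta, and pickNode are hypothetical. Resource filtering (CPU and memory capacity) is left out.

```go
package main

import "fmt"

// Node is a hypothetical view of a worker node: its name and how many
// pods it currently runs.
type Node struct {
	Name string
	Pods int
}

// delta is the reconciler's basic computation: a positive result means
// that many pods must be created, a negative one that many deleted.
func delta(desired, actual int) int {
	return desired - actual
}

// pickNode chooses a node for one new pod. With "binpack" it prefers the
// node that already runs the most pods; with any other value it behaves
// like "spread" and prefers the node with the fewest.
func pickNode(nodes []Node, policy string) (string, bool) {
	if len(nodes) == 0 {
		return "", false
	}
	best := nodes[0]
	for _, n := range nodes[1:] {
		if policy == "binpack" {
			if n.Pods > best.Pods {
				best = n
			}
		} else if n.Pods < best.Pods {
			best = n
		}
	}
	return best.Name, true
}

func main() {
	nodes := []Node{{"node-1", 3}, {"node-2", 1}, {"node-3", 2}}
	fmt.Println("pods to create:", delta(5, 3))
	name, _ := pickNode(nodes, "spread")
	fmt.Println("spread picks:", name)
	name, _ = pickNode(nodes, "binpack")
	fmt.Println("binpack picks:", name)
}
```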

Networking

On Linux, containers get routable IPs within the Podman network and the proxy connects directly to container IPs.

On macOS with Podman Machine, container IPs are not routable from the host (they live inside a VM). PSS handles this transparently:

  1. Workloads define ports with containerPort — Podman allocates a unique random host port per replica
  2. The agent returns 127.0.0.1:<dynamicPort> as the pod address
  3. The service manager parses the host:port and creates endpoints with the actual mapped port
  4. The proxy connects to 127.0.0.1:<dynamicPort> — works on both Linux and macOS

This means multiple replicas of the same workload can run on a single node without port conflicts.
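
Step 3 above, turning the agent's host:port string into an endpoint, might look like the following Go sketch. The Endpoint type, the parseEndpoint function, and the port number in the example are hypothetical; only the standard library calls net.SplitHostPort and strconv.Atoi are real APIs.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Endpoint is a hypothetical record for one backend of a service.
type Endpoint struct {
	Host string
	Port int
}

// parseEndpoint splits a pod address such as "127.0.0.1:43017" into host
// and port, rejecting addresses without a usable numeric port.
func parseEndpoint(addr string) (Endpoint, error) {
	host, portStr, err := net.SplitHostPort(addr)
	if err != nil {
		return Endpoint{}, fmt.Errorf("parse pod address %q: %w", addr, err)
	}
	port, err := strconv.Atoi(portStr)
	if err != nil || port < 1 || port > 65535 {
		return Endpoint{}, fmt.Errorf("invalid port %q in pod address %q", portStr, addr)
	}
	return Endpoint{Host: host, Port: port}, nil
}

func main() {
	// The second address has no port, so it is reported and skipped.
	for _, addr := range []string{"127.0.0.1:43017", "10.88.0.5"} {
		ep, err := parseEndpoint(addr)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("endpoint host=%s port=%d\n", ep.Host, ep.Port)
	}
}
```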

Regenerating Protobuf Code

Only needed if you modify proto/pss.proto:

protoc \
  --go_out=. --go_opt=paths=source_relative \
  --go-grpc_out=. --go-grpc_opt=paths=source_relative \
  proto/pss.proto

Project Structure

├── cmd/
│   ├── pss-controller/    # Controller entrypoint
│   ├── pss-agent/         # Agent entrypoint
│   ├── pss-proxy/         # Proxy entrypoint
│   └── pssctl/            # CLI entrypoint
├── examples/
│   ├── my-app.yaml        # Sample Workload
│   ├── my-service.yaml    # Sample Service
│   └── my-autoscaler.yaml # Sample AutoScalePolicy
├── pkg/
│   ├── agent/             # Podman client, agent gRPC server/client
│   ├── auth/              # Shared-token gRPC interceptors
│   ├── autoscaler/        # Autoscaler loop and evaluation logic
│   ├── cli/               # CLI commands, table renderer, gRPC client
│   ├── controller/        # Controller gRPC server, node registry
│   ├── events/            # Event construction helpers
│   ├── models/            # Domain types, YAML parsing, validation
│   ├── proxy/             # Backend watcher, round-robin LB, health checker, TCP listener
│   ├── reconciler/        # Reconciler loop and delta computation
│   ├── retry/             # Exponential backoff utility
│   ├── scheduler/         # Node filtering and scoring (spread/binpack)
│   ├── service/           # Service manager and endpoint tracking
│   └── store/             # StateStore interface and SQLite implementation
├── proto/
│   ├── pss.proto          # Protobuf definitions
│   └── psspb/             # Generated Go code
├── go.mod
└── go.sum

License

TBD
