docs: refresh infra docs for post-Hetzner architecture #57
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,17 +1,54 @@ | ||
| # API server | ||
|
|
||
| The API server is a service that allows performing limited management operations on the infrastructure. It mainly exists to securely allow the Server Edition's CI to tell Caddy about the fact that new packages have been deployed. | ||
| The API server is a small service that exposes a couple of privileged management | ||
| operations to the project's GitHub Actions workflows — primarily so that CI can | ||
| trigger restarts of Caddy and self-upgrades of the API server itself, without | ||
| needing SSH access to the backend host. | ||
|
|
||
| Presently, there is only one endpoint: `POST /admin/restart_web_server`. This endpoint initiates restarting of Caddy, but does not wait for it to finish. | ||
| It runs as a systemd-managed Puma process (`apiserver.service`) on the Hetzner | ||
| backend host, listening on a Unix socket at `/run/apiserver/server.sock`. Caddy | ||
| reverse-proxies the `/admin/*` paths from `apt.fullstaqruby.org` and | ||
| `yum.fullstaqruby.org` to that socket — there is no dedicated `apiserver.*` | ||
| hostname. | ||
|
|
||
| ## Calling the production instance | ||
| ## Endpoints | ||
|
|
||
| The production instance is deployed at https://apiserver-f7awo4fcoa-uk.a.run.app/. To call it, you need to include your Google Cloud identity token in the Authorization header. | ||
| - `GET /` — health check, returns `ok`. | ||
| - `POST /admin/upgrade_apiserver` — kicks off `apiserver-deployer` (which fetches | ||
| the latest API server release from GitHub and activates it), then restarts | ||
| `apiserver` itself. Callable only from `fullstaq-ruby/infra`'s `deploy` | ||
| GitHub Actions environment. | ||
| - `POST /admin/restart_web_server` — restarts Caddy. Callable only from | ||
| `fullstaq-ruby/server-edition`'s `deploy` GitHub Actions environment. | ||
|
|
||
| ```bash | ||
| curl -v -H "Authorization: Bearer $(gcloud auth print-identity-token)" https://apiserver-f7awo4fcoa-uk.a.run.app/ | ||
| ``` | ||
| ## Authentication | ||
|
|
||
| The API server authenticates callers using **GitHub Actions OIDC**. Every | ||
| request must carry an `Authorization: Bearer <token>` header where the token | ||
| is an ID token minted by GitHub's OIDC provider with audience claim | ||
| `backend.fullstaqruby.org`. The server verifies the JWT signature against | ||
| GitHub's JWKS and rejects the request unless the token's `repository`, `sub`, | ||
| `runner_environment`, and `environment` claims match the calling repo's | ||
| expected `deploy` environment for that endpoint. | ||
|
|
||
| Because the audience and claim shape are tied to GitHub-hosted runners, the | ||
| endpoints are not callable directly by a human or from a local machine. | ||
|
|
||
| ## Continuous deployment | ||
|
|
||
| New API server code changes, when pushed to master, are automatically deployed by the Infrastructure project's CI. | ||
| `.github/workflows/apiserver.yml` builds and deploys the API server. Pushes | ||
| that touch `apiserver/**` (or the workflow file itself) trigger a build, and | ||
| pushes to `main` additionally trigger the `deploy` job — which tags the commit, | ||
| publishes a GitHub release with the build artifact, and calls | ||
| `POST https://apt.fullstaqruby.org/admin/upgrade_apiserver` with a freshly | ||
| minted OIDC token. | ||
|
|
||
| On the host, `apiserver-deployer` (a oneshot systemd unit running | ||
| `/usr/local/bin/apiserver-deployer`) handles the actual install: it fetches | ||
| the latest release metadata from the GitHub API, downloads the asset matching | ||
| the host's distribution and architecture, extracts it into | ||
| `/opt/apiserver/versions/<tag>-<dist>-<version>-<arch>`, installs any runtime | ||
| dependencies declared in `dpkg-dependencies.txt`, prunes all but the last five | ||
| versions, and atomically swaps the `/opt/apiserver/versions/latest` symlink. | ||
| The API server's working directory points at that symlink, so the subsequent | ||
| `systemctl restart apiserver` brings the new version online. |
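The prune and symlink-swap steps described for `apiserver-deployer` can be sketched in shell as follows. This is a minimal illustration, not the actual deployer script: the function names `prune_old_versions` and `activate_version` are hypothetical, "last five" is assumed to mean the five most recently modified directories, and GNU coreutils/findutils (`mv -T`, `find -printf`) are assumed to be available on the host.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: keep the <keep> most recently modified version
# directories under <versions_dir> and delete the rest. Only real
# directories are listed, so the "latest" symlink is never removed.
prune_old_versions() {
  local versions_dir=$1 keep=$2
  find "$versions_dir" -mindepth 1 -maxdepth 1 -type d -printf '%T@ %f\n' \
    | sort -rn \
    | tail -n +"$((keep + 1))" \
    | cut -d' ' -f2- \
    | while IFS= read -r name; do
        rm -rf -- "${versions_dir:?}/$name"
      done
}

# Hypothetical helper: point <versions_dir>/latest at <version_name>.
# The new symlink is created under a temporary name and then renamed over
# the old one. The rename is atomic, so "latest" always resolves to either
# the old or the new version, never to nothing.
activate_version() {
  local versions_dir=$1 version_name=$2
  local tmp_link="$versions_dir/latest.tmp.$$"
  ln -sfn "$version_name" "$tmp_link"
  mv -T "$tmp_link" "$versions_dir/latest"
}

# Example usage (the version name is a placeholder):
#   prune_old_versions /opt/apiserver/versions 5
#   activate_version /opt/apiserver/versions "<tag>-<dist>-<version>-<arch>"
```

Creating the link under a temporary name and renaming it avoids the window in which `ln -sf` would first unlink the old `latest` before creating the new one.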
This file was deleted.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3,19 +3,19 @@ | |
| We define as much infrastructure as possible in the form of code, using: | ||
|
|
||
| * [Terraform](https://terraform.io) | ||
| * Kubernetes YAML, managed with [Kustomize](https://kustomize.io/) | ||
| * [Ansible](https://www.ansible.com/) | ||
| * GitHub Actions | ||
|
|
||
| The infrastructure-as-code is stored in the following directories: | ||
|
|
||
| * `terraform/` — Infrastructure administered by [Infra Maintainers](roles.md), except for resources inside Kubernetes. Most of the infrastructure is defined here. | ||
| * `terraform/` — Infrastructure administered by [Infra Maintainers](roles.md). Most of the cloud-side infrastructure is defined here. | ||
|
|
||
| * `terraform-hisec/` — Infrastructure administered by [Infra Owners](roles.md). This covers for example resources in the `fullstaq-ruby-hisec` Google Cloud project. | ||
| * `terraform-hisec/` — Infrastructure administered by [Infra Owners](roles.md). This covers sensitive resources such as the GPG signing key in Azure Key Vault and the high-security Terraform state backend. | ||
|
|
||
| Because we don't expect the infrastructure in this directory to change very often, we've chosen — for security reasons — not to run Terraform in a CI/CD pipeline. This way we don't have to worry about the security of the CI/CD pipeline's service account. Instead, an [Infra Owner](roles.md) runs Terraform manually, using that person's personal Google Cloud credentials. | ||
| Because we don't expect the infrastructure in this directory to change very often, we've chosen — for security reasons — not to run Terraform in a CI/CD pipeline. This way we don't have to worry about the security of any CI/CD pipeline credentials. Instead, an [Infra Owner](roles.md) runs Terraform manually, using their personal cloud credentials. | ||
|
|
||
| * `kubernetes/` — Kubernetes resources administered by [Infra Maintainers](roles.md). | ||
| * `ansible/` — Configuration of the backend VM (Caddy, the API server, Prometheus, and OS hardening). Administered by [Infra Maintainers](roles.md) and applied manually; see [Deployment guide](deploy.md). | ||
|
|
||
| * `.github/workflows/apiserver.yml` — Deploys the API server. | ||
| * `.github/workflows/apiserver.yml` — Builds and deploys the API server. | ||
|
Member
Nowadays it's `.github/workflows/` (multiple workflows that together do the build and deployment).
||
|
|
||
| Note that not all infrastructure can, or (for security reasons) should, be managed via code. Learn more at [Infrastructure bootstrapping](infrastructure-bootstrapping.md). | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,6 @@ | ||
| # Infrastructure bootstrapping | ||
|
|
||
| We try to codify infrastructure as much as possible using Terraform and Kubernetes YAML. However: | ||
| We try to codify infrastructure as much as possible using Terraform and Ansible. However: | ||
|
Member
There should be an instruction step in this document for deploying the API server.
||
|
|
||
| - Not everything _can_ be automated. For example, we need to set up Azure Blob Storage for storing Terraform state before we can use Terraform. | ||
| - Not everything _should_ be automated. For example, the `fullstaq-ruby-hisec` project contains data so sensitive that giving CI/CD systems access to it would pose a security risk. | ||
|
|
||
Nowadays it's the entire .github/workflows/ folder (multiple workflows that together do the build and deployment)