feat(ansible): add Molecule + Lima testing for the Hetzner playbook #60

Open
abtreece wants to merge 9 commits into fullstaq-ruby:main from abtreece:spike/ansible-molecule-mvp

@abtreece (Collaborator) commented May 15, 2026

Summary

Adds a Molecule scenario that converges ansible/main.yml against a local Lima
VM, with the live external service dependencies (Azure Key Vault, GCS, GitHub
Releases) replaced by stubs so the scenario runs without cloud credentials.
This gives us a fast local feedback loop for playbook changes without touching
backend.fullstaqruby.org.

  • New ansible/molecule/default/ scenario:
    • lima.yaml: Debian 12 cloud-image VM, instance name ansible-molecule
    • create.yml / destroy.yml: limactl-based driver (driver.name: default, managed: false). create branches on the Lima instance status, so a previously stopped instance is resumed instead of being silently skipped; destroy gates the delete on the instance existing, so real limactl delete failures surface instead of being swallowed.
    • prepare.yml: installs packages the prod playbook assumes are preinstalled (cron, ufw, acl, rsyslog, ruby-dev, build-essential). The Python/sudo bootstrap is handled by Lima's provision script (lima.yaml), not by an Ansible raw task.
    • converge.yml: runs the same task list as main.yml with molecule_test: true, and stages a minimal Sinatra/Puma fixture at /opt/apiserver/versions/latest so the apiserver systemd unit can start. Fixture staging is idempotent: copy tasks rely on Ansible's checksum comparison, and bundle install is gated on actual Gemfile/Gemfile.lock/config.ru changes (or a missing vendor/bundle).
    • verify.yml: asserts that caddy, prometheus, fail2ban, ssh, and apiserver are active (systemctl is-active), the apiserver Unix socket exists, caddy validate passes, and Caddy actually proxies /admin/* to the apiserver socket. That last check exercises the full reverse_proxy → Unix socket → apiserver path, so permission or proxy regressions fail verify. It also checks that the SSH, UFW, and unattended-upgrades configuration matches expectations.
    • files/ and templates/: symlinks back to ansible/files/ and ansible/templates/, so the scenario stays in sync with the real playbook with zero duplication.
  • New ansible/files/molecule-test/: the apiserver Sinatra fixture (Gemfile, a committed Gemfile.lock locked to linux platforms, and config.ru), stub query-latest-repo-versions and apiserver-deployer scripts, and a test Caddyfile with auto_https off.
  • ansible/requirements.txt pins the local toolchain (ansible-core, molecule, molecule-plugins); the ansible/README.md setup command pins the same versions inline so README users land on matching versions.
  • ansible/README.md: setup, daily loop, troubleshooting, and an explicit "what is not tested" section.
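For illustration, the service and config checks described for verify.yml could be written roughly as below. This is a minimal sketch, not the scenario's actual file: it assumes the systemd unit names match the service names listed above, and the Caddyfile path /etc/caddy/Caddyfile is a hypothetical placeholder.

```yaml
# Sketch of verify.yml-style checks; unit names and the Caddyfile path are assumptions.
- name: Verify core services
  hosts: all
  gather_facts: false
  become: true
  tasks:
    - name: Assert each unit is active
      # systemctl is-active exits non-zero for an inactive unit, which fails the task
      ansible.builtin.command: systemctl is-active {{ item }}
      loop:
        - caddy
        - prometheus
        - fail2ban
        - ssh
        - apiserver
      changed_when: false

    - name: Validate the Caddy configuration
      # Hypothetical path; point this at wherever the playbook installs the Caddyfile
      ansible.builtin.command: caddy validate --config /etc/caddy/Caddyfile
      changed_when: false
```

changed_when: false keeps these read-only checks from showing up as changes in the verify output.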

Prod playbook touch-ups

Kept deliberately minimal (gating, not refactoring):

  • tasks/caddy.yml, tasks/apiserver-deployer.yml, tasks/unattended-upgrades.yml: four apt/get_url/copy tasks that pull artifacts from the public internet or install production-only files are gated on when: not (molecule_test | default(false)). They are no-ops in prod, where molecule_test is unset.
  • templates/caddy-env.j2: when molecule_test is true, the two Azure Key Vault lookup('pipe', ...) calls are replaced by stub strings. No-op in prod.
  • tasks/apiserver-deployer.yml: dropped recurse: true from the /opt/apiserver/versions directory creation. The flag forced a recursive chown/chmod of the entire versions tree on every run, including release tarballs owned by apiserver-deployer. That was unnecessary and surprising in prod, and it broke idempotence in the scenario.
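The gate is a single when: clause per task. A minimal sketch, assuming a copy task that installs the production Caddyfile; the src and dest paths below are hypothetical placeholders, not taken from tasks/caddy.yml:

```yaml
# Hypothetical task illustrating the molecule_test gate; paths are placeholders.
- name: Install production Caddyfile
  ansible.builtin.copy:
    src: Caddyfile
    dest: /etc/caddy/Caddyfile
    mode: "0644"
  # default(false) keeps the condition valid in prod, where molecule_test is never defined
  when: not (molecule_test | default(false))
```

Because default(false) handles the undefined case, prod inventories do not need to define the new variable.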

Copilot review remediation

All seven inline comments from the automated Copilot review addressed:

| # | File | Fix |
|---|---|---|
| 1 | converge.yml | Fixture staging is idempotent on every converge; bundler is gated on real changes, not on symlink existence |
| 2 | apiserver-fixture/ | Committed a Gemfile.lock (linux platforms) and copy it onto the VM; version bumps require a deliberate bundle lock |
| 3 | ansible/README.md | uv tool install command pinned to match requirements.txt |
| 4 | destroy.yml | limactl delete failures now surface; only the absent-instance case is silently skipped |
| 5 | create.yml | Branches on Lima instance status; resumes stopped instances instead of skipping them |
| 6 | verify.yml | Added a uri task that asserts the full Caddy → Unix socket → apiserver path |
| 7 | prepare.yml | Removed the unreachable raw python3/sudo bootstrap (Lima's provision script already covers it, and gather_facts/become made the task useless anyway) |

Each thread has an inline reply linking the fix commit.

What is not covered

Documented in ansible/README.md, but worth flagging here:

  • Real ACME/TLS issuance (test Caddyfile uses auto_https off)
  • Real Azure Key Vault, GCS, or GitHub-Release integration
  • OIDC JWT verification (covered by apiserver/ Ruby tests)
  • Live fail2ban banning behavior (only verifies the unit starts cleanly)
  • AppArmor profile loading (the playbook only installs the package today)

Test plan

  • molecule test (full lifecycle: destroy → create → prepare → converge → idempotence → verify → destroy) passes on macOS arm64
  • Molecule's built-in idempotence phase reports changed=0 (stricter than running converge twice manually)
  • molecule verify passes including the new /admin/health HTTP assertion through Caddy
  • Spot-check on prod: next ansible-playbook run against backend.fullstaqruby.org shows no unexpected changes from the gated tasks or the dropped recurse: true

abtreece added 2 commits May 14, 2026 22:43
Adds a Molecule scenario that converges the playbook against a Lima VM
with all live external dependencies (Azure Key Vault, GCS, GitHub
releases) replaced by stubs, so the scenario runs without cloud
credentials. Full lifecycle (create + prepare + converge + idempotence
+ verify + destroy) is green end-to-end.

Test seams in prod files (~10 lines, all gated behind molecule_test;
prod behavior byte-identical when molecule_test is unset):
- templates/caddy-env.j2: skip az keyvault lookups when molecule_test
- tasks/caddy.yml: gate query-latest-repo-versions and Caddyfile install
- tasks/apiserver-deployer.yml: gate apiserver-deployer.sh install
- tasks/unattended-upgrades.yml: gate the GitHub-release deb install

Test artifacts live under ansible/files/molecule-test/ (stubs and
service-contract fixture) and ansible/molecule/default/ (scenario).
Pinned tooling versions in ansible/requirements.txt. Contributor
instructions in ansible/README.md.

The "Create apiserver deployment directory" task used recurse:true with
mode:0755. The recurse walked into vendor/bundle/ruby/*/bin/ (gem-bin
shims) and the latest symlink. Both report mode 0777 from lstat() because
they are symlinks. Linux cannot actually chmod a symlink, but Ansible
still reports changed:true because the requested mode != the observed
mode.

Effect: every ansible-playbook run since the first apiserver release
deploy has reported at least one changed task. This was masked by the
absence of an idempotence test in CI and surfaced by the new Molecule
scenario.

Release tarballs are extracted by the deployer running as itself, so
ownership inside the tree is already correct. The recurse-chown was
over-defensive.
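The corrected task can be sketched as follows. This is an illustration rather than the file's exact contents: the task name, path, and mode come from the notes above, while the owner and group values are assumptions.

```yaml
# Sketch only: owner/group are assumed, not copied from tasks/apiserver-deployer.yml.
- name: Create apiserver deployment directory
  ansible.builtin.file:
    path: /opt/apiserver/versions
    state: directory
    owner: apiserver-deployer
    group: apiserver-deployer
    mode: "0755"
    # No recurse: true. Applying mode recursively would also touch symlinks such as
    # latest, whose lstat() mode is 0777 on Linux, so the requested 0755 never
    # matches and the task reports changed on every run.
```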
@abtreece force-pushed the spike/ansible-molecule-mvp branch from 0a5d120 to 8acb27e on May 15, 2026 03:43

Copilot AI left a comment

Pull request overview

Adds a Lima-backed Molecule scenario for exercising the Ansible Hetzner playbook locally with stubs for selected external dependencies.

Changes:

  • Adds Molecule create/prepare/converge/verify/destroy playbooks for a Debian 12 Lima VM.
  • Adds apiserver/Caddy test fixtures and stub scripts for Molecule runs.
  • Gates selected production tasks for molecule_test and documents local setup.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| ansible/templates/caddy-env.j2 | Uses stub Azure DNS credentials during Molecule runs. |
| ansible/tasks/unattended-upgrades.yml | Skips remote collector package install in Molecule. |
| ansible/tasks/caddy.yml | Skips production Caddy helper/config copies in Molecule. |
| ansible/tasks/apiserver-deployer.yml | Skips deployer script install in Molecule and removes recursive chown. |
| ansible/requirements.txt | Adds pinned Python tooling requirements. |
| ansible/README.md | Documents Molecule/Lima setup, workflow, and coverage limits. |
| ansible/molecule/default/molecule.yml | Defines the Molecule scenario and inventory defaults. |
| ansible/molecule/default/lima.yaml | Defines the Debian Lima VM. |
| ansible/molecule/default/create.yml | Creates/configures the Lima instance for Molecule. |
| ansible/molecule/default/destroy.yml | Deletes the Lima instance. |
| ansible/molecule/default/prepare.yml | Installs packages assumed by the playbook/test fixture. |
| ansible/molecule/default/converge.yml | Runs the production task list with Molecule stubs and fixture staging. |
| ansible/molecule/default/verify.yml | Verifies service state and selected host configuration. |
| ansible/files/molecule-test/test-Caddyfile | Provides a Molecule-safe Caddy config. |
| ansible/files/molecule-test/stub-query-latest-repo-versions | Stubs repo-version metadata generation. |
| ansible/files/molecule-test/stub-apiserver-deployer | Stubs apiserver deployment. |
| ansible/files/molecule-test/apiserver-fixture/Gemfile | Defines Ruby fixture dependencies. |
| ansible/files/molecule-test/apiserver-fixture/config.ru | Adds the minimal Sinatra fixture app. |

Comment threads:
  • ansible/molecule/default/converge.yml (outdated)
  • ansible/files/molecule-test/apiserver-fixture/Gemfile
  • ansible/README.md
  • ansible/molecule/default/destroy.yml (outdated)
  • ansible/molecule/default/create.yml (outdated)
  • ansible/files/molecule-test/test-Caddyfile
  • ansible/molecule/default/prepare.yml (outdated)
abtreece added 7 commits May 14, 2026 23:04
A previous create-only check on instance presence skipped the start step
when 'ansible-molecule' was in any state (including Stopped), causing
converge to SSH to a powered-off VM. Gate on status instead: create when
absent, resume when present and not Running.
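A status-gated create.yml could look roughly like the sketch below. It is not the committed file; it assumes that limactl list --json prints one JSON object per instance with name and status fields and exits 0 when no instances exist. The instance name matches lima.yaml's ansible-molecule; lima_instance and lima_status are illustrative variable names.

```yaml
# Sketch of a Lima-backed create.yml; the limactl JSON output shape is an assumption.
- name: Create or resume the Lima instance
  hosts: localhost
  gather_facts: false
  vars:
    lima_instance: ansible-molecule
  tasks:
    - name: List Lima instances as JSON
      ansible.builtin.command: limactl list --json
      register: lima_list
      changed_when: false

    - name: Derive the instance status (Absent when not listed)
      ansible.builtin.set_fact:
        lima_status: >-
          {{ lima_list.stdout_lines
             | map('from_json')
             | selectattr('name', 'equalto', lima_instance)
             | map(attribute='status')
             | first
             | default('Absent') }}

    - name: Create and start the instance when it does not exist
      ansible.builtin.command: >-
        limactl start --tty=false --name={{ lima_instance }}
        {{ playbook_dir }}/lima.yaml
      when: lima_status == 'Absent'

    - name: Resume an existing instance that is not running
      ansible.builtin.command: limactl start --tty=false {{ lima_instance }}
      when: lima_status not in ['Absent', 'Running']
```

Branching on status rather than presence is what lets a Stopped instance be resumed instead of skipped.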
A blanket failed_when:false suppressed every limactl delete failure, not
just the expected absent-instance case, so Molecule could report success
while leaving a wedged VM behind. Gate the delete on existence so real
delete failures (permissions, broken state) fail loudly.
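A sketch of an existence-gated destroy.yml, assuming limactl list --quiet prints one instance name per line. Only the absent case is skipped, and no failed_when: false hides a failing delete.

```yaml
# Sketch of a destroy.yml that skips only the absent-instance case.
- name: Destroy the Lima instance
  hosts: localhost
  gather_facts: false
  vars:
    lima_instance: ansible-molecule
  tasks:
    - name: List Lima instance names
      ansible.builtin.command: limactl list --quiet
      register: lima_names
      changed_when: false

    - name: Delete the instance only if it exists
      # No failed_when: false here, so a genuine delete failure fails the play
      ansible.builtin.command: limactl delete --force {{ lima_instance }}
      when: lima_instance in lima_names.stdout_lines
```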
The fixture staging block was gated on /opt/apiserver/versions/latest
existing, so iterative 'molecule converge' runs never picked up edits
to the fixture Gemfile/config.ru and never reran bundler. Stage the
files unconditionally (copy is idempotent), gate bundler on Gemfile or
config.ru changes (or missing Gemfile.lock), and gate the recursive
chown on bundler having actually run. The local feedback loop now
reflects fixture edits without re-creating the VM.

The setup command pulled the latest Molecule/ansible-core/molecule-plugins,
so README users could land on tool versions newer than requirements.txt
and hit divergent Molecule behavior. Pin all three packages inline to
match requirements.txt and call out the version-sync requirement.

…to the VM

The fixture Gemfile resolved against rubygems on every fresh VM, so an
upstream puma/sinatra/rackup release could break Molecule runs even
when the playbook didn't change. Commit a Gemfile.lock locked to
linux platforms (x86_64-linux, aarch64-linux), copy it alongside the
Gemfile in converge, and re-gate bundle install on lockfile or
Gemfile/config.ru changes (or missing vendor/bundle). Regenerate via
'bundle lock' from the fixture directory when intentionally bumping
versions.
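The staging and gating described above can be sketched as the task list below. It is illustrative only: it assumes the fixture is copied straight into /opt/apiserver/versions/latest, that src resolves through the scenario's files/ symlink, and that gems are vendored via the BUNDLE_PATH setting. The actual layout of converge.yml may differ.

```yaml
# Sketch of idempotent fixture staging; paths follow the PR description,
# while the task structure and the BUNDLE_PATH choice are assumptions.
- name: Stage the apiserver fixture
  ansible.builtin.copy:
    src: "molecule-test/apiserver-fixture/{{ item }}"
    dest: "/opt/apiserver/versions/latest/{{ item }}"
    mode: "0644"
  loop:
    - Gemfile
    - Gemfile.lock
    - config.ru
  register: fixture_files   # copy compares checksums, so unchanged files report ok

- name: Check whether gems are already vendored
  ansible.builtin.stat:
    path: /opt/apiserver/versions/latest/vendor/bundle
  register: vendor_bundle

- name: Install fixture gems only when inputs changed or vendor/bundle is missing
  ansible.builtin.command:
    cmd: bundle install
    chdir: /opt/apiserver/versions/latest
  environment:
    BUNDLE_PATH: /opt/apiserver/versions/latest/vendor/bundle
  when: fixture_files is changed or not vendor_bundle.stat.exists
```

With the lockfile committed, a second converge finds every copy unchanged and vendor/bundle present, so bundler is skipped and the idempotence phase stays at changed=0.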
…repare

The raw task could not fulfill its stated purpose: gather_facts:true runs
first and requires python3, and ansible_become:true routes the command
through sudo - both prerequisites the task claimed to install. Lima's
provision script (lima.yaml) already installs python3 and sudo before
SSH is up, so the task was redundant in the working path and misleading
in the failing path.
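For reference, a Lima provision block that installs these prerequisites could look like the excerpt below. It is a sketch assuming Debian's apt; the image, CPU, and other settings of the real lima.yaml are omitted.

```yaml
# Excerpt-style sketch of a lima.yaml provision section (images, cpus, etc. omitted).
provision:
  - mode: system          # runs as root inside the guest during boot
    script: |
      #!/bin/bash
      set -eux -o pipefail
      export DEBIAN_FRONTEND=noninteractive
      apt-get update
      apt-get install -y python3 sudo
```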
…cket

Verify previously only confirmed Caddy was active, its config parsed,
and the apiserver socket file existed - a permission or proxy
regression (caddy user can't read the socket, reverse_proxy
misconfigured) would still pass. The fixture exposes GET /admin/health
returning 'ok', so add a uri request to http://127.0.0.1:8080/admin/health
that exercises the full Caddy -> Unix socket -> apiserver path.
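The added check might look like the following sketch. It assumes the tasks run on the VM, that Caddy listens on 127.0.0.1:8080 as described above, and that the fixture returns the literal body ok.

```yaml
# Sketch of an end-to-end proxy check for verify.yml.
- name: Request /admin/health through Caddy
  ansible.builtin.uri:
    url: http://127.0.0.1:8080/admin/health
    return_content: true
    status_code: 200        # any other status (e.g. a 502 from an unreadable socket) fails
  register: health

- name: Assert the apiserver answered through the socket
  ansible.builtin.assert:
    that:
      - health.content | trim == 'ok'
```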