feat(ansible): add Molecule + Lima testing for the Hetzner playbook #60
Open
abtreece wants to merge 9 commits into
Adds a Molecule scenario that converges the playbook against a Lima VM with all live external dependencies (Azure Key Vault, GCS, GitHub releases) replaced by stubs, so the scenario runs without cloud credentials. Full lifecycle (create + prepare + converge + idempotence + verify + destroy) is green end-to-end.

Test seams in prod files (~10 lines, all gated behind molecule_test; prod behavior byte-identical when molecule_test is unset):
- templates/caddy-env.j2: skip az keyvault lookups when molecule_test
- tasks/caddy.yml: gate query-latest-repo-versions and Caddyfile install
- tasks/apiserver-deployer.yml: gate apiserver-deployer.sh install
- tasks/unattended-upgrades.yml: gate the GitHub-release deb install

Test artifacts live under ansible/files/molecule-test/ (stubs and service-contract fixture) and ansible/molecule/default/ (scenario). Pinned tooling versions in ansible/requirements.txt. Contributor instructions in ansible/README.md.
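The gating pattern behind these test seams can be sketched as a single task. This is a minimal illustration rather than one of the PR's actual tasks: the task name, URL, and destination path are hypothetical placeholders, and only the `molecule_test` variable and its `default(false)` guard are taken from the PR summary.

```yaml
# Hypothetical task: name, url, and dest are placeholders for illustration.
- name: Install a helper fetched from the public internet
  ansible.builtin.get_url:
    url: https://example.invalid/helper   # placeholder URL
    dest: /usr/local/bin/helper           # placeholder path
    mode: "0755"
  # Skipped only when the scenario sets molecule_test: true.
  # default(false) means prod runs, which never define the variable,
  # evaluate the condition to true and behave as before.
  when: not (molecule_test | default(false))
```

Because the guard defaults to false, an unset variable and an explicit `molecule_test: false` take the same path.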
The "Create apiserver deployment directory" task used recurse:true with mode:0755. The recurse walked into vendor/bundle/ruby/*/bin/ (gem-bin shims) and the latest symlink. Both report mode 0777 from stat() because they are symlinks, and Linux cannot actually chmod a symlink, but Ansible still reports changed:true because requested mode != observed mode.

Effect: every ansible-playbook run after the first apiserver release deploy has been reporting at least one task changed, forever. Masked by absence of an idempotence test in CI; surfaced by the new Molecule scenario.

Release tarballs are extracted by the deployer running as itself, so ownership inside the tree is already correct. The recurse-chown was over-defensive.
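A before/after sketch of that task, under assumptions: the task name comes from this commit message and the path `/opt/apiserver/versions` from the PR summary, while `owner` and `group` are omitted because their real values are not shown here.

```yaml
# Before: recurse applies mode 0755 to everything under the directory,
# including symlinks whose stat() mode is always 0777, so the task
# reports "changed" on every run.
- name: Create apiserver deployment directory
  ansible.builtin.file:
    path: /opt/apiserver/versions
    state: directory
    mode: "0755"
    recurse: true

# After: only the directory itself is managed, so a second run is a no-op.
- name: Create apiserver deployment directory
  ansible.builtin.file:
    path: /opt/apiserver/versions
    state: directory
    mode: "0755"
```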
Pull request overview
Adds a Lima-backed Molecule scenario for exercising the Ansible Hetzner playbook locally with stubs for selected external dependencies.
Changes:
- Adds Molecule create/prepare/converge/verify/destroy playbooks for a Debian 12 Lima VM.
- Adds apiserver/Caddy test fixtures and stub scripts for Molecule runs.
- Gates selected production tasks for `molecule_test` and documents local setup.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `ansible/templates/caddy-env.j2` | Uses stub Azure DNS credentials during Molecule runs. |
| `ansible/tasks/unattended-upgrades.yml` | Skips remote collector package install in Molecule. |
| `ansible/tasks/caddy.yml` | Skips production Caddy helper/config copies in Molecule. |
| `ansible/tasks/apiserver-deployer.yml` | Skips deployer script install in Molecule and removes recursive chown. |
| `ansible/requirements.txt` | Adds pinned Python tooling requirements. |
| `ansible/README.md` | Documents Molecule/Lima setup, workflow, and coverage limits. |
| `ansible/molecule/default/molecule.yml` | Defines the Molecule scenario and inventory defaults. |
| `ansible/molecule/default/lima.yaml` | Defines the Debian Lima VM. |
| `ansible/molecule/default/create.yml` | Creates/configures the Lima instance for Molecule. |
| `ansible/molecule/default/destroy.yml` | Deletes the Lima instance. |
| `ansible/molecule/default/prepare.yml` | Installs packages assumed by the playbook/test fixture. |
| `ansible/molecule/default/converge.yml` | Runs the production task list with Molecule stubs and fixture staging. |
| `ansible/molecule/default/verify.yml` | Verifies service state and selected host configuration. |
| `ansible/files/molecule-test/test-Caddyfile` | Provides a Molecule-safe Caddy config. |
| `ansible/files/molecule-test/stub-query-latest-repo-versions` | Stubs repo-version metadata generation. |
| `ansible/files/molecule-test/stub-apiserver-deployer` | Stubs apiserver deployment. |
| `ansible/files/molecule-test/apiserver-fixture/Gemfile` | Defines Ruby fixture dependencies. |
| `ansible/files/molecule-test/apiserver-fixture/config.ru` | Adds the minimal Sinatra fixture app. |
A previous create-only check on instance presence skipped the start step when 'ansible-molecule' was in any state (including Stopped), causing converge to SSH to a powered-off VM. Gate on status instead: create when absent, resume when present and not Running.
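One way to express the status-based branching is sketched below. It rests on assumptions: that `limactl list --json` prints one JSON object per line with `name` and `status` fields, that the Lima template sits at `lima.yaml` inside the directory Molecule exports as `MOLECULE_SCENARIO_DIRECTORY`, and that the task and variable names (`lima_list`, `lima_statuses`) are illustrative rather than the PR's own.

```yaml
- name: List Lima instances as JSON (one object per line)
  ansible.builtin.command: limactl list --json
  register: lima_list
  changed_when: false

- name: Collect the status of the scenario instance (empty list when absent)
  ansible.builtin.set_fact:
    lima_statuses: >-
      {{ lima_list.stdout_lines
         | select
         | map('from_json')
         | selectattr('name', 'equalto', 'ansible-molecule')
         | map(attribute='status')
         | list }}

- name: Create and start the instance when it is absent
  ansible.builtin.command: >-
    limactl start --name=ansible-molecule --tty=false
    {{ lookup('env', 'MOLECULE_SCENARIO_DIRECTORY') }}/lima.yaml
  when: lima_statuses | length == 0

- name: Resume the instance when it exists but is not running
  ansible.builtin.command: limactl start ansible-molecule
  when:
    - lima_statuses | length > 0
    - lima_statuses[0] != 'Running'
```

Keying on the reported status rather than mere presence is what lets a Stopped instance fall into the resume branch.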
A blanket failed_when:false suppressed every limactl delete failure, not just the expected absent-instance case, so Molecule could report success while leaving a wedged VM behind. Gate the delete on existence so real delete failures (permissions, broken state) fail loudly.
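A sketch of an existence-gated delete, assuming `limactl list --json` prints one JSON object per line with a `name` field; the task names and the `lima_list` variable are illustrative, not taken from the PR.

```yaml
- name: List Lima instances as JSON (one object per line)
  ansible.builtin.command: limactl list --json
  register: lima_list
  changed_when: false

# No failed_when override: if the instance exists and the delete fails
# (permissions, broken state), the task fails and Molecule reports it.
- name: Delete the instance only when it exists
  ansible.builtin.command: limactl delete --force ansible-molecule
  when: >-
    lima_list.stdout_lines
    | select
    | map('from_json')
    | selectattr('name', 'equalto', 'ansible-molecule')
    | list
    | length > 0
```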
The fixture staging block was gated on /opt/apiserver/versions/latest existing, so iterative 'molecule converge' runs never picked up edits to the fixture Gemfile/config.ru and never reran bundler. Stage the files unconditionally (copy is idempotent), gate bundler on Gemfile or config.ru changes (or missing Gemfile.lock), and gate the recursive chown on bundler having actually run. The local feedback loop now reflects fixture edits without re-creating the VM.
The setup command pulled latest Molecule/ansible-core/molecule-plugins, so README users could land on tool versions newer than requirements.txt and hit divergent Molecule behavior. Pin all three packages inline to match requirements.txt and call out the version-sync requirement.
…to the VM The fixture Gemfile resolved against rubygems on every fresh VM, so an upstream puma/sinatra/rackup release could break Molecule runs even when the playbook didn't change. Commit a Gemfile.lock locked to linux platforms (x86_64-linux, aarch64-linux), copy it alongside the Gemfile in converge, and re-gate bundle install on lockfile or Gemfile/config.ru changes (or missing vendor/bundle). Regenerate via 'bundle lock' from the fixture directory when intentionally bumping versions.
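The staging and gating described here might look roughly like the following. This is a sketch under assumptions: the `src` paths assume the scenario's `files` symlink resolves to `ansible/files`, the `BUNDLE_PATH` environment variable is one possible way to direct gems into `vendor/bundle`, and the registered variable names are hypothetical.

```yaml
- name: Stage fixture files (copy compares checksums, so reruns are no-ops)
  ansible.builtin.copy:
    src: "molecule-test/apiserver-fixture/{{ item }}"
    dest: "/opt/apiserver/versions/latest/{{ item }}"
  loop:
    - Gemfile
    - Gemfile.lock
    - config.ru
  register: fixture_copy

- name: Check whether gems are already installed
  ansible.builtin.stat:
    path: /opt/apiserver/versions/latest/vendor/bundle
  register: vendor_bundle

- name: Run bundler only when a fixture file changed or vendor/bundle is missing
  ansible.builtin.command:
    cmd: bundle install
    chdir: /opt/apiserver/versions/latest
  environment:
    BUNDLE_PATH: /opt/apiserver/versions/latest/vendor/bundle
  when: fixture_copy is changed or not vendor_bundle.stat.exists
```

With the lockfile committed and copied, `bundle install` resolves to the locked versions instead of whatever rubygems currently serves.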
…repare The raw task could not fulfill its stated purpose: gather_facts:true runs first and requires python3, and ansible_become:true routes the command through sudo - both prerequisites the task claimed to install. Lima's provision script (lima.yaml) already installs python3 and sudo before SSH is up, so the task was redundant in the working path and misleading in the failing path.
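With the raw bootstrap gone, a prepare playbook can rely on python3 and sudo already being present. A minimal sketch, assuming the package list given in the PR summary (cron, ufw, acl, rsyslog, ruby-dev, build-essential); the play and task names are illustrative.

```yaml
- name: Prepare
  hosts: all
  become: true          # works because Lima's provision script installed sudo
  gather_facts: true    # works because Lima's provision script installed python3
  tasks:
    - name: Install packages the prod playbook assumes are preinstalled
      ansible.builtin.apt:
        name:
          - cron
          - ufw
          - acl
          - rsyslog
          - ruby-dev
          - build-essential
        state: present
        update_cache: true
```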
…cket Verify previously only confirmed Caddy was active, its config parsed, and the apiserver socket file existed - a permission or proxy regression (caddy user can't read the socket, reverse_proxy misconfigured) would still pass. The fixture exposes GET /admin/health returning 'ok', so add a uri request to http://127.0.0.1:8080/admin/health that exercises the full Caddy -> Unix socket -> apiserver path.
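The added check could be written along these lines. The URL and the expected `ok` body come from the commit message; the task names and the `health` variable are illustrative.

```yaml
- name: Request the health endpoint through Caddy
  ansible.builtin.uri:
    url: http://127.0.0.1:8080/admin/health
    return_content: true
  register: health

- name: Assert the request reached the apiserver over the Unix socket
  ansible.builtin.assert:
    that:
      - health.status == 200
      - "'ok' in health.content"
    fail_msg: Caddy did not proxy /admin/health to the apiserver
```

A socket permission problem or a broken `reverse_proxy` directive typically surfaces here as a non-200 response from Caddy, which the earlier "unit is active" checks would not catch.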
Summary

Adds a Molecule scenario that converges `ansible/main.yml` against a local Lima VM, with all live external dependencies (Azure Key Vault, GCS, GitHub Releases) replaced by stubs so the scenario is offline-friendly and needs no cloud credentials. Gives us a fast local feedback loop for changes to the playbook without touching `backend.fullstaqruby.org`.

- `ansible/molecule/default/` scenario:
  - `lima.yaml`: Debian 12 cloud-image VM (`ansible-molecule`)
  - `create.yml` / `destroy.yml`: `limactl`-based driver (`driver.name: default`, `managed: false`). `create` branches on Lima instance status so a previously stopped instance is resumed instead of being silently skipped; `destroy` gates the delete on existence so real `limactl delete` failures surface instead of being swallowed.
  - `prepare.yml`: installs packages the prod playbook assumes preinstalled (cron, ufw, acl, rsyslog, ruby-dev, build-essential). Python/sudo bootstrap is provided by Lima's provision script (`lima.yaml`), not by an Ansible raw task.
  - `converge.yml`: runs the same task list as `main.yml` with `molecule_test: true`, plus stages a minimal Sinatra/Puma fixture at `/opt/apiserver/versions/latest` so the apiserver systemd unit can start. Fixture staging is idempotent: copy tasks rely on Ansible's checksum compare, and `bundle install` is gated on actual Gemfile/Gemfile.lock/config.ru changes (or missing `vendor/bundle`).
  - `verify.yml`: asserts caddy/prometheus/fail2ban/ssh/apiserver are `is-active`, the apiserver Unix socket exists, `caddy validate` passes, and that Caddy actually proxies `/admin/*` to the apiserver socket (exercises the full reverse_proxy → Unix socket → apiserver path so permission/proxy regressions fail verify), plus SSH/UFW/unattended-upgrades config matches expectations.
  - `files` / `templates` symlinks back to `ansible/files` / `ansible/templates` so the scenario stays in sync with the real playbook with zero duplication
- `ansible/files/molecule-test/`: apiserver Sinatra fixture (Gemfile + committed Gemfile.lock locked to linux platforms + config.ru), stub `query-latest-repo-versions` and `apiserver-deployer` scripts, and a test Caddyfile with `auto_https off`
- `ansible/requirements.txt` pins the local toolchain (`ansible-core`, `molecule`, `molecule-plugins`); `ansible/README.md` setup command pins these inline so README users land on the same versions
- `ansible/README.md`: setup, daily loop, troubleshooting, and an explicit "what is not tested" section

Prod playbook touch-ups
Kept deliberately minimal (gating, not refactoring):

- `tasks/caddy.yml`, `tasks/apiserver-deployer.yml`, `tasks/unattended-upgrades.yml`: four `apt`/`get_url`/`copy` tasks that fetch artifacts from the public internet are gated on `when: not (molecule_test | default(false))`. No-ops in prod.
- `templates/caddy-env.j2`: when `molecule_test` is true, the two Azure Key Vault `lookup('pipe', ...)` calls are replaced by stub strings. No-op in prod.
- `tasks/apiserver-deployer.yml`: dropped `recurse: true` from the `/opt/apiserver/versions` directory creation. The flag was forcing a recursive chown of the entire versions tree on every run, including release tarballs owned by `apiserver-deployer`. That was unnecessary and surprising in prod, and it broke idempotence in the scenario.

Copilot review remediation
All seven inline comments from the automated Copilot review addressed:

- `converge.yml`: `apiserver-fixture/Gemfile.lock` (linux platforms) + copy it onto the VM; bumps are a deliberate `bundle lock`
- `ansible/README.md`: `uv tool install` command pinned to match `requirements.txt`
- `destroy.yml`: `limactl delete` failures now surface; only the absent-instance case is silently skipped
- `create.yml`
- `verify.yml`: `uri:` task that asserts the full Caddy → Unix socket → apiserver path
- `prepare.yml`

Each thread has an inline reply linking the fix commit.
What is not covered

Documented in `ansible/README.md`, but worth flagging here:

- … `auto_https off`)
- … `apiserver/` Ruby tests)
- `fail2ban` banning behavior (only verifies the unit starts cleanly)

Test plan

- `molecule test` (full lifecycle: destroy → create → prepare → converge → idempotence → verify → destroy) passes on macOS arm64
- `idempotence` phase reports `changed=0` (stricter than running converge twice manually)
- `molecule verify` passes including the new `/admin/health` HTTP assertion through Caddy
- `ansible-playbook` run against `backend.fullstaqruby.org` shows no unexpected changes from the gated tasks or the dropped `recurse: true`