Stop cloning osism/testbed during deployment#2896
Draft
ideaship wants to merge 8 commits into
Draft
Conversation
Contributor
Author
|
ansible-lint check failed due to a transient registry pull failure ( |
Contributor
Author
|
recheck |
3c89c87 to
a65ca4a
Compare
All seven playbooks derived basepath from
{{ ansible_user_dir }}/src/{{ repositories['testbed']['path'] }}, a
hardcoded path that only resolved correctly if the checkout lived at
~/src/github.com/osism/testbed. The Makefile worked around this with
-e basepath="$(PWD)", but $(PWD) resolves to the caller's directory,
not the testbed checkout, so make -C /path/to/testbed deploy from a
different CWD produced the wrong path.
Replace with {{ playbook_dir | dirname }}. playbook_dir is Ansible's
built-in for the directory of the executing playbook; dirname strips
one level to give the testbed checkout root. This resolves correctly
for local development, Zuul, and worktrees at arbitrary locations
without any convention or Makefile override.
Add playbooks/tasks/_basepath_check.yml, a shared task list that
verifies basepath contains ansible/manager-part-1.yml. Each playbook
imports it as a pre_task so a misconfigured basepath fails immediately
with a clear message instead of silently mis-deploying.
AI-assisted: Claude Code
Signed-off-by: Roger Luethi <luethi@osism.tech>
contrib/setup-testbed.py --prepare iterates every entry in
repositories.yml and clones each into <testbed>/.src/{path}. For the
testbed entry this meant re-cloning osism/testbed from GitHub into
<checkout>/.src/github.com/osism/testbed on every bootstrap run.
manager-part-1.yml then pushed that GitHub clone — not the working
checkout — to /opt/configuration on the manager, making local edits
invisible to the deploy.
With basepath now derived from playbook_dir | dirname, nothing reads
repositories['testbed']. Removing the entry stops the redundant clone.
The ansible-collection-* and terraform-base entries stay; they still
feed setup-testbed.py --prepare and the orchestrator-side
ansible-galaxy install in deploy.yml.
AI-assisted: Claude Code
Signed-off-by: Roger Luethi <luethi@osism.tech>
manager-part-1.yml needs the orchestrator-side testbed checkout path
to use as the source for syncing to /opt/configuration. Pass basepath
via -e basepath={{ basepath | quote }} alongside the existing
-e repo_path=... argument.
The | quote filter wraps the value in single quotes and escapes embedded
quotes so that checkout paths containing spaces survive the shell
expansion in ansible.builtin.shell's cmd:.
AI-assisted: Claude Code
Signed-off-by: Roger Luethi <luethi@osism.tech>
The previous synchronize task pushed the contents of
{{ repo_path }}/{{ repo }}/ — the orchestrator's
<testbed>/.src/github.com/osism/testbed GitHub clone — to
/opt/configuration on the manager. Local edits to the testbed checkout
were therefore never deployed; only the clone's content reached the
manager.
Replace it with a block/always sequence that:
1. Creates a temp dir on the orchestrator.
2. Adds an ephemeral detached git worktree at HEAD into that dir.
3. Runs ansible.posix.synchronize with delete: true and --exclude=/.git
to push the worktree contents to /opt/configuration on the manager.
4. Removes the worktree in always:, even on sync failure.
A git worktree at HEAD materializes only tracked files in the state
recorded at that commit. .git/, venv/, .tox/, .src/, .terraform/, and
other untracked paths never reach /opt/configuration. Stale files are
removed cleanly via rsync --delete since the source tree has no junk.
Dev iteration goes through git commit (or --amend). Working-tree edits
that aren't committed do not deploy. The manager's state corresponds to
a specific SHA, useful for correlating manager behaviour with a
particular tree version in logs and bug reports.
Also remove the two tasks (Get home directory of ansible user, Set
repo_path fact) that existed only to compute repo_path for the
synchronize task replaced above. Nothing in manager-part-1.yml now
consumes repo_path: osism.commons.repository runs before the fact was
set and is unrelated to git clones; the supplementary-repo sync lives
in manager-part-0.yml. The -e repo_path=... argument that deploy.yml
passes to manager-part-1 is now vestigial but harmless.
AI-assisted: Claude Code
Signed-off-by: Roger Luethi <luethi@osism.tech>
a65ca4a to
072baea
Compare
The bootstrap target passed -e basepath="$(PWD)" and a derived
-e repo_path="$(PWD)/.src/github.com" to ansible-playbook. With
basepath now derived from playbook_dir | dirname in the playbooks,
these overrides defeat the new default and reintroduce hardcoded paths.
$(PWD) was also subtly wrong: GNU make resolves it to the caller's
directory, not the testbed checkout, so make -C /path/to/testbed deploy
from a different CWD set basepath to the wrong path.
Remove both overrides; playbook_dir | dirname is now the single source
of truth for basepath.
Removing the -e repo_path= override exposed the deploy.yml default,
which must account for two different supplementary-repo layouts:
- Local: setup-testbed.py --prepare clones ansible-collection-* and
terraform-base into <checkout>/.src/github.com.
- Zuul: those repos arrive via required-projects under
{{ ansible_user_dir }}/src/github.com; .src/ is never created.
A single static path cannot serve both, so repo_path selects by
context: {{ ansible_user_dir }}/src/github.com under Zuul (zuul is
defined), else {{ basepath }}/.src/github.com locally.
AI-assisted: Claude Code
Signed-off-by: Roger Luethi <luethi@osism.tech>
The custom osism-fqcn ansible-lint rule checks that every task module appears in an allowlist of approved FQCNs. The rule's skip guard only applies when a task has no action at all; block constructs have their action set to "block/always/rescue" internally, which is not present in the allowlist (the list contains "block", "always", and "rescue" as separate entries). The result is that any bare block: task triggers a spurious osism-fqcn violation. The established codebase pattern, already applied in ansible/manager-part-0.yml and ansible/manager-part-3.yml, is to annotate the affected task name with # noqa: osism-fqcn. AI-assisted: Claude Code Signed-off-by: Roger Luethi <luethi@osism.tech>
Replace the executor-side basepath check (_basepath_check.yml) with a resolver task (_basepath.yml) that derives basepath where it is consumed: on the orchestrator node. The old approach assumed basepath was already set and validated it with delegate_to/run_once, which pinned the check to the executor. In Zuul CI the executor and the orchestrator node are separate hosts, so an executor-side path does not exist on the node and CI deployments fail. Locally they happen to be the same host, which masked the problem. The new task sets basepath via set_fact (no delegation): - In Zuul: ansible_user_dir / zuul.project.src_dir - Locally: git rev-parse --show-toplevel relative to playbook_dir A stat check on ansible/manager-part-1.yml guards against a basepath that does not point to a testbed checkout. AI-assisted: Claude Code Signed-off-by: Roger Luethi <luethi@osism.tech>
Each deploy playbook previously defined basepath as a play var:
basepath: "{{ playbook_dir | dirname }}"
This is an executor-side value, evaluated by Ansible on the machine
running the playbook. Under Zuul the executor is not the same host as
the orchestrator node where basepath is consumed, so the path was
wrong.
Task 1 introduced playbooks/tasks/_basepath.yml, which resolves
basepath node-side via set_fact (using zuul.project.src_dir in Zuul
environments, git rev-parse --show-toplevel locally) and then
verifies that the resulting path looks like a testbed checkout.
This commit wires the shared resolver into all 7 deploy playbooks:
- Remove the per-playbook `basepath: "{{ playbook_dir | dirname }}"`
play var so the resolver's set_fact becomes the sole source of
truth. Vars that reference {{ basepath }} (terraform_path,
ansible_path, repo_path, etc.) are templated lazily at task time,
after the set_fact in pre_tasks has run, so they continue to work.
- Replace the pre_tasks import of tasks/_basepath_check.yml with
tasks/_basepath.yml in each playbook.
The -e basepath={{ basepath | quote }} argument passed to
manager-part-1 in deploy.yml is unaffected; basepath is now resolved
correctly before that task runs.
AI-assisted: Claude Code
Signed-off-by: Roger Luethi <luethi@osism.tech>
debe579 to
230bb63
Compare
Member
|
/lgtm |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The testbed deploy flow has a subtle self-defeating step:
setup-testbed.py --prepare(triggered bymake bootstrap) clones a fresh copy ofosism/testbedfrom GitHub into
<checkout>/.src/github.com/osism/testbed. Themanager-part-1.ymlsynchronize task then pushes that GitHub clone —not the developer's working checkout — to
/opt/configurationon the manager.The practical effect: local edits to the testbed checkout never reach the
manager. To test a local change you had to push it to GitHub first.
The Makefile worked around this by overriding
basepathandrepo_pathin the bootstrap deployment invocation, keeping local runs functional. But
this masked rather than fixed the problem: the override used
$(PWD), whichGNU make resolves to the caller's directory, not the testbed checkout, so
make -C /path/to/testbed deployfrom a different CWD would break. And itmeant the playbook defaults were wrong for anyone not using the Makefile.
Zuul was unaffected because it never runs
--prepare; itsbasepathresolved correctly via the old default (Zuul's checkout path matched the
hardcoded convention). So the problem was invisible in CI.
What changed
1.
basepathderives fromplaybook_dir | dirname.Every playbook used to compute
basepathas{{ ansible_user_dir }}/src/{{ repositories['testbed']['path'] }}— ahardcoded path that only worked if the checkout lived at
~/src/github.com/osism/testbed. It now uses{{ playbook_dir | dirname }}, Ansible's built-in variable for thedirectory of the executing playbook.
dirnamestrips one level to givethe checkout root.
This is correct for the intended invocation paths: Zuul's checkout, a
local developer's path, or a worktree at an arbitrary location.
playbook_dirresolves to where the playbook actually is. No convention,symlink, or override needed. The sanity check (Change 4) catches
misconfigured invocations.
The Makefile's
-e basepath=...and-e repo_path=...overrides areremoved;
playbook_dir | dirnameis now the single source of truth.(
deploy.yml'srepo_pathdefault is also fixed to derive frombasepathrather than
ansible_user_dir, for the same reason.)2. The
testbed:entry is removed fromplaybooks/vars/repositories.yml.setup-testbed.py --prepareclones every entry inrepositories.ymlinto<checkout>/.src/…. After Change 1, nothing readsrepositories['testbed']for its path, so the clone serves no purpose. Removing the entry stops
--preparefrom pulling a freshosism/testbedfrom GitHub on every run.The remaining entries (
ansible-collection-commons,ansible-collection-services,terraform-base) are unaffected; they arestill needed for orchestrator-side
ansible-galaxy collection install.3. The synchronize task is replaced with an ephemeral worktree sync.
The old task pushed the contents of
{{ repo_path }}/osism/testbed/— the stale GitHub clone — to/opt/configurationon the manager. The replacement:(
git worktree add --detach)./opt/configuration(delete: true,--exclude=/.git).always:block, even on sync failure.A git worktree at HEAD materializes only tracked files in the state recorded
at that commit — no working-tree modifications, no untracked files, no
.git/directory, no build artifacts. The source is always clean, sodelete: trueis safe and leaves/opt/configurationin an exact, knownstate.
Development iteration goes through
git commit(or--amend). Themanager's deployed state corresponds to a specific SHA, which is useful for
correlating manager behaviour with a particular tree version in logs and bug
reports.
4. A fast-fail sanity check is added to all seven playbooks.
A shared task list (
playbooks/tasks/_basepath_check.yml) checks thatbasepathpoints to a directory containingansible/manager-part-1.yml.Each playbook imports it as a
pre_task. A misconfiguredbasepathnowfails immediately with a clear message instead of silently mis-deploying.
Scope
This change covers only the
osism/testbedself-clone. The other repos inrepositories.yml(ansible-collection-*,terraform-base) still followthe existing path:
setup-testbed.py --prepareclones them, anddeploy.ymlinstalls them viaansible-galaxy. They are out of scope here.Before / after
basepathsource~/src/github.com/osism/testbed(Makefile override for local)playbook_dir | dirname(always the invoking checkout)mainat clone timesetup-testbed.py --prepareclones testbed.src/github.com/osism/testbedcreated.git/in/opt/configurationgit commit)Test plan
End-to-end verified on regiocloud (DEPLOY_MODE=manager, VERSION_MANAGER=latest):
basepath=/tmpsetup-testbed.py --prepareno longer creates.src/github.com/osism/testbedRun manager part 1 + 2completed without error/opt/configuration/environments/manager/configuration.ymlon manager (confirms worktree sync pushed committed content).git/directory on manager (test ! -d /opt/configuration/.git && echo OK)clone.*osism/testbedreferences in deploy log🤖 Generated with Claude Code