
feat: default Freenet install to a supervised, auto-updating service #4590

Merged
sanity merged 6 commits into main from feat/default-supervised-install on Jun 27, 2026
sanity commented Jun 26, 2026

Collaborator

Problem

A typical Linux Freenet install ends up unsupervised, so it never auto-updates. scripts/install.sh defaulted the "install as a service?" prompt to No, and a node with no service manager catching its exit-42 "update needed" signal exits to update and never restarts on the new version - it silently stops updating. Since unsupervised is the dominant default on Linux, much of the network freezes on old releases.

Solution

Make a supervised install the default (part of the auto-update work, #4073):

  • install.sh sets up supervision by default, unless the user explicitly opts out with FREENET_NO_SERVICE=1. The interactive prompt now defaults to Yes ([Y/n]); a non-interactive curl | sh run sets up supervision automatically, because the alternative default (a node that silently never updates) is worse.
  • Linux: prefer a SYSTEM service when it can elevate (already root, or sudo available) - most reliable on the servers/VPS that dominate the node population (runs at boot, survives logout). When it can't elevate, it falls back to a USER service. A node is only left unsupervised as a last resort, with a loud warning explaining it won't auto-update.
  • The binary's user-service install now enables systemd lingering (loginctl enable-linger <user>) by default, so a --user service runs without an active login session. This is the essential headless-server fix: without lingering, a user service stops at logout and never catches exit-42, so it never auto-updates. A new --no-linger flag opts out. System services are unaffected (they start at boot regardless).
  • Idempotent / safe to re-run: the decision honors an existing install and refreshes the same service type instead of creating a duplicate of the other kind.
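
The routing described in the bullets above can be sketched as one pure function. This is a minimal illustration, not the code in install.sh: the function name, its argument order, and the precedence of the opt-out over an existing install are assumptions, and the "unsupervised as a last resort" case is not modeled.

```shell
# Hypothetical sketch; the real install.sh helpers and their arguments may differ.
# Args: opt_out (0|1), is_root (0|1), can_sudo (0|1), existing (system|user|none)
# Prints the service mode to set up: system, user, or none.
decide_service_mode() {
    opt_out=$1; is_root=$2; can_sudo=$3; existing=$4

    # Explicit opt-out (FREENET_NO_SERVICE=1) wins; this precedence is assumed.
    if [ "$opt_out" = "1" ]; then
        echo "none"
        return 0
    fi

    # Honor an existing install: refresh the same service type.
    case "$existing" in
        system|user)
            echo "$existing"
            return 0
            ;;
    esac

    # Prefer a system service when elevation is possible, else fall back.
    if [ "$is_root" = "1" ] || [ "$can_sudo" = "1" ]; then
        echo "system"
    else
        echo "user"
    fi
}
```

For example, `decide_service_mode "${FREENET_NO_SERVICE:-0}" 0 1 none` prints `system` when the opt-out variable is unset. Keeping the decision free of side effects is what lets a smoke test exercise it without real root or sudo.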

The generated systemd units are unchanged - this reuses the existing unit generation, so the StartLimit/exit-45 work from #4570/#4588 is preserved.

Important

This changes default install behavior (unsupervised -> supervised). New installs (and re-runs over an existing install) will now set up a systemd service by default.

Warning

freenet.org/install.sh mirror must be updated in lockstep. scripts/install.sh is mirrored at hugo-site/static/install.sh in freenet/web (served from https://freenet.org/install.sh). The website copy is what actual curl | sh users get, so this change does not reach users until the mirror is synced. I can't deploy the website - @sanity to update the mirror after merge.

Testing

  • New scripts/test-install-sh.sh smoke-tests the system-vs-user decision (root / passwordless-sudo / existing-unit / interactive permutations) by sourcing install.sh and overriding the environment probes - no real root/sudo needed.
  • New linux.rs unit test pins the lingering policy: a system service never lingers; a user service lingers unless --no-linger.
  • Wired into CI (ci.yml): the new install smoke test, the existing-but-previously-unwired uninstall smoke test, and shellcheck on install.sh / uninstall.sh / both test scripts.
  • cargo fmt, cargo clippy -p freenet --bins -- -D warnings, and the service module tests (95) are green; shellcheck clean.
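
The "source the script and override the environment probes" technique from the first bullet can be illustrated roughly as follows. All names here (`is_root`, `has_passwordless_sudo`, `pick_mode`, `run_case`, `check`) are hypothetical stand-ins, not the functions in install.sh or test-install-sh.sh.

```shell
# Hypothetical probes standing in for whatever install.sh actually defines.
is_root() { [ "$(id -u)" -eq 0 ]; }
has_passwordless_sudo() { command -v sudo >/dev/null 2>&1 && sudo -n true 2>/dev/null; }

# Decision under test: depends only on the probes above.
pick_mode() {
    if is_root || has_passwordless_sudo; then
        echo "system"
    else
        echo "user"
    fi
}

# Run one case with the probes replaced by fixed exit statuses (0 = true).
# Call it inside $(...) so the overrides stay confined to a subshell.
run_case() {
    eval "is_root() { return $1; }"
    eval "has_passwordless_sudo() { return $2; }"
    pick_mode
}

# Compare one case against the expected mode and report the result.
check() {
    got=$(run_case "$2" "$3")
    if [ "$got" = "$1" ]; then
        echo "ok: $1"
    else
        echo "FAIL: wanted $1, got $got"
        return 1
    fi
}
```

A test script in this style would call, for instance, `check system 0 1` (root, no sudo) and `check user 1 1` (neither). Because the probes are ordinary shell functions, redefining them is enough to simulate any environment.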

Refs #4073

[AI-assisted - Claude]

github-actions Bot commented Jun 26, 2026

Contributor

I have all the information needed. Let me synthesize the full review.


Rule Review: No blocking issues; one minor style note

Rules checked: git-workflow.md, code-style.md, testing.md
Files reviewed: 6


Checklist

git-workflow.md

  • PR title feat: default Freenet install to a supervised, auto-updating service uses a valid Conventional Commits prefix ✓
  • The five fix: commits are roll-in refinements to the same feat:, not standalone fixes for pre-existing bugs. The fix: regression-test CI gate keys on the PR title prefix, which is feat: here, so it does not apply ✓
  • All new fix: commits that touch pure decision logic (should_refresh_system_unit, linger_action) have corresponding tests ✓

code-style.md

  • No .unwrap() in new production Rust code (linger_enabled uses .unwrap_or(false); current_username and enable_linger use explicit match/if-let) ✓
  • No new async fire-and-forget spawns ✓
  • No biased; additions ✓
  • No hand-rolled backoff loops ✓
  • LingerAction match in install_user_service is exhaustive (all three variants named, with a comment explaining why SystemService is technically unreachable but matched for policy cohesion) ✓
  • deployment.md's platform-gated-code-path rule: linger_action() is correctly extracted as a pure testable function, matching the recommended pattern ✓

testing.md (scope: crates/core/**)

  • linger_action_matches_policy covers all 4 combinations of (system, no_linger) exhaustively ✓
  • Shell-side decision logic (decide_linux_service_mode, resolve_service_action, should_refresh_system_unit) has 10 distinct test cases in scripts/test-install-sh.sh
  • Side-effecting paths in setup_service() (binary-accessibility check, write-permission check) have no direct tests, but these require real sudo/file-system access and are correctly excluded from the testable-pure-function surface ✓

Warnings

None.


Info

  • crates/core/src/bin/commands/service/linux.rs:196: enable_linger() uses a catch-all _ => that merges "non-zero exit" and "I/O error" into a single warning message. Both cases share identical remediation text, so collapsing them is intentional and reasonable, but it technically violates the style guidance against catch-all arms. It could be split into Ok(status) => { ... status.code() ... } / Err(e) => { ... e ... } to surface the distinction in the warning. (rule: code-style.md, catch-all _ =>)


Rule review against .claude/rules/. WARNING findings block merge.

sanity commented Jun 26, 2026

Collaborator Author

Multi-model review: Codex (external, non-Claude)

Ran codex review --base origin/main iteratively; addressed every finding, re-running after each fix. Final pass: no actionable findings.

Findings and resolutions:

  1. [P1] System unit from /root fails to start — a curl | sudo sh run installs the binary under /root/.local/bin (HOME=/root); a system unit running as $SUDO_USER pointed ExecStart at a path that user can't traverse. Fixed: verify the service user can execute the binary (sudo -u $SUDO_USER test -x) before creating the unit; otherwise warn and direct them to re-run as the non-root user without sudo.
  2. [P2] Duplicate service on unprivileged refresh — when a system unit existed, a rerun without passwordless sudo failed the refresh then installed a user service alongside it. Fixed: suppress the user-service fallback when a system unit already exists (binary is updated in place; unit refresh just needs sudo).
  3. [P2] Refresh re-points the service to a different user — refreshing an existing system unit derives User=/paths from the current sudo user, so a rerun by a different account silently rewrote the unit and orphaned the original node's data. Fixed: same-user refresh guard (should_refresh_system_unit) reads the existing unit's User= and only refreshes when it matches; otherwise skips with a warning.
  4. [P2] Broken sudo freenet start hint — sudo's secure_path excludes ~/.local/bin. Fixed: the post-install hint uses the absolute binary path (plus a sudo systemctl start freenet equivalent).
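
As a rough illustration of the same-user refresh guard from finding 3, the sketch below reads `User=` from unit-file text and refreshes only on a match, or when the existing user cannot be determined (as the corresponding commit message states). The helper names and signatures are assumptions; the real should_refresh_system_unit and existing_system_unit_user may differ.

```shell
# Hypothetical sketches; the real should_refresh_system_unit and
# existing_system_unit_user may take different arguments.

# Read unit-file text on stdin and print the first User= value, if any.
unit_user_from_text() {
    sed -n 's/^User=//p' | head -n 1
}

# Succeed when it is safe to refresh: User= matches the refreshing user,
# or the existing user could not be determined (empty string).
should_refresh_unit() {
    existing_user=$1
    refresh_user=$2
    [ -z "$existing_user" ] || [ "$existing_user" = "$refresh_user" ]
}
```

A caller might read the installed unit with `unit_user_from_text < "$unit_file"` and skip the refresh with a warning when `should_refresh_unit` fails, which is the behavior the finding describes.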

Risk tier: Full (touches deploy/install + CI config). New decision logic is covered by scripts/test-install-sh.sh and a linux.rs unit test; shellcheck + cargo fmt/clippy -D warnings/service tests green locally.

[AI-assisted - Claude]

sanity and others added 4 commits June 26, 2026 22:02
Problem
-------
A typical Linux Freenet install ends up unsupervised, so it never
auto-updates. scripts/install.sh defaulted the "install as a service?"
prompt to No, and a node with no service manager catching its exit-42
"update needed" signal exits to update and never restarts on the new
version - it silently stops updating. With unsupervised being the
dominant default on Linux, much of the network freezes on old releases.

Solution
--------
Make a supervised install the DEFAULT (issue #4073):

- install.sh now sets up supervision unless the user explicitly opts out
  (FREENET_NO_SERVICE=1). The interactive prompt defaults to Yes ([Y/n]);
  a non-interactive curl|sh run sets up supervision automatically.
- On Linux it prefers a SYSTEM service when it can elevate (already root,
  or sudo) - most reliable on the servers/VPS that dominate the node
  population (runs at boot, survives logout). When it cannot elevate it
  falls back to a USER service. A node is only left unsupervised as a
  last resort, with a loud warning explaining it will not auto-update.
- The binary's user-service install now enables systemd lingering
  (`loginctl enable-linger <user>`) by default so a --user service runs
  without an active login session (the headless-server footgun: without
  linger it stops at logout and never auto-updates). New `--no-linger`
  flag opts out. System services are unaffected (they start at boot).
- The decision honors an existing install so a re-run refreshes the same
  service type instead of creating a duplicate (idempotent + safe).
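
The lingering policy in the third bullet reduces to a small truth table. The binary implements it in Rust (linux.rs); the shell below only mirrors the stated policy for illustration, and the names `linger_decision` and `apply_linger` are hypothetical.

```shell
# Hypothetical shell mirror of the policy; the binary implements it in Rust.
# Args: is_system (0|1), no_linger (0|1). Prints "enable" or "skip".
linger_decision() {
    is_system=$1; no_linger=$2
    if [ "$is_system" = "1" ]; then
        echo "skip"      # system services start at boot regardless
    elif [ "$no_linger" = "1" ]; then
        echo "skip"      # user opted out with --no-linger
    else
        echo "enable"    # user service: keep it running after logout
    fi
}

# Apply the decision for a given user (runs loginctl only when enabling).
# Args: is_system, no_linger, username.
apply_linger() {
    if [ "$(linger_decision "$1" "$2")" = "enable" ]; then
        loginctl enable-linger "$3"
    fi
}
```

Separating the decision from the `loginctl` call keeps the policy testable on machines without systemd.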

The generated systemd units are unchanged - this reuses the existing
unit generation, so the StartLimit/exit-45 work from #4570/#4588 is
preserved.

NOTE: this changes default install behavior (unsupervised -> supervised).

Testing
-------
- New scripts/test-install-sh.sh smoke-tests the system-vs-user decision
  (root / sudo / existing-unit / interactive permutations) by sourcing
  install.sh and overriding the environment probes. Wired into CI along
  with shellcheck on install.sh/uninstall.sh and the existing (previously
  unwired) uninstall smoke test.
- New linux.rs unit test pins the lingering policy (system never lingers;
  user lingers unless --no-linger).
- shellcheck clean; cargo fmt / clippy -D warnings / service tests green.

Refs #4073

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014dGjU1Q6Vpk2dm4sUf4pdU

Two issues found by Codex review of install.sh:

- P1: a `curl | sudo sh` run installs the binary under /root/.local/bin
  (HOME=/root), so a system unit running as $SUDO_USER pointed ExecStart
  at a path that user cannot traverse - the service installs but fails to
  start. Now verify the service user can execute the binary
  (`sudo -u $SUDO_USER test -x`) before creating the unit; otherwise warn
  and tell them to re-run as the non-root user without sudo.

- P2: when a system unit already exists, an unprivileged rerun without
  passwordless sudo failed the `sudo ... --system` refresh and then fell
  back to installing a USER service, leaving BOTH services installed. Now
  suppress the user-service fallback when a system unit already exists
  (the binary is already updated in place; the unit refresh just needs
  sudo) - avoiding the duplicate the existing-install routing prevents.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014dGjU1Q6Vpk2dm4sUf4pdU

Codex round-2 finding: when a system unit already exists, install.sh
auto-refreshed it via `sudo freenet service install --system`, and the
binary derives User=/home/log/ExecStart from the CURRENT sudo user. So a
rerun by a different sudo-capable account silently rewrote the unit to
run as that account, orphaning the original node's data/identity.

Add a same-user refresh guard: read the existing unit's `User=` and only
refresh when it matches the user the refresh would run as (or can't be
determined). Otherwise skip the refresh with a clear warning - the new
binary is already on disk and is picked up on the next restart.

New pure helper should_refresh_system_unit (unit-tested) plus
existing_system_unit_user to read the current User=.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014dGjU1Q6Vpk2dm4sUf4pdU

Codex round-3 finding: the post-install success message told users to run
`sudo freenet service start --system`, but sudo's secure_path usually
excludes ~/.local/bin, so a bare `sudo freenet` fails with "command not
found" - users would think the freshly installed service can't start.

Pass the absolute binary path into print_service_success and use it for
the system (sudo'd) start command, plus offer `sudo systemctl start
freenet` as an equivalent. The user (non-sudo) start hint is unchanged.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014dGjU1Q6Vpk2dm4sUf4pdU
sanity force-pushed the feat/default-supervised-install branch from beecc2c to e280ce2 on June 27, 2026 03:12
sanity and others added 2 commits June 26, 2026 22:17
Codex review (P2) on the default-supervised install flow: the root /
SUDO_USER branch of `setup_service` verified only that the service user
can EXECUTE the installed binary (`test -x`), not that it can REPLACE it.

The auto-update path (ExecStopPost -> `freenet update`, run as $SUDO_USER)
swaps the binary in place via `replace_binary`, which writes a temp file
in the binary's directory and renames it over the binary. That needs
WRITE permission on the directory (rename/unlink are governed by
directory perms, not the file's mode/owner), which the exec check does
not establish.

A root-owned but world-executable install dir (e.g.
FREENET_INSTALL_DIR=/usr/local/bin under `curl | sudo sh`) therefore
passed the exec check, installed a system service, and then failed EVERY
auto-update -- the exact silent "stops updating" failure default
supervision exists to prevent.

Add a directory-writability probe (`test -w` as the service user) after
the exec check; when it fails, warn with remediation (install to a
user-writable location) and leave the node unsupervised rather than
standing up a service that can never update itself.
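
A minimal sketch of the two probes, assuming a hypothetical helper that takes the binary path followed by an optional command prefix (for example `sudo -u "$SUDO_USER"`) used to run the checks as the service user. The name, the output strings, and the argument shape are illustrative, not taken from install.sh.

```shell
# Hypothetical helper. Args: binary path, then an optional command prefix
# used to run the probes as the service user (e.g. sudo -u "$SUDO_USER").
# Prints ok, no-exec, or no-write; returns non-zero unless ok.
check_service_user_access() {
    bin=$1
    shift
    dir=$(dirname "$bin")

    # The service user must be able to execute the binary...
    if ! "$@" test -x "$bin"; then
        echo "no-exec"
        return 1
    fi
    # ...and to write its directory, since an in-place swap renames a
    # temp file over the binary (governed by directory permissions).
    if ! "$@" test -w "$dir"; then
        echo "no-write"
        return 1
    fi
    echo "ok"
}
```

Under `curl | sudo sh` a call might look like `check_service_user_access "$bin_path" sudo -u "$SUDO_USER"`, where `bin_path` is whatever variable holds the installed binary's location. With no prefix the probes run as the current user.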

shellcheck clean; scripts/test-install-sh.sh green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014dGjU1Q6Vpk2dm4sUf4pdU

Codex review (P2, round 2) on the default-supervised install flow: when
an existing system service belongs to a DIFFERENT user, setup_service
correctly skips re-templating the unit (refreshing as another user would
re-point User=/home/log/ExecStart and orphan the original node's
data/identity). But it then printed "The updated binary is in place;
restart the service to use it", which is wrong with the default per-user
install dir: this run downloaded the new binary into the CURRENT user's
home, while the existing unit's ExecStart still points at the original
user's binary -- so a restart keeps running the OLD version.

Read the existing unit's ExecStart path and only claim the binary is in
place when it matches where this run installed (e.g. a shared
FREENET_INSTALL_DIR). When the paths differ, tell the operator the
service was NOT updated and to re-run the installer as the service user
so the binary lands where the unit looks for it.
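
The path comparison could look roughly like the sketch below. It assumes the unit's `ExecStart=` line has no systemd prefix characters (such as `-` or `@`) and no spaces in the binary path; both helper names and the output strings are hypothetical.

```shell
# Hypothetical helpers; assumes ExecStart= has no systemd prefix
# characters (such as - or @) and no spaces in the binary path.

# Read unit-file text on stdin and print the ExecStart binary path.
unit_exec_path() {
    sed -n 's/^ExecStart=//p' | head -n 1 | awk '{ print $1 }'
}

# Args: path the unit runs, path this run installed to.
# Prints which message the installer should show.
refresh_claim() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        echo "in-place"
    else
        echo "not-updated"
    fi
}
```

Treating an unreadable or empty `ExecStart` as "not-updated" is a conservative choice made for this sketch: it avoids claiming an update that may not have reached the running service.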

shellcheck clean; scripts/test-install-sh.sh green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014dGjU1Q6Vpk2dm4sUf4pdU
sanity commented Jun 27, 2026

Collaborator Author

Reviewed + merging.

  • Rebased onto the post-#4591/#4593 main. The rebase was clean: the install-logic and unit-template changes touch disjoint regions.
  • Verified that every required unit directive survived (StartLimit* in [Unit], StartLimitAction=none, exit-45 marker, ExecStopPost catch-all firing update on 42/45/any-crash, RestartSteps/RestartMaxDelaySec, SuccessExitStatus) via grep, systemd-analyze verify (both units, exit 0), 308 tests, shellcheck, and test-install-sh 15/15.
  • Two Codex P2s fixed: (a) a dir-writability test -w probe, so a root-owned/non-writable install dir leaves the node unsupervised with a warning instead of standing up a service that fails every auto-update; (b) the installer reads the unit's actual ExecStart before claiming a cross-user refresh is 'in place'.
  • One P2 (auto-elevate enable_linger via sudo -n) dismissed with justification: the install.sh routing already picks a system service when it can elevate, so the user+linger path is the deliberate no-elevation fallback and degrades accurately.
  • This is the product-behavior change (default install → supervised, system-preferred / user+linger fallback, FREENET_NO_SERVICE opt-out), signed off by the maintainer. CI green.

NOTE: the freenet.org/install.sh mirror (hugo-site static/install.sh) needs a lockstep update post-merge, or the public curl|sh installer keeps serving the old logic. That is a separate deploy step with the release.

[AI-assisted - Claude]

sanity added this pull request to the merge queue Jun 27, 2026
Merged via the queue into main with commit 43b835f Jun 27, 2026
17 checks passed
sanity deleted the feat/default-supervised-install branch June 27, 2026 03:54