Skip to content

GHCP — Crawl Track v2.3.1: Exercises 0–10 (bksw_format + bookshelf hygiene)#4186

Open
lbakerchef wants to merge 15 commits into
mainfrom
learn/crawl/lbaker-ex0-bootstrap
Open

GHCP — Crawl Track v2.3.1: Exercises 0–10 (bksw_format + bookshelf hygiene)#4186
lbakerchef wants to merge 15 commits into
mainfrom
learn/crawl/lbaker-ex0-bootstrap

Conversation

@lbakerchef
Copy link
Copy Markdown
Contributor

@lbakerchef lbakerchef commented May 18, 2026

Summary

GHCP Crawl Track v2.3.1 — Exercises 0–10 on the chef/chef-server repository.

This PR documents an AI-assisted learning track using the real production codebase.
Exercises progress from scaffolding and orientation through refactoring, validation,
docs, perf measurement, dependency audit, security hygiene, structured logging,
and CI baseline work. The chosen practice module is src/bookshelf/src/bksw_format.erl
— a pure utility module with no runtime side-effects and no downstream dependencies
inside bookshelf.


Review Focus

These are the areas that warrant the most reviewer attention.

Area File(s) Why it needs eyes
Input validation guards src/bookshelf/src/bksw_format.erl when is_binary(Bin) guards added to to_hex/1 and to_base64/1; verify guard placement doesn't break existing callers passing non-binary
Structured logging src/bookshelf/src/bksw_wm_sql_object.erl 3 error log lines updated to include req_id; verify pattern match on #context{reqid = ReqId} is exhaustive
Security hygiene .gitignore, spec/fixtures/pivotal.pem New *.pem/*.key/.env* gitignore entries; PEM header prepended — confirm header doesn't break any PEM parser that reads the fixture
Test coverage src/bookshelf/test/bksw_format_tests.erl 8 tests total (5 positive + 3 negative); confirm negative tests use ?assertException pattern correctly
Local test script scripts/test-bookshelf-unit.sh Verify ERL_LIBS and erlc -I include paths are portable across different dev setups

Changes by Exercise

Ex Title Files changed Type
0 Bootstrap scaffolding ai-track-docs/, .copilot-track/crawl/ docs
1+2 Repo orientation + deterministic tests SYSTEM-OVERVIEW.md, build-test.md, bksw_format_tests.erl docs + tests
3 Readability refactor bksw_format.erl refactor
4 Docstrings + extending guide bksw_format.erl, extending-bksw_format.md docs + comments
5 Input validation + negative tests bksw_format.erl, bksw_format_tests.erl validation + tests
6 Performance baseline perf-baseline.md docs
7 Dependency audit dependencies.md docs
8 Security hygiene .gitignore, pivotal.pem, setup_helper.erl, security-hygiene.md hygiene
9 Structured logging bksw_wm_sql_object.erl, logging.md improvement
10 CI baseline + local script scripts/test-bookshelf-unit.sh, build-test.md tooling + docs

Risk & Rollback

Risk tiers

Risk Items Severity
Behaviour change bksw_format.erl guards (when is_binary) 🟡 Low — existing callers in bksw_xml.erl, bksw_req.erl, bksw_wm_sql_object.erl, bksw_wm_object.erl all pass binary() values; no non-binary caller exists in the codebase
Log format change bksw_wm_sql_object.erl 3 lines 🟡 Low — format change only; no logic change; operators parsing old log format for alerting rules should update regex
PEM file header spec/fixtures/pivotal.pem 🟢 Minimal — text before -----BEGIN is valid PEM; test suite still passes
gitignore .gitignore new patterns 🟢 Minimal — already-tracked .pem files remain tracked; only blocks future untracked files
Docs/scripts All other files 🟢 None — additive only

Rollback

# Revert entire branch (safest — one command)
git revert --no-commit c289a28f3..HEAD && git commit -m "revert: crawl track exercises 0-10"

# Revert only the behaviour-change commits (surgical)
git revert c833ab19d  # ex5 — remove is_binary guards
git revert 062633d1d  # ex9 — revert log line changes

# Revert only the gitignore change
git revert 67ad2fed4  # ex8 — remove secret file patterns

Verification Steps

# 1. Run bksw_format unit tests (all 8 must pass)
bash scripts/test-bookshelf-unit.sh verbose

# Expected last line:
#   All 8 tests passed.

# 2. Confirm is_binary guards reject non-binary inputs
erl -noshell -pa /tmp -eval '
  catch bksw_format:to_hex("string"),
  catch bksw_format:to_hex(42),
  io:format("guards ok~n"),
  init:stop()
'

# 3. Confirm req_id appears in error log format
grep -n "req_id" src/bookshelf/src/bksw_wm_sql_object.erl

# 4. Confirm no real credentials in tracked files
git grep -n "BEGIN RSA PRIVATE KEY" -- "*.erl" "*.rb"
# Expected: only itest/ and spec/ paths

# 5. Confirm gitignore does not un-track existing .pem files
git ls-files src/chef-server-ctl/spec/fixtures/pivotal.pem
git ls-files src/chef-server-ctl/habitat/config/pivotal.pem
# Both must print the path (still tracked)

Evidence

All 8 unit tests passing

======================== EUnit ========================
module 'bksw_format_tests'
  to_hex_produces_lowercase_hex_string_test...[0.003 s] ok
  to_hex_empty_binary_produces_empty_string_test...ok
  to_hex_rejects_string_input_test...ok
  to_hex_rejects_integer_input_test...ok
  to_base64_encodes_binary_test...[0.014 s] ok
  to_base64_rejects_string_input_test...ok
  to_etag_wraps_hex_in_quotes_test...ok
  to_date_undefined_returns_epoch_test...ok
  [done in 0.041 s]
=======================================================
  All 8 tests passed.

Performance baseline (key numbers)

Function Input avg latency
to_hex/1 16 B 20 µs
to_hex/1 1 024 B 1 485 µs
to_base64/1 1 024 B 23 µs

No optimization action required.

Security audit summary

  • ✅ No hardcoded production credentials
  • ✅ Bookshelf uses chef_secrets:get/2 at runtime
  • oc_chef_action redacts "password" field from audit logs
  • 🟡 4 deps branch-tracked in rebar.lock (see ai-track-docs/dependencies.md)
  • 🟡 Ruby 3.1 and Redis 5.x are EOL (noted, out of scope for this track)

Proposed Commit Message Improvements

Two commits have suboptimal messages. Proposed rewrites (for reference — not
amended in this PR since history is already pushed):

c289a28f3 — missing exercise prefix:

# Current
ghcp(crawl): ex0 bootstrap scaffolding

# Proposed
exercise 0 — ghcp(crawl): bootstrap scaffolding for Crawl track

Added ai-track-docs/ (SYSTEM-OVERVIEW, build-test, architecture.mmd)
and .copilot-track/crawl/README.md with chain-PR conventions and
evidence block template.

d83daa927 — says "exercise 2" but covers exercises 1 and 2:

# Current
exercise 2 — ghcp(crawl): ex1+ex2 repo orientation, low-risk module, deterministic tests

# Proposed
exercise 1+2 — ghcp(crawl): repo orientation, module selection, deterministic tests

- Ex1: service map, language table, entry points, test approach in SYSTEM-OVERVIEW.md
- Ex2: verified bookshelf build commands, bksw_format chosen as low-risk module,
  bksw_format_tests.erl with 5 deterministic EUnit tests (all passing)

Future exercises: the exercise N — ghcp(crawl): <verb> <what> pattern is
working well; keep subject line under 72 chars and use the body for why.


Track

  • Level: Crawl
  • Version: v2.3.1
  • Exercises completed: 0–10 of 15
  • Branch: learn/crawl/lbaker-ex0-bootstrap
  • Practice module: src/bookshelf/src/bksw_format.erl

@lbakerchef lbakerchef requested review from a team as code owners May 18, 2026 20:33
@netlify
Copy link
Copy Markdown

netlify Bot commented May 18, 2026

👷 Deploy Preview for chef-server processing.

Name Link
🔨 Latest commit a6bc5d6
🔍 Latest deploy log https://app.netlify.com/projects/chef-server/deploys/6a0b940078ef25000887411f

lbakerchef and others added 9 commits May 18, 2026 16:07
…deterministic tests

- Updated SYSTEM-OVERVIEW.md with language/framework summary, entry points,
  test approach, and chosen low-risk module (bksw_format.erl) with justification
- Updated build-test.md with exact bookshelf build/test commands, verified
  test output, and two new troubleshooting entries (GCC segfault, rebar3 path)
- Added src/bookshelf/test/bksw_format_tests.erl with 5 deterministic EUnit
  tests covering all 4 exported functions in bksw_format (to_hex, to_base64,
  to_etag, to_date); all 5 tests pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Extract byte_to_hex/1 private helper from to_hex/1 list comprehension;
  intent is now readable at a glance without parsing nested format calls
- Remove no-op string:to_lower/1 call (2.16.0b format already produces lowercase)
- Add -spec attributes for all 4 public functions and byte_to_hex/1
- All 5 existing EUnit tests pass; behavior unchanged

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add @doc comments to all functions in bksw_format.erl explaining purpose,
  input shapes, edge cases, and examples
- Mark byte_to_hex/1 as @Private
- Create ai-track-docs/extending-bksw_format.md with guidance on when/how
  to add new functions, current exports reference, a worked example, and
  module health guidelines
- All 5 EUnit tests pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… for bksw_format

- Added is_binary/1 guards to to_hex/1 and to_base64/1
- Three new negative tests: rejects string and integer inputs
- All 8 tests passing (5 positive + 3 negative)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- timer:tc escript benchmark, 10k iterations per cell
- to_hex/1: 17–1485 µs avg across 16/128/1024 byte inputs (linear O(n))
- to_base64/1: 1–23 µs avg — stdlib base64 NIF, ~64x faster
- Variance notes: WSL2 scheduler jitter, JIT first-call spike documented
- No optimization recommended; no hotspot identified

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…kshelf

- 4 deps branch-tracked in rebar.lock (not reproducibly pinned): erlcloud,
  mini_s3, erlsom, sqerl — all on personal/feature branches
- 10 deps branch-tracked in config but ref-frozen in lock (safe today)
- Only iso8601 (tag) and meck (ref) are properly pinned in rebar.config
- OTP version mismatch: rebar.config requires 26.2.5.15, omnibus override
  has 26.2.5.14 commented out
- Platform debt noted: Ruby 3.1 EOL, Redis 5.x EOL, omnibus-ctl on main
- No files modified; audit-only

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Audit (security-hygiene.md):
- No hardcoded production credentials found in src/
- Bookshelf uses chef_secrets:get/2 for runtime secret loading (correct)
- oc_chef_action properly redacts password field from audit logs (correct)
- Habitat .pem files are template placeholders, not real keys (confirmed safe)
- AWS key in bksw_sec_tests is AKIAIOSFODNN7EXAMPLE docs placeholder (safe)

Remediations:
- .gitignore: added *.pem, *.key, .env*, *.secret, *.credentials and SSH
  key name patterns as safety net for future untracked secret files
- spec/fixtures/pivotal.pem: prepended test-only header (valid PEM format)
  to prevent false positives in secret scanners
- itest/setup_helper.erl: added TEST FIXTURE comment on zuperzecret literal

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…sql_object

Three lager:error calls lacked req_id trace context despite #context{} being
in scope. These are integrity-critical events (checksum/MD5 mismatches, upload
failures) that operators need to correlate to specific requests.

Changes:
- send_streamed_body/1: add reqid=ReqId to pattern; log as key=value fields
- write_streamed_body/3: extract Ctx1#context.reqid into content-md5 mismatch log
- maybe_upload/3: promote error_logger to lager:error; add req_id to pattern;
  use key=value field format

All three now emit: req_id=<id> <event>: field=value ...

Docs: ai-track-docs/logging.md — logging stack, req_id convention, before/after,
best practices table

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ation

CI finding: GitHub Actions runs security scans (trufflehog, trivy, grype,
BlackDuck, SonarQube) but build=false and unit-tests=false — Erlang unit
tests, dialyzer, and elvis are NOT executed by CI.

Local script: scripts/test-bookshelf-unit.sh
- Bypasses GCC segfault on jiffy C NIF compilation (pre-existing system bug)
- Uses ERL_LIBS + erlc to compile bksw_format + bksw_format_tests directly
- Runs eunit:test via erl -noshell with proper exit code propagation
- Cleans up temp files on exit; accepts 'verbose' argument
- All 8 tests pass: exit 0

Updated ai-track-docs/build-test.md:
- CI section rewritten with what runs vs what does not
- Local script usage documented
- Full local quality gate checklist

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@lbakerchef lbakerchef changed the title GHCP — Crawl: Ex0 Bootstrap GHCP — Crawl Track v2.3.1: Exercises 0–10 (bksw_format + bookshelf hygiene) May 18, 2026
lbakerchef and others added 5 commits May 18, 2026 17:12
Updated PR title and body with:
- Review focus table: 5 items with rationale (guards, logging, PEM, gitignore, script)
- Risk tiers: behaviour-change (low), log format (low), gitignore/docs (minimal)
- Rollback: full-branch revert + surgical per-commit options
- Verification: 5 runnable shell checks with expected output
- Evidence: EUnit output, perf table, security audit summary
- Commit message improvements proposed for c289a28 and d83daa9

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ings

BACKLOG-001 (High/M): Pin 4 branch-tracked rebar.lock deps to SHA refs
  — erlcloud, mini_s3, erlsom, sqerl all on personal branches in lock file

BACKLOG-002 (High/S-M): Enable Erlang unit tests in GitHub Actions CI
  — build: false / unit-tests: false; no automated test gate on PRs

BACKLOG-003 (Med/S): Propagate req_id to 14 remaining untraced error logs
  — bksw_wm_sql_object (10) + bksw_wm_object (3) + bksw_io (1)

BACKLOG-004 (Med/S): Resolve OTP version mismatch omnibus vs rebar.config
  — rebar.config requires 26.2.5.15; commented omnibus override says .14

BACKLOG-005 (Med/S): Add trufflehog/gitleaks allowlist for test PEM fixtures
  — 5 tracked .pem files will trip scanner; no machine-readable allowlist

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…o_hex/1

Adds a runtime-configurable toggle for hex letter case via the bookshelf
application environment:

  application:set_env(bookshelf, hex_encoding_case, uppercase)  % -> "0AFF"
  application:set_env(bookshelf, hex_encoding_case, lowercase)  % -> "0aff" (default)

Design:
- application:get_env/3 called once per to_hex/1 (not per byte)
- string:to_upper/1 only invoked on uppercase path (zero cost default)
- Any unknown value falls through to lowercase (fail-safe)
- No restart required; takes effect on next call

Tests (2 new, 10 total — all passing):
- to_hex_uppercase_toggle_produces_uppercase_test: set env, assert, always unset
- to_hex_default_is_lowercase_when_toggle_unset_test: confirm default

Docs: ai-track-docs/feature-toggles.md — config, risks, ETag consistency warning

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Elvis: 0 violations across all 20 bookshelf src/*.erl files (9 rules).
All Ex3-13 changes to bksw_format.erl and bksw_wm_sql_object.erl pass clean.

Dialyzer on bksw_format.erl (with iso8601 dep): exit 0, no warnings.
All -spec attrs consistent with actual return types including the Ex13
string:to_upper/1 toggle path.

bksw_wm_sql_object.erl: 'Unknown functions' warnings are expected (sibling
modules not in minimal PLT) — no defects in our Ex9 code.

Pre-existing finding: mochiweb_http.erl:79 unreachable pattern in 3rd-party
mochiweb v2.x; not introduced by our changes.

Gaps documented: no Elvis/Dialyzer in CI, full PLT blocked by jiffy NIF segfault.
See ai-track-docs/static-analysis.md for full findings and run instructions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Four new tests (14 total, all passing):

1. to_hex_unknown_toggle_value_falls_back_to_lowercase_test
   Proves the wildcard clause in to_hex/1's case expression fires on
   unrecognised hex_encoding_case values — bad config never crashes.

2. to_etag_empty_binary_produces_empty_quoted_string_test
   Edge case: to_etag(<<>>) produces ""  (valid ETag for empty object).

3. to_hex_large_binary_does_not_crash_test
   Boundary: 1 KiB binary (1024 bytes) encodes to exactly 2048 hex chars
   with correct content — no crash, no truncation.

4. to_date_datetime_tag_unwraps_and_formats_test
   Previously untested code path: the {datetime, Date} tagged form of
   to_date/1 was missing coverage. Verifies the unwrap clause delegates
   to iso8601 and returns a binary ISO 8601 string containing the date.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant