multiple: collect host-level journal entries for better debugging #2297

Open
sespiros wants to merge 6 commits into main from sse/gather-host-logs

sespiros (Collaborator) commented Apr 3, 2026

This PR adds:

  • host-level log collection to the log-collector
  • a new `just download-logs` target for downloading logs
  • debugging documentation

This depends on #2290

sespiros added the "no changelog" (PRs not listed in the release notes) and "do not merge" (This shouldn't be merged at this point) labels on Apr 3, 2026
sespiros force-pushed the sse/gather-host-logs branch from 197f1a9 to 08ddcc9 on April 3, 2026 15:45
sespiros requested a review from burgerdev on April 7, 2026 08:06
sespiros marked this pull request as ready for review on April 7, 2026 08:07

echo "Collecting kernel logs (since $since)..." >&2
journalctl --directory=/journal -k --since="$since" --no-pager >/export/host/kernel.log 2>&1 || true
echo "Collecting k3s logs (since $since)..." >&2
journalctl --directory=/journal -u k3s --since="$since" --no-pager >/export/host/k3s.log 2>&1 || true

Member

This won't work on non-k3s runners; we'll need kubelet.service and containerd.service there.

Collaborator Author

Added collection for both and added a find step which will skip the empty ones.
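
One way the per-unit collection and the find step could fit together is sketched below. This is not the code from the PR: the function name collect_unit_logs and the JOURNALCTL and JOURNAL_DIR overrides are hypothetical, and --quiet is an assumption made here so that a unit without entries should leave a truly empty file behind.

```shell
#!/usr/bin/env bash
# Sketch only: collect_unit_logs, JOURNALCTL and JOURNAL_DIR are hypothetical names.

# collect_unit_logs OUT_DIR SINCE UNIT...
# Writes one <unit>.log per systemd unit, then removes logs that stayed empty.
collect_unit_logs() {
  local out_dir=$1 since=$2 unit
  shift 2
  mkdir -p "$out_dir"
  for unit in "$@"; do
    echo "Collecting $unit logs (since $since)..." >&2
    # --quiet suppresses informational output, so a unit that has no entries
    # (for example k3s on a non-k3s runner) produces a zero-byte file.
    "${JOURNALCTL:-journalctl}" --directory="${JOURNAL_DIR:-/journal}" \
      -u "$unit" --since="$since" --no-pager --quiet \
      >"$out_dir/$unit.log" 2>/dev/null || true
  done
  # Skip the empty ones: delete zero-byte log files.
  find "$out_dir" -type f -name '*.log' -empty -delete
}
```

Under these assumptions, a call such as `collect_unit_logs /export/host "$since" k3s kubelet containerd` would keep only the logs of the units that actually exist on the runner.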

Comment on lines +104 to +108
mv ./workspace/logs/export-no-stream/logs/* ./workspace/logs/
if [[ -d ./workspace/logs/export-no-stream/host ]]; then
  mkdir -p ./workspace/logs/host
  mv ./workspace/logs/export-no-stream/host/* ./workspace/logs/host/
fi

Member

Should we just move the export-no-stream directory altogether?

Collaborator Author

Fixed: I changed the path where the host logs are stored so that the mv can stay as is.

Comment on lines +78 to +83
download)
  deploy=false
  if [[ ${2:-} == "--deploy" ]]; then
    deploy=true
    shift
  fi

Member

Why do we need this?

Collaborator Author

In CI (and now also in e2e), the log-collector pods are deployed right from the start. To make the download-logs command usable in non-e2e, non-CI deployments too, i.e. plain just, I wanted a way to tell the download script to also deploy the log-collector. I have since removed this flag, and download now does this automatically instead.
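
A rough sketch of how download could deploy the log-collector on demand, assuming a DaemonSet named log-collector and a manifest path passed by the caller. Both names, the ensure_log_collector function, and the KUBECTL override are illustrative assumptions, not taken from the PR.

```shell
#!/usr/bin/env bash
# Sketch only: the DaemonSet name "log-collector", the function name and
# the KUBECTL override are assumptions for illustration.

# ensure_log_collector NAMESPACE MANIFEST
# Deploys the log-collector only if it is not already present.
ensure_log_collector() {
  local namespace=$1 manifest=$2
  local kubectl=${KUBECTL:-kubectl}
  # "kubectl get" exits non-zero when the DaemonSet does not exist.
  if ! "$kubectl" get daemonset log-collector -n "$namespace" >/dev/null 2>&1; then
    "$kubectl" apply -n "$namespace" -f "$manifest"
    # Wait until the collector pods are ready before downloading anything.
    "$kubectl" rollout status daemonset/log-collector -n "$namespace" --timeout=60s
  fi
}
```

With a check like this, CI and e2e runs (where the collector is already deployed) would take the fast path, while a plain just deployment would get the collector deployed first, without a separate flag.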

Comment on lines +65 to +68
## Tracing a pod to its sandbox in kata.log

kata.log contains interleaved logs from all sandboxes. To find logs for a
specific pod, you need to go from runtime class to sandbox ID.

Member

We should eventually try to make the pod<->sandbox association traceable, but it's good to have at least some hints for now.

Collaborator Author

I pushed a new commit for this, ptal.

sespiros added 5 commits April 8, 2026 19:18
Add host-level log collection to the get-logs download phase.
The log-collector DaemonSet mounts /var/log/journal from the host,
and a separate collect-host-logs script runs journalctl to extract
time-scoped entries (since namespace creation) for:
- kernel (-k): SEV-ES termination, VFIO/IOMMU events
- kata (-t kata): QEMU lifecycle, register dumps, console output
- k3s/kubelet/containerd (-u k3s, -u kubelet, -u containerd): service errors, CRI state
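
As an illustration of the time scoping, the sketch below derives a journalctl --since value from a namespace's creation timestamp. It is an assumption about how this could be done, not the PR's code: namespace_since and the KUBECTL override are hypothetical names, and the conversion relies on GNU date.

```shell
#!/usr/bin/env bash
# Sketch only: namespace_since and KUBECTL are hypothetical names.

# namespace_since NAMESPACE
# Prints the namespace creation time in a form journalctl --since accepts,
# e.g. "2026-04-03 15:45:00 UTC".
namespace_since() {
  local ns=$1 created
  # creationTimestamp is RFC 3339, e.g. 2026-04-03T15:45:00Z
  created=$("${KUBECTL:-kubectl}" get namespace "$ns" \
    -o jsonpath='{.metadata.creationTimestamp}')
  # Requires GNU date for -d; the explicit UTC suffix avoids local-time ambiguity.
  date -u -d "$created" '+%Y-%m-%d %H:%M:%S UTC'
}
```

The result could then be passed on, for example `journalctl --directory=/journal -k --since="$(namespace_since my-namespace)" --no-pager`, where my-namespace is a placeholder.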

Split the k8s-log-collector package into two scripts:
- collect-pod-logs: streams pod container logs (DaemonSet entrypoint)
- collect-host-logs: one-shot journal export (called via kubectl exec)

Uses systemdMinimal with compression enabled to support host
journal files compressed with LZ4/ZSTD.

Adds image replacement logic so get-logs deploys the freshly-pushed
log-collector image instead of the hardcoded release SHA.

Signed-off-by: Spyros Seimenis <sse@edgeless.systems>

Add k8s-log-collector to e2e push targets. Integrate get-logs start
into _e2e recipe to deploy the log-collector DaemonSet early so pod
logs are captured from the start.

Simplify CI e2e workflow: just e2e already handles the streaming
log-collector deployment via _e2e, so remove the manual get-logs
start/download steps and use just download-logs instead.

Signed-off-by: Spyros Seimenis <sse@edgeless.systems>

Add a download-logs recipe that downloads all logs (pod + host journal)
from an existing deployment.

Signed-off-by: Spyros Seimenis <sse@edgeless.systems>

Document how to collect and navigate logs after e2e tests and manual
deployments. Covers host-level journal logs (kernel, k3s, kata) and
how to trace a pod to its kata sandbox ID in the logs.

Signed-off-by: Spyros Seimenis <sse@edgeless.systems>

sespiros force-pushed the sse/gather-host-logs branch from 08ddcc9 to 2581649 on April 8, 2026 16:22

Mount the CRI socket (k3s and standard containerd) in the
log-collector pod and run crictl pods to map CVM pod names
to kata sandbox IDs. Writes metadata/sandbox-map.txt for
easy triaging of kata.log.
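
To show how such a map could be used when triaging, here is a small sketch that looks up a pod's sandbox ID and filters kata.log by it. The layout of sandbox-map.txt is not shown in this PR, so the sketch assumes (hypothetically) one "<pod-name> <sandbox-id>" pair per line; kata_logs_for_pod is likewise an illustrative name.

```shell
#!/usr/bin/env bash
# Sketch only: assumes each map line is "<pod-name> <sandbox-id>",
# which is a guess at the format, not the format used by the PR.

# kata_logs_for_pod MAP_FILE KATA_LOG POD_NAME
# Prints the kata.log lines that mention the pod's sandbox ID.
kata_logs_for_pod() {
  local map_file=$1 kata_log=$2 pod=$3 sandbox_id
  # Take the sandbox ID from the first line whose pod name matches exactly.
  sandbox_id=$(awk -v pod="$pod" '$1 == pod { print $2; exit }' "$map_file")
  if [[ -z $sandbox_id ]]; then
    echo "no sandbox found for pod $pod" >&2
    return 1
  fi
  # -F treats the ID as a fixed string rather than a regular expression.
  grep -F -- "$sandbox_id" "$kata_log"
}
```

For example, `kata_logs_for_pod metadata/sandbox-map.txt host/kata.log my-pod` would print only the interleaved entries that belong to my-pod's sandbox; the file locations and pod name here are placeholders.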

sespiros force-pushed the sse/gather-host-logs branch from 2581649 to 08df97c on April 8, 2026 18:00
Labels

do not merge (This shouldn't be merged at this point), no changelog (PRs not listed in the release notes)

2 participants