multiple: collect host-level journal entries for better debugging #2297
197f1a9 to 08ddcc9
echo "Collecting kernel logs (since $since)..." >&2
journalctl --directory=/journal -k --since="$since" --no-pager >/export/host/kernel.log 2>&1 || true
echo "Collecting k3s logs (since $since)..." >&2
journalctl --directory=/journal -u k3s --since="$since" --no-pager >/export/host/k3s.log 2>&1 || true
This won't work on non-k3s runners; we'll need kubelet.service and containerd.service there.
Added collection for both and added a find step which will skip the empty ones.
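A minimal sketch of what such a per-unit loop with a pruning step could look like. This is not the PR's actual script: the helper name `collect_unit_logs`, its argument order, and the use of `--quiet` (so that a unit without entries should leave a zero-byte file) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not from the PR): export one log file per systemd unit,
# then delete the files that stayed empty.
collect_unit_logs() {
  local journal_dir=$1 out_dir=$2 since=$3
  shift 3
  mkdir -p "$out_dir"

  local unit
  for unit in "$@"; do
    echo "Collecting $unit logs (since $since)..." >&2
    # --quiet suppresses informational lines, so a unit without entries
    # should produce no stdout; stderr is left on the console.
    journalctl --directory="$journal_dir" --quiet -u "$unit" \
      --since="$since" --no-pager >"$out_dir/$unit.log" || true
  done

  # Units that do not exist on this runner leave zero-byte files behind.
  find "$out_dir" -type f -name '*.log' -empty -delete
}
```

It could be called as `collect_unit_logs /journal /export/host "$since" k3s kubelet containerd`, so the same invocation works on k3s and non-k3s runners.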
mv ./workspace/logs/export-no-stream/logs/* ./workspace/logs/
if [[ -d ./workspace/logs/export-no-stream/host ]]; then
  mkdir -p ./workspace/logs/host
  mv ./workspace/logs/export-no-stream/host/* ./workspace/logs/host/
fi
Should we just move the export-no-stream directory altogether?
Fixed this, changed the path where I store the host logs so that the mv can stay as is.
download)
  deploy=false
  if [[ ${2:-} == "--deploy" ]]; then
    deploy=true
    shift
  fi
In CI (and now also in e2e), the log-collector pods are deployed right from the start. To make the download-logs command usable in non-e2e, non-CI deployments as well (i.e. plain just), I wanted a way to tell the download script to also deploy the log-collector. I have since removed this flag; the download step now deploys the log-collector automatically.
dev-docs/e2e/debugging.md
## Tracing a pod to its sandbox in kata.log

kata.log contains interleaved logs from all sandboxes. To find logs for a
specific pod, you need to go from runtime class to sandbox ID.
We should eventually try to make the pod<->sandbox association traceable, but it's good to have at least some hints for now.
I pushed a new commit for this, ptal.
Add host-level log collection to the get-logs download phase.

The log-collector DaemonSet mounts /var/log/journal from the host, and a separate collect-host-logs script runs journalctl to extract time-scoped entries (since namespace creation) for:
- kernel (-k): SEV-ES termination, VFIO/IOMMU events
- kata (-t kata): QEMU lifecycle, register dumps, console output
- k3s/kubelet/containerd (-u k3s/kubelet/containerd): k3s/kubelet/containerd errors, CRI state

Split the k8s-log-collector package into two scripts:
- collect-pod-logs: streams pod container logs (DaemonSet entrypoint)
- collect-host-logs: one-shot journal export (called via kubectl exec)

Uses systemdMinimal with compression enabled to support host journal files compressed with LZ4/ZSTD.

Adds image replacement logic so get-logs deploys the freshly-pushed log-collector image instead of the hardcoded release SHA.

Signed-off-by: Spyros Seimenis <sse@edgeless.systems>
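The "since namespace creation" scoping could be derived roughly as follows. This is a sketch under assumptions, not the PR's code: the function names `k8s_ts_to_since` and `namespace_since` are made up here, and the conversion relies on Kubernetes reporting creationTimestamp in UTC in the form `2024-05-01T12:34:56Z`, which is rewritten into the `YYYY-MM-DD HH:MM:SS UTC` form documented in systemd.time(7).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Convert a Kubernetes creationTimestamp (e.g. 2024-05-01T12:34:56Z, UTC)
# into a value suitable for journalctl --since.
k8s_ts_to_since() {
  local ts=$1
  ts=${ts/T/ }   # date/time separator: "T" -> space
  ts=${ts%Z}     # drop the trailing UTC designator
  printf '%s UTC\n' "$ts"
}

# Hypothetical wrapper: look up when the namespace was created.
namespace_since() {
  local ns=$1
  k8s_ts_to_since "$(kubectl get namespace "$ns" -o jsonpath='{.metadata.creationTimestamp}')"
}
```

The result can then be passed along, for example `since=$(namespace_since "$namespace")` followed by `journalctl --directory=/journal -k --since="$since" --no-pager`.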
Add k8s-log-collector to e2e push targets.

Integrate get-logs start into _e2e recipe to deploy the log-collector DaemonSet early so pod logs are captured from the start.

Simplify CI e2e workflow: just e2e already handles the streaming log-collector deployment via _e2e, so remove the manual get-logs start/download steps and use just download-logs instead.

Signed-off-by: Spyros Seimenis <sse@edgeless.systems>

Add a download-logs recipe that downloads all logs (pod + host journal) from an existing deployment.

Signed-off-by: Spyros Seimenis <sse@edgeless.systems>

Document how to collect and navigate logs after e2e tests and manual deployments. Covers host-level journal logs (kernel, k3s, kata) and how to trace a pod to its kata sandbox ID in the logs.

Signed-off-by: Spyros Seimenis <sse@edgeless.systems>
08ddcc9 to 2581649
Mount the CRI socket (k3s and standard containerd) in the log-collector pod and run crictl pods to map CVM pod names to kata sandbox IDs. Writes metadata/sandbox-map.txt for easy triaging of kata.log.
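Once such a map exists, triaging kata.log for a single pod could look like the sketch below. The layout of metadata/sandbox-map.txt is not shown in this PR, so the format used here (one whitespace-separated `<pod-name> <sandbox-id>` pair per line) is an assumption, as are the helper names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed map format (not confirmed by the PR): one "<pod-name> <sandbox-id>"
# pair per line, separated by whitespace.
sandbox_id_for_pod() {
  local map_file=$1 pod=$2
  awk -v pod="$pod" '$1 == pod { print $2; exit }' "$map_file"
}

# Print only the kata.log lines that mention the pod's sandbox ID.
kata_logs_for_pod() {
  local map_file=$1 kata_log=$2 pod=$3 id
  id=$(sandbox_id_for_pod "$map_file" "$pod")
  if [[ -z $id ]]; then
    echo "no sandbox found for pod $pod" >&2
    return 1
  fi
  grep -F -- "$id" "$kata_log"
}
```

A call such as `kata_logs_for_pod metadata/sandbox-map.txt kata.log my-pod` would then print only the lines belonging to that pod's sandbox, with the paths adjusted to wherever the downloaded logs live.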
2581649 to 08df97c
This PR adds:
- just download-logs target for downloading logs

This depends on #2290