Haihan Jiang

Production / SRE / infrastructure engineer focused on reliable systems, Kubernetes/cloud operations, observability, and automation that is safe to run in production.

I like work where the result is reviewable: smaller diffs, clear failure modes, tests or gates that prove behavior, and evidence a future on-call engineer can trust.

Best fit: SRE, Production Engineering, Infrastructure, Platform, Cloud/DevOps, and backend infrastructure roles close to real operations.

Fast Proof

Upstream PRs merged into repositories maintained by Google and Google Cloud Platform, verified on GitHub.
Recent upstream work across gVisor, syzkaller, KHI, go-containerregistry, google/benchmark, stellar-engine, and vertex-ai-creative-studio.
Built a GKE AI inference reliability lab with OpenTelemetry traces, Kubernetes manifests, incident replay, and SLO-style evidence gates.
Production context includes Meta monetization data infrastructure and SHEIN gateway infrastructure work.
Experience with production gateways, Kubernetes/AKS-style platforms, Kafka, ZooKeeper, Elasticsearch, Terraform, runbooks, dashboards, and operational automation.

Contributor Signals

Projects where my upstream PRs have been merged: google/gvisor, google/syzkaller, GoogleCloudPlatform/khi, google/go-containerregistry, google/benchmark, google/stellar-engine, and GoogleCloudPlatform/vertex-ai-creative-studio.

Selected Upstream Work

Area	Evidence
Container/runtime reliability	`google/gvisor#13276` - set swap for precreated cgroups
Kernel fuzzing / report parsing	`google/syzkaller#7420`, `google/syzkaller#7376`
Kubernetes troubleshooting	`GoogleCloudPlatform/khi#708`, `GoogleCloudPlatform/khi#692`
Container image tooling	`google/go-containerregistry#2318`
C++ build/test infrastructure	`google/benchmark#2198`, `#2199`, `#2204`
Safer cloud defaults	`google/stellar-engine#68`, `GoogleCloudPlatform/vertex-ai-creative-studio#1445`

Live searches: org:google merged PRs / org:GoogleCloudPlatform merged PRs

Featured Builds

GKE AI Inference Reliability Lab

A runnable infrastructure lab for AI inference reliability:

OpenTelemetry trace collection and Kubernetes resource context
incident replay for baseline traffic, cache-miss latency, dependency timeout, and rollout regression
SLO-style reliability gate with published evidence reports
GKE-shaped manifests for collector RBAC, PVC-backed queue storage, and sample workloads
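
To make the idea of an SLO-style gate concrete, here is a minimal sketch of how one replayed scenario could be judged. It is an illustration under stated assumptions, not the lab's actual code: the names `Slo`, `p95`, and `evaluate_gate`, the thresholds, and the report fields are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical thresholds for one replayed scenario."""
    max_error_rate: float      # allowed fraction of failed requests
    max_p95_latency_ms: float  # allowed 95th-percentile latency in ms


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    # Integer arithmetic for ceil(0.95 * n) avoids float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def evaluate_gate(latencies_ms, errors, slo):
    """Return (passed, report) so the evidence can be published with the verdict."""
    total = len(latencies_ms)
    if total == 0:
        # Missing evidence is treated as a failure, never as a pass.
        return False, {"reason": "no samples"}

    error_rate = errors / total
    observed_p95 = p95(latencies_ms)
    report = {
        "samples": total,
        "error_rate": error_rate,
        "p95_latency_ms": observed_p95,
        "max_error_rate": slo.max_error_rate,
        "max_p95_latency_ms": slo.max_p95_latency_ms,
    }
    passed = (
        error_rate <= slo.max_error_rate
        and observed_p95 <= slo.max_p95_latency_ms
    )
    return passed, report


if __name__ == "__main__":
    slo = Slo(max_error_rate=0.01, max_p95_latency_ms=500)
    baseline = [100] * 19 + [900]        # one slow outlier
    regression = [100] * 18 + [900] * 2  # slow tail reaches the p95 rank
    print(evaluate_gate(baseline, errors=0, slo=slo))
    print(evaluate_gate(regression, errors=0, slo=slo))
```

Returning the report alongside the verdict keeps the gate reviewable: a failed run shows which threshold was crossed and by how much, and an empty run fails loudly instead of passing by default.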

What I Optimize For

Production changes that can be rolled out, observed, and rolled back.
Automation with explicit inputs, validation, state, side effects, and retry boundaries.
Reliability evidence: runbooks, dashboards, audit trails, tests, and incident reports.
Practical open-source changes that reduce ambiguity for maintainers and users.

Stack

Python Go C++ Java SQL Bash Linux Kubernetes AKS GKE OpenTelemetry Terraform Ansible Nginx/APISIX Kafka ZooKeeper Elasticsearch CMake pkg-config GitHub Actions

Contact

GitHub: Haihan-Jiang
Engineering profile: haihan-jiang.github.io
LinkedIn: haihan-jiang
Email: haihanj99@gmail.com

Merged PR status was verified from GitHub on 2026-06-14. I keep merged work separate from review-in-progress work.
