Conversation
There was a problem hiding this comment.
Pull request overview
Adds higher-level integration and end-to-end coverage for Redis-backed vMCP session sharing, focusing on cross-pod session restoration and related lifecycle behaviors.
Changes:
- Added Go integration tests for
sessionmanagerusingminiredis+ in-process MCP backends to validate restore, hijack prevention, backend expiry metadata, and in-memory-only behavior. - Added operator E2E tests (Kind + real Redis) to validate VirtualMCPServer readiness/conditions with Redis and to exercise cross-pod session reuse via
transport.WithSession(sessionID).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
pkg/vmcp/server/sessionmanager/horizontal_scaling_integration_test.go |
New integration tests covering cross-pod restore, hijack prevention, restore failure behavior, backend expiry metadata, and no-Redis isolation. |
test/e2e/thv-operator/virtualmcp/virtualmcp_redis_session_test.go |
New E2E coverage for VirtualMCPServer replicas=2 with Redis: condition assertions and cross-pod session reconstruction. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pkg/vmcp/server/sessionmanager/horizontal_scaling_integration_test.go
Outdated
Show resolved
Hide resolved
pkg/vmcp/server/sessionmanager/horizontal_scaling_integration_test.go
Outdated
Show resolved
Hide resolved
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4724 +/- ##
==========================================
- Coverage 68.97% 68.91% -0.07%
==========================================
Files 517 517
Lines 54798 54862 +64
==========================================
+ Hits 37799 37806 +7
- Misses 14087 14147 +60
+ Partials 2912 2909 -3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
jerm-dro
left a comment
There was a problem hiding this comment.
LGTM, thanks for writing!
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
Large PR justification has been provided. Thank you!
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Add integration and e2e tests for Redis-backed vMCP session sharing
Integration tests (pkg/vmcp/server/sessionmanager, miniredis):
- AC1: cross-pod session reconstruction via RestoreSession on cache miss
- AC2: hijack prevention — wrong/nil callers rejected after restore,
correct caller routes successfully
- AC4: all backends unreachable on restore → empty routing table returned
- AC5: NotifyBackendExpired removes backend from Redis metadata;
subsequent restore on a fresh pod skips the expired backend
- AC6: LocalSessionDataStorage (no Redis) → no cross-pod sharing
E2E tests (test/e2e/thv-operator/virtualmcp, Kind + real Redis):
- replicas=2 + Redis → SessionStorageWarning=False and Ready=True
- cross-pod session reconstruction: session initialized on pod A is
usable on pod B via transport.WithSession(sessionID)
Note: AC3 (LRU eviction, RC-10) is intentionally excluded — TTL-based
Redis eviction is sufficient.
Closes #4222
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
test/e2e/thv-operator/virtualmcp/virtualmcp_redis_session_test.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Evict if the backend ID list has drifted (e.g. NotifyBackendExpired removed a | ||
| // backend), so the next Get calls RestoreSession with the updated backend list. | ||
| // | ||
| // We intentionally compare only MetadataKeyBackendIDs rather than the full | ||
| // metadata map. Per-backend session IDs (MetadataKeyBackendSessionPrefix+*) | ||
| // are the session IDs negotiated by the restoring pod's backend connections. | ||
| // RestoreSession sends the stored IDs as Mcp-Session-Id hints, so a backend | ||
| // that honors session resumption will return the same ID — but not all backends | ||
| // do (e.g. SSE transports have no session ID at all). Comparing the full map | ||
| // would evict on every cross-pod cache hit whenever any backend assigns a | ||
| // fresh ID, preventing tools from ever being served. | ||
| sessBackendIDs := sess.GetMetadata()[vmcpsession.MetadataKeyBackendIDs] | ||
| if sessBackendIDs != metadata[vmcpsession.MetadataKeyBackendIDs] { | ||
| return cache.ErrExpired | ||
| } |
There was a problem hiding this comment.
suggestion: This narrowing from maps.Equal to comparing only MetadataKeyBackendIDs is a workaround for a deeper issue: loadSession never writes the restored session's metadata back to Redis after RestoreSession. If a backend doesn't honor the session hint (e.g., SSE transports, or a backend that ignores Mcp-Session-Id), the per-backend session IDs in Redis are permanently stale — every subsequent pod restoration reads the same outdated hints, and the full-map comparison triggers infinite eviction loops because the restored IDs always diverge from Redis.
Adding storage.Update(sessionID, restored.GetMetadata()) after a successful restore in loadSession would keep Redis consistent and should allow reverting this comparison back to maps.Equal, which gives stronger cross-pod consistency (detecting any metadata drift, not just backend list changes).
Summary
Add integration and e2e tests for Redis-backed vMCP session sharing
Integration tests (pkg/vmcp/server/sessionmanager, miniredis):
E2E tests (test/e2e/thv-operator/virtualmcp, Kind + real Redis):
Note: AC3 (LRU eviction, RC-10) is intentionally excluded — TTL-based Redis eviction is sufficient.
Runtime fix (
pkg/vmcp/k8s/manager.go):SetupIndexesis now called beforeSetupWithManagerwhen registering the BackendReconciler. The reconciler's watch handlers depend on field indexes registered against the controller-manager's cache; callingSetupWithManagerwithout them causes watch predicates to fail silently (or panic at first use). This bug was surfaced by the new e2e tests, which exercise the multi-replica path that triggers the reconciler.Fixes #4222
Type of change
Test plan
task test)task test-e2e)task lint-fix)Large PR Justification
This is a complete PR including e2e tests. Cannot be split.