Skip to content

Add horizontal scaling tests for vmcp#4724

Open
yrobla wants to merge 6 commits intomainfrom
issue-4222
Open

Add horizontal scaling tests for vmcp#4724
yrobla wants to merge 6 commits intomainfrom
issue-4222

Conversation

@yrobla
Copy link
Copy Markdown
Contributor

@yrobla yrobla commented Apr 10, 2026

Summary

Add integration and e2e tests for Redis-backed vMCP session sharing

Integration tests (pkg/vmcp/server/sessionmanager, miniredis):

  • AC1: cross-pod session reconstruction via RestoreSession on cache miss
  • AC2: hijack prevention — wrong/nil callers rejected after restore, correct caller routes successfully
  • AC4: all backends unreachable on restore → empty routing table returned
  • AC5: NotifyBackendExpired removes backend from Redis metadata; subsequent restore on a fresh pod skips the expired backend
  • AC6: LocalSessionDataStorage (no Redis) → no cross-pod sharing

E2E tests (test/e2e/thv-operator/virtualmcp, Kind + real Redis):

  • replicas=2 + Redis → SessionStorageWarning=False and Ready=True
  • cross-pod session reconstruction: session initialized on pod A is usable on pod B via transport.WithSession(sessionID)

Note: AC3 (LRU eviction, RC-10) is intentionally excluded — TTL-based Redis eviction is sufficient.

Runtime fix (pkg/vmcp/k8s/manager.go): SetupIndexes is now called before SetupWithManager when registering the BackendReconciler. The reconciler's watch handlers depend on field indexes registered against the controller-manager's cache; calling SetupWithManager without them causes watch predicates to fail silently (or panic at first use). This bug was surfaced by the new e2e tests, which exercise the multi-replica path that triggers the reconciler.

Fixes #4222

Type of change

  • Bug fix
  • New feature
  • Refactoring (no behavior change)
  • Dependency update
  • Documentation
  • Other (describe):

Test plan

  • Unit tests (task test)
  • E2E tests (task test-e2e)
  • Linting (task lint-fix)
  • Manual testing (describe below)

Large PR Justification

This is a complete PR including e2e tests. Cannot be split.

@yrobla yrobla requested a review from Copilot April 10, 2026 05:19
@github-actions github-actions bot added the size/L Large PR: 600-999 lines changed label Apr 10, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds higher-level integration and end-to-end coverage for Redis-backed vMCP session sharing, focusing on cross-pod session restoration and related lifecycle behaviors.

Changes:

  • Added Go integration tests for sessionmanager using miniredis + in-process MCP backends to validate restore, hijack prevention, backend expiry metadata, and in-memory-only behavior.
  • Added operator E2E tests (Kind + real Redis) to validate VirtualMCPServer readiness/conditions with Redis and to exercise cross-pod session reuse via transport.WithSession(sessionID).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
pkg/vmcp/server/sessionmanager/horizontal_scaling_integration_test.go New integration tests covering cross-pod restore, hijack prevention, restore failure behavior, backend expiry metadata, and no-Redis isolation.
test/e2e/thv-operator/virtualmcp/virtualmcp_redis_session_test.go New E2E coverage for VirtualMCPServer replicas=2 with Redis: condition assertions and cross-pod session reconstruction.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 10, 2026

Codecov Report

❌ Patch coverage is 69.56522% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.91%. Comparing base (32c23f4) to head (1ca8298).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
pkg/vmcp/server/server.go 50.00% 9 Missing and 2 partials ⚠️
cmd/vmcp/app/commands.go 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4724      +/-   ##
==========================================
- Coverage   68.97%   68.91%   -0.07%     
==========================================
  Files         517      517              
  Lines       54798    54862      +64     
==========================================
+ Hits        37799    37806       +7     
- Misses      14087    14147      +60     
+ Partials     2912     2909       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Apr 10, 2026
@yrobla yrobla requested a review from Copilot April 10, 2026 05:55
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

jerm-dro
jerm-dro previously approved these changes Apr 10, 2026
Copy link
Copy Markdown
Contributor

@jerm-dro jerm-dro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for writing!

@yrobla yrobla requested a review from blkt as a code owner April 13, 2026 08:40
@yrobla yrobla changed the base branch from main to revert-4730-issue-4494 April 13, 2026 08:40
@yrobla yrobla requested a review from Copilot April 13, 2026 08:41
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/L Large PR: 600-999 lines changed labels Apr 13, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 13, 2026
@github-actions github-actions bot dismissed their stale review April 13, 2026 08:42

Large PR justification has been provided. Thank you!

@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Apr 13, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 13, 2026
@yrobla yrobla requested a review from Copilot April 13, 2026 11:29
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 13, 2026
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 13, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

taskbot added 4 commits April 14, 2026 09:31
…pattern, add storage Update" (#4730)"

This reverts commit acc1f66.
Add integration and e2e tests for Redis-backed vMCP session sharing

Integration tests (pkg/vmcp/server/sessionmanager, miniredis):
- AC1: cross-pod session reconstruction via RestoreSession on cache miss
- AC2: hijack prevention — wrong/nil callers rejected after restore,
       correct caller routes successfully
- AC4: all backends unreachable on restore → empty routing table returned
- AC5: NotifyBackendExpired removes backend from Redis metadata;
       subsequent restore on a fresh pod skips the expired backend
- AC6: LocalSessionDataStorage (no Redis) → no cross-pod sharing

E2E tests (test/e2e/thv-operator/virtualmcp, Kind + real Redis):
- replicas=2 + Redis → SessionStorageWarning=False and Ready=True
- cross-pod session reconstruction: session initialized on pod A is
  usable on pod B via transport.WithSession(sessionID)

Note: AC3 (LRU eviction, RC-10) is intentionally excluded — TTL-based
Redis eviction is sufficient.

Closes #4222
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +656 to 670
// Evict if the backend ID list has drifted (e.g. NotifyBackendExpired removed a
// backend), so the next Get calls RestoreSession with the updated backend list.
//
// We intentionally compare only MetadataKeyBackendIDs rather than the full
// metadata map. Per-backend session IDs (MetadataKeyBackendSessionPrefix+*)
// are the session IDs negotiated by the restoring pod's backend connections.
// RestoreSession sends the stored IDs as Mcp-Session-Id hints, so a backend
// that honors session resumption will return the same ID — but not all backends
// do (e.g. SSE transports have no session ID at all). Comparing the full map
// would evict on every cross-pod cache hit whenever any backend assigns a
// fresh ID, preventing tools from ever being served.
sessBackendIDs := sess.GetMetadata()[vmcpsession.MetadataKeyBackendIDs]
if sessBackendIDs != metadata[vmcpsession.MetadataKeyBackendIDs] {
return cache.ErrExpired
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: This narrowing from maps.Equal to comparing only MetadataKeyBackendIDs is a workaround for a deeper issue: loadSession never writes the restored session's metadata back to Redis after RestoreSession. If a backend doesn't honor the session hint (e.g., SSE transports, or a backend that ignores Mcp-Session-Id), the per-backend session IDs in Redis are permanently stale — every subsequent pod restoration reads the same outdated hints, and the full-map comparison triggers infinite eviction loops because the restored IDs always diverge from Redis.

Adding storage.Update(sessionID, restored.GetMetadata()) after a successful restore in loadSession would keep Redis consistent and should allow reverting this comparison back to maps.Equal, which gives stronger cross-pod consistency (detecting any metadata drift, not just backend list changes).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Extra large PR: 1000+ lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Integration tests for vMCP horizontal scaling (THV-0047)

4 participants