
fix: stale credentials issue #216

Open
anasstahr wants to merge 8 commits into main from fix/stale-credentials-issue

@anasstahr (Contributor) commented Apr 1, 2026

Summary

Changes

  • Added ToolErrorMiddleware that wraps every tool call with a timeout and error handler, ensuring the proxy never hangs on stale credentials or unresponsive endpoints
  • Added --tool-timeout CLI flag (default: 300s) to configure the maximum duration for a tool call
  • Added credential error detection (401/403) with actionable suggestions to use --profile or aws sso login
  • Updated README troubleshooting section with guidance on authentication errors and the new --tool-timeout flag
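The core idea of the middleware (put a hard deadline on every tool call and turn any failure into an error result rather than letting the client wait forever) can be sketched in plain asyncio. This is a minimal, framework-free sketch and not the PR's ToolErrorMiddleware: guarded_tool_call and ToolCallOutcome are hypothetical names, and the real code hooks into fastmcp's middleware instead of wrapping a bare callable.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass
class ToolCallOutcome:
    """Hypothetical stand-in for an MCP tool result: content plus an isError flag."""
    content: list = field(default_factory=list)
    is_error: bool = False


async def guarded_tool_call(
    name: str,
    call: Callable[[], Awaitable[Any]],
    tool_call_timeout: float,
) -> ToolCallOutcome:
    """Run one tool call with a hard deadline and convert failures into an error result."""
    try:
        value = await asyncio.wait_for(call(), timeout=tool_call_timeout)
    except asyncio.TimeoutError:
        # The call did not finish in time: report it instead of hanging the client.
        return ToolCallOutcome(
            [f"Tool call '{name}' timed out after {tool_call_timeout}s."], is_error=True
        )
    except Exception as exc:
        # Any other failure (e.g. an upstream 401/403) also becomes an error result.
        return ToolCallOutcome([f"Tool call '{name}' failed: {exc}"], is_error=True)
    return ToolCallOutcome([str(value)])


async def demo() -> list:
    async def ok() -> str:
        return "fine"

    async def hangs() -> str:
        await asyncio.sleep(3600)  # simulates an endpoint that never answers
        return "unreachable"

    async def boom() -> str:
        raise RuntimeError("something broke")

    return [
        await guarded_tool_call("ok", ok, 0.1),
        await guarded_tool_call("hangs", hangs, 0.1),
        await guarded_tool_call("boom", boom, 0.1),
    ]


if __name__ == "__main__":
    for outcome in asyncio.run(demo()):
        print(outcome)
```

Catching asyncio.TimeoutError separately keeps the timeout message distinct from other failures, which matters when the remediation differs.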

User experience

When a user's credentials expire mid-session, the client now receives an actionable error message suggesting --profile or aws sso login instead of hanging indefinitely. The server must be restarted for a new credential configuration to take effect.
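A minimal sketch of the 401/403 detection, assuming the proxy has the upstream HTTP status code available when a request fails. describe_http_failure and CREDENTIAL_HINT are hypothetical names, and the wording is illustrative rather than the PR's actual message.

```python
# Hypothetical remediation text; the PR's exact wording may differ.
CREDENTIAL_HINT = (
    "Your AWS credentials may have expired or lack permission. "
    "Refresh them with 'aws sso login', or restart the proxy with --profile "
    "pointing at a long-lived profile."
)


def describe_http_failure(status_code: int, detail: str) -> str:
    """Build the message returned to the client for a failed upstream request."""
    message = f"Upstream request failed with HTTP {status_code}: {detail}"
    if status_code in (401, 403):
        # Only authentication/authorization failures get the credential hint.
        message += f" {CREDENTIAL_HINT}"
    return message
```

Limiting the hint to 401/403 keeps unrelated failures (for example a 500 from the service) from sending users off to re-authenticate.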

Testing

  1. Start a session with credentials that expire in 15 minutes
  2. Make at least one successful request
  3. Wait for credentials to expire
  4. Make a new request
  5. Previously: client hangs → Now: returns an error with remediation guidance
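The manual steps above can be approximated by an automated check. This sketch assumes, purely for illustration, that a request made with stale credentials never returns (modelling the hang from step 5), and it uses a simulated clock instead of waiting 15 real minutes. FakeCredentials, fake_backend, call_tool and run_scenario are hypothetical.

```python
import asyncio


class FakeCredentials:
    """Hypothetical credentials that expire at a given simulated time (seconds)."""

    def __init__(self, expires_at: float) -> None:
        self.expires_at = expires_at

    def is_expired(self, now: float) -> bool:
        return now >= self.expires_at


async def fake_backend(creds: FakeCredentials, now: float) -> str:
    """Stand-in endpoint: answers while credentials are valid, never answers afterwards."""
    if creds.is_expired(now):
        await asyncio.Event().wait()  # models the observed hang
    return "ok"


async def call_tool(creds: FakeCredentials, now: float, timeout: float) -> str:
    try:
        return await asyncio.wait_for(fake_backend(creds, now), timeout=timeout)
    except asyncio.TimeoutError:
        return "error: tool call timed out; refresh credentials (aws sso login or --profile)"


async def run_scenario() -> list:
    creds = FakeCredentials(expires_at=15 * 60)  # step 1: expire after 15 simulated minutes
    first = await call_tool(creds, now=60, timeout=0.1)  # step 2: a successful request
    # Steps 3-4: jump the simulated clock past expiry and call again.
    second = await call_tool(creds, now=16 * 60, timeout=0.1)
    return [first, second]
```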

Testing isError propagation:

import asyncio
import fastmcp
from fastmcp import Client
from mcp_proxy_for_aws.middleware.tool_error_middleware import ToolErrorMiddleware


async def main():
    """Run the isError e2e test."""
    server = fastmcp.FastMCP("test")
    server.add_middleware(ToolErrorMiddleware(tool_call_timeout=5.0))

    @server.tool()
    def boom() -> str:
        raise RuntimeError("something broke")

    client = Client(server)
    async with client:
        result = await client.call_tool("boom", {}, raise_on_error=False)
        is_error = getattr(result, "is_error", None)
        print(f"isError = {is_error}")


if __name__ == "__main__":
    asyncio.run(main())

Output:

Tool call 'boom' failed: Error calling tool 'boom': something broke.
isError = True

Checklist

  • I have reviewed the contributing guidelines
  • I have performed a self-review of this change
  • Changes have been tested
  • Changes are documented

Is this a breaking change?

  • Yes
  • No
  • Did integration tests succeed?
  • If the feature is a new use case, is it necessary to add a new integration test case?

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@anasstahr changed the title "Fix/stale credentials issue" to "fix: stale credentials issue" on Apr 1, 2026

@arnewouters (Contributor) left a comment

Don't forget to run/install the pre-commit hooks :)

proxy.add_middleware(InitializeMiddleware(client_factory))
proxy.add_middleware(
    ConnectionErrorMiddleware(
        tool_call_timeout=args.read_timeout,

Contributor commented:

Do we want a new variable for tool call timeouts? Is read_timeout used anywhere else? I'm trying to understand whether this is a new feature or an oversight we are fixing now.

Author (Contributor) replied:

Yes, we need a new variable: the read timeout and the tool timeout are two different things.
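The difference can be illustrated with plain asyncio, assuming the read timeout behaves like an HTTP client's per-read timeout (bounding each wait for the next chunk of data) while the tool timeout bounds the whole call. A response that trickles in steadily never trips the read timeout but can still exceed an overall cap. receive_chunks and call_with_tool_timeout are hypothetical names.

```python
import asyncio


async def receive_chunks(chunks: int, gap: float, read_timeout: float) -> int:
    """Read-timeout semantics: each wait for the next chunk is bounded; the total is not."""
    received = 0
    for _ in range(chunks):
        # Each chunk arrives after `gap` seconds; this fails only if one gap exceeds read_timeout.
        await asyncio.wait_for(asyncio.sleep(gap), timeout=read_timeout)
        received += 1
    return received


async def call_with_tool_timeout(tool_timeout: float) -> str:
    """Tool-timeout semantics: the whole call is bounded, however the data trickles in."""
    try:
        n = await asyncio.wait_for(
            receive_chunks(chunks=10, gap=0.05, read_timeout=0.2), timeout=tool_timeout
        )
    except asyncio.TimeoutError:
        return "tool call timed out"
    return f"received {n} chunks"
```

With a 0.2s read timeout, no single 0.05s gap ever fails, yet the call takes about 0.5s in total, so a 0.2s tool timeout still cuts it off while a 2s tool timeout lets it finish.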

@anasstahr marked this pull request as ready for review April 3, 2026 08:20
@anasstahr requested a review from a team as a code owner April 3, 2026 08:20
@anasstahr requested review from arnewouters and kyoncal April 3, 2026 08:20
- Add optional --tool-timeout CLI flag to cap tool call duration
- Rename error_handling middleware to ToolTimeoutMiddleware
- Return graceful isError=True response instead of hanging
- Suggest long-lived credentials on 401/403 errors
- Document --tool-timeout in README troubleshooting
@anasstahr force-pushed the fix/stale-credentials-issue branch from d7280aa to 092b144 on April 3, 2026 12:37
)

parser.add_argument(
    '--tool-error-timeout',

Contributor commented:

Suggested change:
-    '--tool-error-timeout',
+    '--tool-timeout',
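A sketch of how the renamed flag might sit next to the existing read timeout in argparse. The 300s default comes from the PR summary; the --read-timeout spelling is inferred from args.read_timeout in the diff above, its help text is illustrative, and build_parser is a hypothetical helper rather than the project's actual parser setup.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical excerpt: only the two timeout flags discussed in this thread."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--read-timeout",
        type=float,
        help="Timeout for reading from the upstream endpoint (illustrative).",
    )
    parser.add_argument(
        "--tool-timeout",
        type=float,
        default=300.0,  # default stated in the PR summary
        help="Maximum duration of a single tool call, in seconds.",
    )
    return parser
```

argparse turns --tool-timeout into the attribute args.tool_timeout, so the middleware would be constructed with tool_call_timeout=args.tool_timeout instead of reusing args.read_timeout.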

logger = logging.getLogger(__name__)


class _FailedToolResult(ToolResult):
Contributor commented:

Do we really have to define a new class?

@anasstahr (Contributor, Author) replied Apr 7, 2026:

Yes. fastmcp's ToolResult doesn't expose isError, so subclassing it and overriding to_mcp_result() is the only way to set isError=True:

# ToolResult.__init__: no isError parameter
def __init__(
    self,
    content: list[ContentBlock] | Any | None = None,
    structured_content: dict[str, Any] | Any | None = None,
    meta: dict[str, Any] | None = None,
):
    ...

# ToolResult.to_mcp_result(): never sets isError
def to_mcp_result(self):
    if self.meta is not None:
        return CallToolResult(content=self.content, _meta=self.meta)  # no isError
    if self.structured_content is None:
        return self.content  # bare list, not even a CallToolResult
    return self.content, self.structured_content  # tuple, not a CallToolResult
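The subclassing approach described above can be modelled without fastmcp. In this sketch CallToolResult and ToolResult are simplified stand-ins that mirror only the branches of the excerpt needed here, and ErrorToolResult is a hypothetical analogue of the PR's class, not its actual code.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CallToolResult:
    """Stand-in for mcp.types.CallToolResult (only the fields used here)."""
    content: list
    isError: bool = False
    meta: Optional[dict] = None


class ToolResult:
    """Stand-in mirroring the excerpt: the constructor offers no way to pass isError."""

    def __init__(self, content: Optional[list] = None, meta: Optional[dict] = None):
        self.content = content or []
        self.meta = meta

    def to_mcp_result(self) -> Any:
        if self.meta is not None:
            return CallToolResult(content=self.content, meta=self.meta)
        return self.content  # bare list; no isError flag is set here


class ErrorToolResult(ToolResult):
    """Subclassing approach: always emit a full result object with isError=True."""

    def to_mcp_result(self) -> CallToolResult:
        return CallToolResult(content=self.content, isError=True, meta=self.meta)
```

The cost of this design is that the subclass depends on the shape of a method the library may change; the alternative raised in the next comment avoids that coupling.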

@arnewouters (Contributor) replied Apr 7, 2026:

I believe we can raise a ToolError instead of creating a new class and returning an instance of it; the resulting response should have isError set, as in this test:

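from fastmcp import Client, FastMCP
from fastmcp.client.transports import FastMCPTransport
from fastmcp.exceptions import ToolError
from mcp.types import TextContent


class TestToolErrors:  # enclosing test class (name illustrative)
    async def test_specific_tool_errors_are_sent_to_client(self):
        mcp = FastMCP("TestServer")

        @mcp.tool
        def custom_error_tool():
            raise ToolError("This is a test error (abc)")

        client = Client(transport=FastMCPTransport(mcp))

        async with client:
            result = await client.call_tool_mcp("custom_error_tool", {})
            assert result.isError
            assert isinstance(result.content[0], TextContent)
            assert "test error" in result.content[0].text
            assert "abc" in result.content[0].text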
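The raise-instead-of-return pattern can be sketched with stand-ins. This is not fastmcp's implementation: ToolError and CallToolResult here are simplified stand-ins, and dispatch only models the behaviour the test above asserts (a ToolError raised during a tool call reaches the client as a result with isError set). guarded is a hypothetical middleware-style wrapper.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


class ToolError(Exception):
    """Stand-in for fastmcp.exceptions.ToolError."""


@dataclass
class CallToolResult:
    """Stand-in for mcp.types.CallToolResult."""
    content: list
    isError: bool = False


async def dispatch(tool: Callable[[], Awaitable[Any]]) -> CallToolResult:
    """Simplified model of the framework boundary: a raised ToolError becomes isError=True."""
    try:
        value = await tool()
    except ToolError as exc:
        return CallToolResult(content=[str(exc)], isError=True)
    return CallToolResult(content=[str(value)])


async def guarded(call: Callable[[], Awaitable[Any]], timeout: float) -> Any:
    """Middleware-style wrapper that raises instead of building a custom result class."""
    try:
        return await asyncio.wait_for(call(), timeout=timeout)
    except asyncio.TimeoutError:
        raise ToolError(f"Tool call timed out after {timeout}s") from None
```

With this shape the middleware never constructs a result object itself; it only decides which exception to raise and lets the framework build the error response.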

2 participants