Skip to content

feat(db): add AWS RDS IAM authentication support#563

Open
fossouo wants to merge 5 commits intoplastic-labs:mainfrom
fossouo:feature/aws-rds-iam-authentication
Open

feat(db): add AWS RDS IAM authentication support#563
fossouo wants to merge 5 commits intoplastic-labs:mainfrom
fossouo:feature/aws-rds-iam-authentication

Conversation

@fossouo
Copy link
Copy Markdown

@fossouo fossouo commented Apr 14, 2026

Summary

Add support for connecting Honcho to AWS RDS PostgreSQL using IAM authentication tokens instead of static database passwords. This enables operators to leverage AWS IAM policies for database access control, eliminating the need for long-lived credentials.

Changes

  • Configuration: Add AUTH_METHOD, AWS_REGION, RDS_HOSTNAME, RDS_PORT, RDS_USERNAME, AWS_PROFILE, and RDS_SSL_CA_BUNDLE fields to DBSettings with validation
  • AWS Auth Module: New src/aws_auth.py with generate_rds_auth_token() using boto3, with descriptive error wrapping
  • Database Engine: SQLAlchemy do_connect event listener for per-connection IAM token injection, SSL enforcement, pool_pre_ping and pool_recycle tuning
  • Migration Support: Alembic migrations work with IAM auth (fresh token + SSL for each migration run)
  • MCP Tool: New aws_rds_status tool for diagnosing RDS connectivity from AI workflows
  • Docker: Dockerfile downloads AWS RDS CA certificate bundle; docker-compose.yml.example includes IAM config section
  • Tests: Unit tests + property-based tests (hypothesis for Python, fast-check for TypeScript) covering 15 correctness properties

Backward Compatibility

When DB_AUTH_METHOD=password (the default), behavior is identical to the current implementation. No changes required for existing deployments.

Tested

  • Unit tests and property-based tests all pass
  • Live tested against an AWS RDS PostgreSQL 15 instance with IAM authentication enabled

Summary by CodeRabbit

  • New Features

    • Added AWS IAM authentication support for RDS PostgreSQL databases as an alternative to password-based authentication.
    • Configuration now supports IAM mode with AWS region, hostname, port, and username parameters.
    • Automatic RDS SSL CA bundle integration for secure IAM-authenticated connections.
  • Tests

    • Comprehensive test coverage for AWS IAM authentication functionality.
  • Chores

    • Added AWS SDK dependency.
    • Updated Docker build to include RDS CA certificate bundle.
    • Updated configuration templates with IAM authentication examples.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 14, 2026

Walkthrough

The pull request introduces AWS RDS IAM authentication support by adding a new authentication mode to the database configuration layer. It includes a boto3-based token generator, extended database settings with IAM-specific fields, IAM-aware connection setup with event listeners, Docker/configuration templates documenting the feature, and comprehensive test coverage. AWS RDS CA bundle is fetched during Docker build.

Changes

Cohort / File(s) Summary
Configuration & Documentation
.env.template, config.toml.example, docker-compose.yml.example, .gitignore
Added commented examples and documentation for IAM authentication configuration; added .kiro/ to gitignore.
Dependencies
pyproject.toml
Added boto3>=1.42.5 as a runtime dependency.
Docker
Dockerfile
Inserted build step to fetch AWS RDS global CA bundle (global-bundle.pem) with validation that the downloaded file is non-empty.
AWS Authentication Module
src/aws_auth.py
New module implementing generate_rds_auth_token function to produce short-lived RDS IAM tokens via boto3, with structured exception handling for missing profiles, credentials, client errors, and endpoint issues.
Database Configuration
src/config.py
Extended DBSettings with AUTH_METHOD field (defaulting to "password"), AWS/RDS parameters (AWS_REGION, RDS_HOSTNAME, RDS_USERNAME, RDS_PORT, AWS_PROFILE, RDS_SSL_CA_BUNDLE), and post-validation hook _require_iam_fields enforcing required fields when IAM mode is selected.
Database Connection Logic
src/db.py
Added IAM-aware URI reconstruction using RDS host/port/username, SSL configuration via connect_args, adjusted connection pooling (forced pool_pre_ping and limited pool_recycle to 900), SQLAlchemy event listener for token regeneration on each connection, and Alembic migration support with token injection and URL encoding.
Test Suite
tests/aws_auth/__init__.py, tests/aws_auth/test_aws_rds_auth.py
Created new test package with comprehensive unit tests validating Pydantic configuration behavior, generate_rds_auth_token function with various error scenarios, and database engine setup assertions for IAM vs password modes.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant DB as SQLAlchemy Engine
    participant Listener as do_connect Listener
    participant AWSAuth as generate_rds_auth_token
    participant Boto3 as boto3 Session
    participant RDS as AWS RDS

    App->>DB: Initialize connection pool
    App->>DB: Request connection
    DB->>Listener: Emit do_connect event
    Listener->>AWSAuth: Call generate_rds_auth_token()
    AWSAuth->>Boto3: Create Session (with optional profile)
    Boto3->>RDS: Call generate_db_auth_token()
    RDS-->>Boto3: Return short-lived token
    Boto3-->>AWSAuth: Return token
    AWSAuth-->>Listener: Return token
    Listener->>DB: Inject token into password field
    DB->>RDS: Connect using IAM token + SSL
    RDS-->>DB: Connection established
    DB-->>App: Provide connection
Loading
sequenceDiagram
    participant Alembic as Alembic Migration
    participant InitDB as init_db() Function
    participant AWSAuth as generate_rds_auth_token
    participant Boto3 as boto3 Session
    participant RDS as AWS RDS

    InitDB->>InitDB: Check AUTH_METHOD == "iam"
    InitDB->>AWSAuth: Generate migration token
    AWSAuth->>Boto3: Create Session
    Boto3->>RDS: generate_db_auth_token()
    RDS-->>Boto3: Return token
    Boto3-->>AWSAuth: Return token
    AWSAuth-->>InitDB: Return token
    InitDB->>InitDB: URL-encode token
    InitDB->>InitDB: Rebuild sqlalchemy.url with RDS host/port/username/token
    InitDB->>InitDB: Add SSL query parameters (sslmode, sslrootcert)
    InitDB->>Alembic: Provide configured URL
    Alembic->>RDS: Execute migrations over IAM-authenticated connection
    RDS-->>Alembic: Migrations applied
    Alembic-->>InitDB: Complete
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Hops through AWS clouds with glee,
IAM tokens dance so free,
No passwords stored, just boto3 cheer,
RDS secured, crystal clear!
CA bundles fetched, connections shine,
Short-lived auth—perfectly fine! 🔐✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 62.79% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'feat(db): add AWS RDS IAM authentication support' clearly and concisely summarizes the main objective of the changeset—adding IAM authentication support for AWS RDS connections.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (5)
mcp/src/tools/aws-status.pbt.test.ts (1)

60-87: Property coverage misses “field omitted” cases.

Line 81–87 always generates a full object, so the Line 153–154 “or null if not provided” path is never exercised. Consider switching to partial-record payload generation (or fc.option for individual keys) so null/omission behavior is actually tested.

Also applies to: 153-155

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcp/src/tools/aws-status.pbt.test.ts` around lines 60 - 87, The
property-based test currently builds a full object with arbHealthResponse (using
arbAuthMethod, arbHostname, arbPort, arbRegion, arbStatus) so the code-path that
expects keys to be omitted or null is never exercised; change arbHealthResponse
to generate partial/optional keys (e.g., use fc.partial({...}) so fields can be
omitted, or wrap individual fields with fc.option(...) to produce
null/undefined) so the test will sometimes omit
rds_hostname/rds_port/aws_region/auth_method and exercise the "or null if not
provided" branches.
tests/aws_auth/test_aws_rds_pbt.py (2)

235-250: Remove unused st.data() parameter.

The data parameter from st.data() is never used in this test. Since the test doesn't need any additional generated data, the decorator can be simplified.

♻️ Proposed fix
     `@settings`(max_examples=100)
-    `@given`(data=st.data())
+    `@given`(st.just(None))  # or remove `@given` entirely if no property input needed
     `@patch`("src.aws_auth.boto3.Session")
-    def test_no_credentials_error_is_descriptive(self, mock_session_cls, data):
+    def test_no_credentials_error_is_descriptive(self, mock_session_cls, _):
         # Feature: aws-mcp-postgres, Property 4: Token generation errors are descriptive

Or simply remove the @given decorator if no property-based input is needed:

     `@settings`(max_examples=100)
-    `@given`(data=st.data())
     `@patch`("src.aws_auth.boto3.Session")
-    def test_no_credentials_error_is_descriptive(self, mock_session_cls, data):
+    def test_no_credentials_error_is_descriptive(self, mock_session_cls):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/aws_auth/test_aws_rds_pbt.py` around lines 235 - 250, The test
test_no_credentials_error_is_descriptive declares an unused property-based input
by accepting data=st.data() via the `@given` decorator; remove the unused
st.data() parameter and the corresponding "data" argument from the test
signature, and either drop the `@given` decorator entirely or replace it with
`@settings`(max_examples=100) only so the test uses the patched boto3.Session and
MagicMock without expecting Hypothesis input (refer to
test_no_credentials_error_is_descriptive, st.data(), and the `@given` decorator).

614-617: Remove unnecessary f-string prefix.

The string literal has no placeholders, so the f prefix is unnecessary.

♻️ Proposed fix
         else:
             mock_client.generate_db_auth_token.side_effect = EndpointConnectionError(
-                endpoint_url=f"https://rds.us-east-1.amazonaws.com"
+                endpoint_url="https://rds.us-east-1.amazonaws.com"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/aws_auth/test_aws_rds_pbt.py` around lines 614 - 617, The
EndpointConnectionError instantiation uses an unnecessary f-string for a plain
literal; update the test so mock_client.generate_db_auth_token.side_effect =
EndpointConnectionError(endpoint_url="https://rds.us-east-1.amazonaws.com")
(remove the leading f) to avoid the redundant f-string while keeping the same
endpoint_url value referenced in the mock_client.generate_db_auth_token setup.
src/db.py (2)

152-158: Consider extracting database name extraction to a helper function.

The database name extraction logic is duplicated (also at lines 26-27). A small helper function would improve maintainability.

♻️ Proposed helper function
def _extract_db_name(connection_uri: str) -> str:
    """Extract database name from a PostgreSQL connection URI."""
    db_part = connection_uri.rsplit("/", 1)[-1] if "/" in connection_uri else "postgres"
    return db_part.split("?")[0] or "postgres"

Then use _extract_db_name(settings.DB.CONNECTION_URI) in both locations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/db.py` around lines 152 - 158, Extract the duplicated DB name parsing
into a helper called _extract_db_name(connection_uri: str) that implements the
existing logic (use connection_uri.rsplit("/", 1)[-1] if "/" in connection_uri
else "postgres", then split on "?" and fallback to "postgres"); replace both
places that compute db_name from settings.DB.CONNECTION_URI with a call to
_extract_db_name(settings.DB.CONNECTION_URI) so the parsing logic is centralized
and easy to reuse (update any references to db_name and remove the duplicated
parsing snippets).

139-147: Consider catching specific exception types.

The bare except Exception catches all exceptions including keyboard interrupts. Since generate_rds_auth_token raises RuntimeError for AWS-related failures, catching RuntimeError specifically would be more precise while still logging the context.

♻️ Proposed fix
         try:
             token = generate_rds_auth_token(
                 region=settings.DB.AWS_REGION,
                 hostname=settings.DB.RDS_HOSTNAME,
                 port=settings.DB.RDS_PORT,
                 username=settings.DB.RDS_USERNAME,
                 profile=settings.DB.AWS_PROFILE,
             )
-        except Exception:
+        except RuntimeError:
             logger.error(
                 "Failed to generate IAM auth token for Alembic migration "
                 "(region=%s, hostname=%s, username=%s)",
                 settings.DB.AWS_REGION,
                 settings.DB.RDS_HOSTNAME,
                 settings.DB.RDS_USERNAME,
             )
             raise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/db.py` around lines 139 - 147, Change the bare except in the Alembic IAM
auth token generation to catch RuntimeError instead of Exception to avoid
swallowing unrelated interrupts; specifically, update the except block around
generate_rds_auth_token (the block that currently calls logger.error with
settings.DB.AWS_REGION, settings.DB.RDS_HOSTNAME, settings.DB.RDS_USERNAME) to
"except RuntimeError:" (log the same contextual message via logger.error and
then re-raise) so AWS-related failures are handled explicitly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docker-compose.yml.example`:
- Around line 32-43: The commented AWS RDS IAM env vars create a second
environment key if uncommented; update the docker-compose.yml.example to avoid
duplicate keys by moving those commented DB_* and CACHE_* entries into the
existing service "environment" block (or change the comment to instruct users to
replace the entire environment block rather than uncommenting), and clarify
around the existing env_file/environment keys so users only enable variables
within the single environment section (reference the env_file line and the
existing environment block in the service).

In `@Dockerfile`:
- Around line 41-44: The Dockerfile step that downloads global-bundle.pem
currently fetches it without integrity checks or retries; update the RUN that
creates /usr/local/share/aws and downloads global-bundle.pem to (1) accept or
set a known SHA256 checksum (via a build ARG or hardcoded variable), (2) use a
downloader with retry/timeout flags (curl or wget with --retry/--tries, --fail,
and reasonable timeouts) and (3) immediately verify the file with sha256sum (or
shasum) and fail the build if the checksum does not match, so the image build
aborts on mismatch; reference the existing RUN that writes
/usr/local/share/aws/global-bundle.pem and ensure the command returns non-zero
on download or verification failure.

In `@mcp/src/tools/aws-status.test.ts`:
- Around line 71-102: The test is mocking and asserting fields (auth_method,
rds_hostname, rds_port, aws_region) that the real /health endpoint does not
return; update the test in aws-status.test.ts to reflect the actual /health
contract by changing the mocked healthPayload and assertions to only
include/expect the real fields (e.g., status) or else adjust the
register/getHandler logic to transform /health into the extended shape;
specifically modify the vi.stubGlobal mockResolvedValue in the "returns healthy
response with all required fields" case (and the other affected cases around
lines 139-163 and 165-188) and the subsequent parsed assertions so they either
assert only on status (and connection_healthy/error as produced by your handler)
or mock the handler's upstream response shape that register/getHandler expects.

In `@mcp/src/tools/aws-status.ts`:
- Around line 18-23: The aws_rds_status tool is assuming /health returns
auth_method, rds_hostname, rds_port and aws_region but the backend /health only
returns { "status": "ok" }; update aws_rds_status to either call the correct
diagnostics endpoint that provides DB/IAM info (replace the /health fetch with
that endpoint) or modify aws_rds_status to align with the actual payload by
removing references to auth_method/rds_hostname/rds_port/aws_region, updating
the description text to match the real contract, and defensively handling
missing fields (e.g., return null/unknown or a clear error message when those
properties are absent). Ensure changes target the aws_rds_status function and
the description string so the code and docs remain consistent.

In `@src/aws_auth.py`:
- Around line 54-100: Replace the generic RuntimeError raises in the exception
handlers for NoCredentialsError, ClientError, and EndpointConnectionError with
the project-specific exceptions from src/exceptions.py: raise
AuthenticationException for credential/access failures (NoCredentialsError and
AccessDenied/Forbidden ClientError paths) and raise ValidationException for
misconfiguration/endpoint issues (EndpointConnectionError and any client error
that indicates invalid configuration). Update the three handlers around the
logger.error calls (the except blocks catching NoCredentialsError, ClientError,
and EndpointConnectionError) to import and use AuthenticationException and
ValidationException, preserve the original chained exceptions using "from exc",
and keep the detailed error messages currently passed to RuntimeError.
- Around line 42-47: Wrap the AWS session/client creation in a try/except that
catches botocore.exceptions.ProfileNotFound and logs/raises a clear actionable
message mentioning DB_AWS_PROFILE; specifically, surround the boto3.Session(...)
and session.client("rds", ...) calls (where session.client and
client.generate_db_auth_token are used) with error handling that imports
ProfileNotFound from botocore.exceptions, and on that exception emit a
descriptive error (or raise a custom/ValueError) indicating the provided AWS
profile name is invalid and suggest verifying DB_AWS_PROFILE.

In `@src/config.py`:
- Around line 180-195: The _require_iam_fields validator in DBSettings currently
checks RDS_PORT with "is None", which misses invalid ints (e.g. 0 or >65535);
update the check in _require_iam_fields to validate the numeric range of
RDS_PORT (ensure 1 <= self.RDS_PORT <= 65535) and treat out-of-range values as
missing by appending "DB_RDS_PORT" to missing, so invalid port numbers fail the
same validation path as a missing port.

---

Nitpick comments:
In `@mcp/src/tools/aws-status.pbt.test.ts`:
- Around line 60-87: The property-based test currently builds a full object with
arbHealthResponse (using arbAuthMethod, arbHostname, arbPort, arbRegion,
arbStatus) so the code-path that expects keys to be omitted or null is never
exercised; change arbHealthResponse to generate partial/optional keys (e.g., use
fc.partial({...}) so fields can be omitted, or wrap individual fields with
fc.option(...) to produce null/undefined) so the test will sometimes omit
rds_hostname/rds_port/aws_region/auth_method and exercise the "or null if not
provided" branches.

In `@src/db.py`:
- Around line 152-158: Extract the duplicated DB name parsing into a helper
called _extract_db_name(connection_uri: str) that implements the existing logic
(use connection_uri.rsplit("/", 1)[-1] if "/" in connection_uri else "postgres",
then split on "?" and fallback to "postgres"); replace both places that compute
db_name from settings.DB.CONNECTION_URI with a call to
_extract_db_name(settings.DB.CONNECTION_URI) so the parsing logic is centralized
and easy to reuse (update any references to db_name and remove the duplicated
parsing snippets).
- Around line 139-147: Change the bare except in the Alembic IAM auth token
generation to catch RuntimeError instead of Exception to avoid swallowing
unrelated interrupts; specifically, update the except block around
generate_rds_auth_token (the block that currently calls logger.error with
settings.DB.AWS_REGION, settings.DB.RDS_HOSTNAME, settings.DB.RDS_USERNAME) to
"except RuntimeError:" (log the same contextual message via logger.error and
then re-raise) so AWS-related failures are handled explicitly.

In `@tests/aws_auth/test_aws_rds_pbt.py`:
- Around line 235-250: The test test_no_credentials_error_is_descriptive
declares an unused property-based input by accepting data=st.data() via the
`@given` decorator; remove the unused st.data() parameter and the corresponding
"data" argument from the test signature, and either drop the `@given` decorator
entirely or replace it with `@settings`(max_examples=100) only so the test uses
the patched boto3.Session and MagicMock without expecting Hypothesis input
(refer to test_no_credentials_error_is_descriptive, st.data(), and the `@given`
decorator).
- Around line 614-617: The EndpointConnectionError instantiation uses an
unnecessary f-string for a plain literal; update the test so
mock_client.generate_db_auth_token.side_effect =
EndpointConnectionError(endpoint_url="https://rds.us-east-1.amazonaws.com")
(remove the leading f) to avoid the redundant f-string while keeping the same
endpoint_url value referenced in the mock_client.generate_db_auth_token setup.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4612dbe0-ffa3-4b8f-97a6-6397378e6dea

📥 Commits

Reviewing files that changed from the base of the PR and between 58f9abb and 48afe22.

📒 Files selected for processing (15)
  • .env.template
  • Dockerfile
  • config.toml.example
  • docker-compose.yml.example
  • mcp/src/server.ts
  • mcp/src/tools/aws-status.pbt.test.ts
  • mcp/src/tools/aws-status.test.ts
  • mcp/src/tools/aws-status.ts
  • pyproject.toml
  • src/aws_auth.py
  • src/config.py
  • src/db.py
  • tests/aws_auth/__init__.py
  • tests/aws_auth/test_aws_rds_auth.py
  • tests/aws_auth/test_aws_rds_pbt.py

Comment thread docker-compose.yml.example Outdated
Comment on lines 32 to 43
# -- AWS RDS IAM authentication (uncomment to enable) --
# environment:
# - DB_AUTH_METHOD=iam
# - DB_AWS_REGION=us-east-1
# - DB_RDS_HOSTNAME=your-rds-instance.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com
# - DB_RDS_PORT=5432
# - DB_RDS_USERNAME=iam_db_user
# - DB_AWS_PROFILE= # optional: named AWS credentials profile
# - DB_RDS_SSL_CA_BUNDLE=/usr/local/share/aws/global-bundle.pem
# - CACHE_URL=redis://redis:6379/0?suppress=true
# - CACHE_ENABLED=true
env_file:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Avoid documenting a second environment key per service.

Uncommenting these blocks as written can leave duplicate environment keys in a service, which is error-prone. Prefer documenting IAM variables inside the existing environment block (or explicitly instruct to replace the whole block).

Also applies to: 66-76

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.yml.example` around lines 32 - 43, The commented AWS RDS IAM
env vars create a second environment key if uncommented; update the
docker-compose.yml.example to avoid duplicate keys by moving those commented
DB_* and CACHE_* entries into the existing service "environment" block (or
change the comment to instruct users to replace the entire environment block
rather than uncommenting), and clarify around the existing env_file/environment
keys so users only enable variables within the single environment section
(reference the env_file line and the existing environment block in the service).

Comment thread Dockerfile
Comment on lines +41 to +44
# Download AWS RDS CA certificate bundle for SSL connections to RDS
RUN mkdir -p /usr/local/share/aws && \
python -c "import urllib.request; urllib.request.urlretrieve('https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem', '/usr/local/share/aws/global-bundle.pem')"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Harden CA bundle download (integrity + retry).

The current build step fetches trust material without integrity verification, which weakens supply-chain guarantees and reproducibility.

🔒 Suggested hardening
+# Optional: pin expected digest via build arg and rotate intentionally.
+ARG RDS_CA_BUNDLE_SHA256
 # Download AWS RDS CA certificate bundle for SSL connections to RDS
-RUN mkdir -p /usr/local/share/aws && \
-    python -c "import urllib.request; urllib.request.urlretrieve('https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem', '/usr/local/share/aws/global-bundle.pem')"
+RUN mkdir -p /usr/local/share/aws && \
+    python - <<'PY'
+import hashlib
+import urllib.request
+from pathlib import Path
+
+url = "https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem"
+dst = Path("/usr/local/share/aws/global-bundle.pem")
+urllib.request.urlretrieve(url, dst)
+if not dst.exists() or dst.stat().st_size == 0:
+    raise RuntimeError("Failed to download AWS RDS CA bundle")
+PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` around lines 41 - 44, The Dockerfile step that downloads
global-bundle.pem currently fetches it without integrity checks or retries;
update the RUN that creates /usr/local/share/aws and downloads global-bundle.pem
to (1) accept or set a known SHA256 checksum (via a build ARG or hardcoded
variable), (2) use a downloader with retry/timeout flags (curl or wget with
--retry/--tries, --fail, and reasonable timeouts) and (3) immediately verify the
file with sha256sum (or shasum) and fail the build if the checksum does not
match, so the image build aborts on mismatch; reference the existing RUN that
writes /usr/local/share/aws/global-bundle.pem and ensure the command returns
non-zero on download or verification failure.

Comment thread mcp/src/tools/aws-status.test.ts Outdated
Comment on lines +71 to +102
it("returns healthy response with all required fields", async () => {
const { server, getHandler } = createMockServer();
const ctx = createMockContext({ baseUrl: "https://api.honcho.dev" });
register(server as never, ctx);

const healthPayload = {
status: "ok",
auth_method: "iam",
rds_hostname: "mydb.rds.amazonaws.com",
rds_port: 5432,
aws_region: "us-east-1",
};

vi.stubGlobal(
"fetch",
vi.fn().mockResolvedValue({
ok: true,
json: () => Promise.resolve(healthPayload),
}),
);

const result = await getHandler()();
expect(result.isError).toBeUndefined();

const parsed = JSON.parse(result.content[0].text);
expect(parsed.auth_method).toBe("iam");
expect(parsed.rds_hostname).toBe("mydb.rds.amazonaws.com");
expect(parsed.rds_port).toBe(5432);
expect(parsed.aws_region).toBe("us-east-1");
expect(parsed.connection_healthy).toBe(true);
expect(parsed.error).toBeNull();
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Tests currently validate a non-existent /health response shape.

Line 76 onward mocks auth_method, rds_hostname, rds_port, and aws_region as if /health returns them, then asserts them as expected output. Given current backend /health only returns status, this suite can pass while production behavior is incomplete. Please align mocks/assertions with the real endpoint contract (or update endpoint/tool together).

Also applies to: 139-163, 165-188

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcp/src/tools/aws-status.test.ts` around lines 71 - 102, The test is mocking
and asserting fields (auth_method, rds_hostname, rds_port, aws_region) that the
real /health endpoint does not return; update the test in aws-status.test.ts to
reflect the actual /health contract by changing the mocked healthPayload and
assertions to only include/expect the real fields (e.g., status) or else adjust
the register/getHandler logic to transform /health into the extended shape;
specifically modify the vi.stubGlobal mockResolvedValue in the "returns healthy
response with all required fields" case (and the other affected cases around
lines 139-163 and 165-188) and the subsequent parsed assertions so they either
assert only on status (and connection_healthy/error as produced by your handler)
or mock the handler's upstream response shape that register/getHandler expects.

Comment thread mcp/src/tools/aws-status.ts Outdated
Comment thread src/aws_auth.py
Comment thread src/aws_auth.py
Comment on lines +54 to +100
except NoCredentialsError as exc:
logger.error(
"AWS credentials not found for RDS IAM auth "
"(region=%s, hostname=%s, username=%s): %s",
region,
hostname,
username,
exc,
)
raise RuntimeError(
"AWS credentials not found. Ensure an IAM role is attached, "
"AWS environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) "
"are set, or DB_AWS_PROFILE is configured with a valid profile."
) from exc
except ClientError as exc:
logger.error(
"AWS client error during RDS IAM token generation "
"(region=%s, hostname=%s, username=%s): %s",
region,
hostname,
username,
exc,
)
error_code = exc.response.get("Error", {}).get("Code", "")
if "AccessDenied" in error_code or "Forbidden" in error_code:
raise RuntimeError(
"Access denied when generating RDS IAM auth token. "
"Ensure the IAM policy grants the 'rds-db:connect' action "
f"for the database user '{username}' on the target RDS resource."
) from exc
raise RuntimeError(
f"AWS client error during RDS IAM token generation: {exc}"
) from exc
except EndpointConnectionError as exc:
logger.error(
"Cannot reach AWS endpoint for RDS IAM token generation "
"(region=%s, hostname=%s, username=%s): %s",
region,
hostname,
username,
exc,
)
raise RuntimeError(
f"Cannot connect to AWS STS/RDS endpoint in region '{region}'. "
"Check network connectivity, VPC configuration, and ensure the "
"region is correct."
) from exc
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify available project exception types and current usage patterns.
fd -i exceptions.py src
sed -n '1,240p' src/exceptions.py
rg -n "raise .*Exception|class .*Exception" src -g '*.py'
rg -n "raise RuntimeError" src/aws_auth.py src/db.py

Repository: plastic-labs/honcho

Length of output: 12017


Use project-specific exceptions instead of generic RuntimeError.

Raise AuthenticationException for credential/access failures and ValidationException for misconfiguration issues to maintain typed semantics and align with project conventions (see src/exceptions.py).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/aws_auth.py` around lines 54 - 100, Replace the generic RuntimeError
raises in the exception handlers for NoCredentialsError, ClientError, and
EndpointConnectionError with the project-specific exceptions from
src/exceptions.py: raise AuthenticationException for credential/access failures
(NoCredentialsError and AccessDenied/Forbidden ClientError paths) and raise
ValidationException for misconfiguration/endpoint issues
(EndpointConnectionError and any client error that indicates invalid
configuration). Update the three handlers around the logger.error calls (the
except blocks catching NoCredentialsError, ClientError, and
EndpointConnectionError) to import and use AuthenticationException and
ValidationException, preserve the original chained exceptions using "from exc",
and keep the detailed error messages currently passed to RuntimeError.

Comment thread src/config.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mcp/src/tools/aws-status.ts`:
- Around line 29-34: The fetch to healthUrl using ctx.config.baseUrl and
ctx.config.apiKey must be protected with an AbortController and timeout so the
tool cannot hang; update the code around the fetch call to create an
AbortController, pass controller.signal to fetch(healthUrl, { headers:...,
signal }), start a timer (e.g., const timeoutId = setTimeout(() =>
controller.abort(), <timeoutMs>)) before the fetch, clearTimeout(timeoutId)
after the response is received (or in a finally block), and ensure you handle
the abort error path (inspect the thrown error or error.name === 'AbortError')
so the function fails fast on timeout instead of hanging.
- Around line 44-48: The code builds an AwsRdsStatusResult by asserting fields
on an unknown JSON object (variable data) which can produce invalid runtime
values; change each assertion to perform runtime type checks instead (e.g., use
typeof checks or helper predicates isString/isNumber) and only assign the value
when the check passes, otherwise set null; specifically update the mapping for
auth_method, rds_hostname, aws_region to check typeof === 'string' and for
rds_port to check typeof === 'number' (or parse/validate an integer if strings
are allowed), or extract these checks into a small validator function used by
the code that constructs AwsRdsStatusResult so the output conforms to the
expected runtime types.

In `@src/db.py`:
- Around line 24-27: _extract_db_name() and the IAM URL-rebuild logic
reconstruct the connection URI by naive string splits and drops the original
path and query; instead, parse the provided DB_CONNECTION_URI with
urllib.parse.urlsplit/urlunsplit (or sqlalchemy.engine.url.make_url) to extract
components, preserve the original path and query, and only replace/overlay the
username, password (IAM token), and scheme/port if needed for IAM branches;
update the IAM-rewriting code blocks that currently call _extract_db_name() and
build new URIs (the IAM branch around the earlier helper and the later IAM
branch in the same module) to resume the original parsed path/query (including
empty path) and attach the new credentials/token while leaving existing query
params intact.

In `@tests/aws_auth/test_aws_rds_pbt.py`:
- Around line 133-153: The test is re-implementing src.db behavior instead of
exercising it; update the property tests (e.g., the one using
generate_rds_auth_token) to patch src.config.settings values, reload/import
src.db, and assert against src.db's exported artifacts (connection_uri,
connect_args, do_connect behavior and init_db/Alembic URL) rather than calling
generate_rds_auth_token or rebuilding URI logic in the test; specifically,
replace direct calls to generate_rds_auth_token and manual token assembly with
patches to src.config.settings, reload src.db (or importlib.reload), spy/patch
the AWS client used inside src.db (the same client used by
generate_rds_auth_token) to return distinct tokens, then assert
src.db.connection_uri, src.db.connect_args, that src.db.do_connect injects a
fresh token per connection, and that init_db() yields the correct Alembic URL.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a2167c07-5f06-4bbd-80f4-81e38b216a5f

📥 Commits

Reviewing files that changed from the base of the PR and between 48afe22 and b24aca7.

⛔ Files ignored due to path filters (2)
  • mcp/package-lock.json is excluded by !**/package-lock.json
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • .gitignore
  • Dockerfile
  • docker-compose.yml.example
  • mcp/package.json
  • mcp/src/tools/aws-status.ts
  • src/aws_auth.py
  • src/config.py
  • src/db.py
  • tests/aws_auth/test_aws_rds_pbt.py
✅ Files skipped from review due to trivial changes (3)
  • .gitignore
  • mcp/package.json
  • docker-compose.yml.example
🚧 Files skipped from review as they are similar to previous changes (2)
  • Dockerfile
  • src/aws_auth.py

Comment thread mcp/src/tools/aws-status.ts Outdated
Comment thread mcp/src/tools/aws-status.ts Outdated
Comment thread src/db.py Outdated
Comment thread tests/aws_auth/test_aws_rds_pbt.py Outdated
Comment on lines +133 to +153
def test_each_connection_gets_unique_token(self, mock_session_cls, num_connections):
# Feature: aws-mcp-postgres, Property 2: Fresh token injection on every connection
tokens = [f"token-{i}" for i in range(num_connections)]
mock_client = MagicMock()
mock_client.generate_db_auth_token.side_effect = tokens
mock_session_cls.return_value.client.return_value = mock_client

collected_tokens = []
for _ in range(num_connections):
t = generate_rds_auth_token(
region="us-east-1",
hostname="db.rds.amazonaws.com",
port=5432,
username="user",
)
collected_tokens.append(t)

# Each call should have produced a distinct token
assert len(collected_tokens) == num_connections
assert len(set(collected_tokens)) == num_connections
assert mock_client.generate_db_auth_token.call_count == num_connections
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

These properties are asserting copied logic, not src.db behavior.

Properties 2/5/10/11/13/14/15 rebuild the src.db conditionals and URI assembly inside the test body, and Property 15 never calls init_db() at all. That means a regression in the real module can still leave this file green. Please exercise src.db directly here—e.g. patch src.config.settings, reload src.db, and assert the actual connection_uri, connect_args, do_connect token injection, and init_db() Alembic URL behavior instead of re-implementing them in the tests.

Also applies to: 287-298, 400-405, 434-440, 516-529, 560-582, 603-627

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/aws_auth/test_aws_rds_pbt.py` around lines 133 - 153, The test is
re-implementing src.db behavior instead of exercising it; update the property
tests (e.g., the one using generate_rds_auth_token) to patch src.config.settings
values, reload/import src.db, and assert against src.db's exported artifacts
(connection_uri, connect_args, do_connect behavior and init_db/Alembic URL)
rather than calling generate_rds_auth_token or rebuilding URI logic in the test;
specifically, replace direct calls to generate_rds_auth_token and manual token
assembly with patches to src.config.settings, reload src.db (or
importlib.reload), spy/patch the AWS client used inside src.db (the same client
used by generate_rds_auth_token) to return distinct tokens, then assert
src.db.connection_uri, src.db.connect_args, that src.db.do_connect injects a
fresh token per connection, and that init_db() yields the correct Alembic URL.

@ajspig
Copy link
Copy Markdown
Contributor

ajspig commented Apr 14, 2026

▎ Hey @fossouo — thanks for tackling this! IAM-based RDS auth is a useful addition for production deployments.

A couple things before this can move forward:

  1. Scope: The core feature here is IAM token-based DB connections, but this PR includes a lot of extras — MCP tools (aws-status.ts), property-based tests, monitoring tooling — that push it to 6,400+ lines. Could you trim this down to just the IAM auth plumbing (src/db.py, src/config.py, .env.template, and focused unit tests)? The MCP tools and status monitoring can be a separate PR if there's interest.

  2. Conflicts: Refactor clients.py to add modern features and more flexible configuration #459 is an in-flight config refactor that touches config.py, .env.template, and pyproject.toml. You'll want to wait for that to land and rebase on top of it before updating this PR.

Happy to review once it's scoped down and rebased!

@fossouo
Copy link
Copy Markdown
Author

fossouo commented Apr 15, 2026

▎ Hey @fossouo — thanks for tackling this! IAM-based RDS auth is a useful addition for production deployments.

A couple things before this can move forward:

  1. Scope: The core feature here is IAM token-based DB connections, but this PR includes a lot of extras — MCP tools (aws-status.ts), property-based tests, monitoring tooling — that push it to 6,400+ lines. Could you trim this down to just the IAM auth plumbing (src/db.py, src/config.py, .env.template, and focused unit tests)? The MCP tools and status monitoring can be a separate PR if there's interest.
  2. Conflicts: Refactor clients.py to add modern features and more flexible configuration #459 is an in-flight config refactor that touches config.py, .env.template, and pyproject.toml. You'll want to wait for that to land and rebase on top of it before updating this PR.

Happy to review once it's scoped down and rebased!

Thanks for the feedback! I'll trim this down to just the core IAM auth plumbing (config, aws_auth, db engine, Dockerfile, and focused unit tests) and move the MCP tools + property-based tests to a follow-up PR.

I see #459 is still in flight — I'll wait for it to land and rebase on top before updating. Will ping here once it's ready for another look.

dfosso added 5 commits April 28, 2026 16:22
Add support for connecting to AWS RDS PostgreSQL using IAM authentication
tokens instead of static passwords. This enables operators to leverage
AWS IAM policies for database access control.

Changes:
- Add AUTH_METHOD, AWS_REGION, RDS_HOSTNAME, RDS_PORT, RDS_USERNAME,
  AWS_PROFILE, and RDS_SSL_CA_BUNDLE settings to DBSettings
- Create aws_auth module for IAM token generation via boto3
- Update database engine with do_connect event listener for token injection
- Add SSL enforcement and pool tuning for IAM mode
- Update Alembic migrations to support IAM authentication
- Add aws_rds_status MCP tool for connectivity diagnostics
- Update Dockerfile with AWS RDS CA certificate bundle
- Add unit tests and property-based tests (hypothesis + fast-check)
- Add boto3 dependency
- Add ProfileNotFound exception handling in aws_auth.py
- Validate RDS_PORT range (1-65535) via Annotated Field
- Extract _extract_db_name helper to deduplicate DB name parsing
- Narrow except clause to RuntimeError in init_db
- Harden Dockerfile CA bundle download with existence check
- Fix docker-compose duplicate environment key issue
- Remove unused st.data() and unnecessary f-string in PBT tests
- Update MCP tool description for defensive field handling
- Add AbortController timeout (5s) to MCP health check fetch
- Use runtime typeof checks instead of TS type assertions for JSON fields
- Parse CONNECTION_URI with urlsplit/urlunsplit to preserve path and query params
- Overlay only IAM-specific pieces (username, password, host, port) on parsed URI


Remove MCP aws_rds_status tool, property-based tests, and lock file
to reduce scope per maintainer feedback. Rebased on merged plastic-labs#459.
MCP tools and PBT tests will be a follow-up PR.
@fossouo fossouo force-pushed the feature/aws-rds-iam-authentication branch from 5a3872d to 37d7d04 Compare April 28, 2026 14:28
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
tests/aws_auth/test_aws_rds_auth.py (2)

242-270: Test simulates logic rather than testing actual implementation.

test_iam_mode_engine_setup duplicates the URI construction logic from db.py rather than importing and testing the actual module. While this avoids module re-import complexity, it means the test won't catch regressions if db.py's logic changes.

Consider extracting the URI construction into a helper function in db.py that can be unit-tested directly, or document this as an integration concern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/aws_auth/test_aws_rds_auth.py` around lines 242 - 270, The test
test_iam_mode_engine_setup duplicates the URI construction logic instead of
exercising the real implementation; refactor by moving the URI construction into
a small, testable helper in db.py (e.g., a function build_rds_uri(db_settings:
DBSettings) or get_iam_rds_uri) and update the test to import and call that
helper (or alternatively change the test to import db.py and call the existing
function that performs the same work). Ensure the helper accepts the
DBSettings-like input and returns the URI string used by the engine, then
replace the duplicated construction in the test with a call to that helper so
regressions in db.py are caught by the unit test.

223-240: Test assertions may not verify intended behavior.

Tests test_password_mode_no_do_connect_listener and test_password_mode_uses_connection_uri guard on os.environ.get("DB_AUTH_METHOD") != "iam" but don't actually verify the absence of the listener or the module's configuration state—they just assert the settings value. The tests pass when the environment condition holds but don't fail meaningfully if the condition is false.

Consider restructuring to use pytest.mark.skipif with a clear skip reason, or mock the settings during test setup to control the test environment deterministically.

♻️ Suggested refactor
+    `@pytest.mark.skipif`(
+        os.environ.get("DB_AUTH_METHOD") == "iam",
+        reason="Test only applicable in password mode"
+    )
     def test_password_mode_no_do_connect_listener(self):
         """In password mode the engine should NOT have a do_connect listener
         that injects IAM tokens."""
-        # The default settings use password mode. Verify that the db module
-        # did not register the IAM token injection path.
-        if os.environ.get("DB_AUTH_METHOD") != "iam":
-            from src import db as db_module
-
-            # In password mode, the module should not have imported aws_auth
-            # at module level (it's only imported inside the iam branch).
-            assert settings.DB.AUTH_METHOD == "password"
+        from src import db as db_module
+        # In password mode, the module should not have imported aws_auth
+        # at module level (it's only imported inside the iam branch).
+        assert settings.DB.AUTH_METHOD == "password"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/aws_auth/test_aws_rds_auth.py` around lines 223 - 240, The two tests
(test_password_mode_no_do_connect_listener,
test_password_mode_uses_connection_uri) currently gate behavior with a runtime
os.environ check and only assert settings.DB.AUTH_METHOD; update them to be
deterministic by either applying
pytest.mark.skipif(os.environ.get("DB_AUTH_METHOD") == "iam", reason="requires
password mode") or by mocking the environment/settings to force password mode
before importing src.db; then assert the actual behaviors: for
test_password_mode_no_do_connect_listener import src.db as db_module and verify
aws_auth was not imported at module level and that no IAM-token do_connect
listener is registered on the engine (check the engine event listeners list or
db_module.engine.dispatch), and for test_password_mode_uses_connection_uri
import connection_uri from src.db and assert connection_uri ==
settings.DB.CONNECTION_URI after forcing password mode.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/aws_auth/test_aws_rds_auth.py`:
- Around line 242-270: The test test_iam_mode_engine_setup duplicates the URI
construction logic instead of exercising the real implementation; refactor by
moving the URI construction into a small, testable helper in db.py (e.g., a
function build_rds_uri(db_settings: DBSettings) or get_iam_rds_uri) and update
the test to import and call that helper (or alternatively change the test to
import db.py and call the existing function that performs the same work). Ensure
the helper accepts the DBSettings-like input and returns the URI string used by
the engine, then replace the duplicated construction in the test with a call to
that helper so regressions in db.py are caught by the unit test.
- Around line 223-240: The two tests (test_password_mode_no_do_connect_listener,
test_password_mode_uses_connection_uri) currently gate behavior with a runtime
os.environ check and only assert settings.DB.AUTH_METHOD; update them to be
deterministic by either applying
pytest.mark.skipif(os.environ.get("DB_AUTH_METHOD") == "iam", reason="requires
password mode") or by mocking the environment/settings to force password mode
before importing src.db; then assert the actual behaviors: for
test_password_mode_no_do_connect_listener import src.db as db_module and verify
aws_auth was not imported at module level and that no IAM-token do_connect
listener is registered on the engine (check the engine event listeners list or
db_module.engine.dispatch), and for test_password_mode_uses_connection_uri
import connection_uri from src.db and assert connection_uri ==
settings.DB.CONNECTION_URI after forcing password mode.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 472c1193-861c-48e2-a31a-1aee9efdc844

📥 Commits

Reviewing files that changed from the base of the PR and between 5a3872d and 37d7d04.

⛔ Files ignored due to path filters (2)
  • mcp/package-lock.json is excluded by !**/package-lock.json
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • .env.template
  • .gitignore
  • Dockerfile
  • config.toml.example
  • docker-compose.yml.example
  • pyproject.toml
  • src/aws_auth.py
  • src/config.py
  • src/db.py
  • tests/aws_auth/__init__.py
  • tests/aws_auth/test_aws_rds_auth.py
✅ Files skipped from review due to trivial changes (6)
  • tests/aws_auth/init.py
  • pyproject.toml
  • docker-compose.yml.example
  • .env.template
  • .gitignore
  • config.toml.example
🚧 Files skipped from review as they are similar to previous changes (1)
  • Dockerfile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants