MCP watch mode needs resource guardrails and safer defaults on macOS #628

@AriaShishegaran

Summary

CodeGraph currently leaves users to discover OS-level file descriptor / watcher exhaustion on their own and to mitigate it manually by killing daemons or adding --no-watch. That is too fragile for normal agent workflows, where a developer may have many MCP clients and many project roots open at once.

This is a guardrail/defaults issue related to, but distinct from, the raw FD leak reports in #496 / #555 and the stale-process accumulation report in #579.

Environment where observed

  • macOS 26.5 / Darwin 25.5.0, Apple Silicon arm64
  • CodeGraph 0.9.8 plus older 0.9.7 daemons from prior sessions
  • kern.maxfiles: 491520
  • kern.maxfilesperproc: 245760
  • Several MCP-capable agent clients used across multiple project roots

What happened

Even after moving away from a huge root-level workspace index to per-project indexes, CodeGraph still created enough watcher / open-file pressure that daemon logs showed system-level ENFILE: file table overflow errors.

Sanitized live snapshot:

before cleanup:
  kern.num_files: 15771
  kern.maxfiles: 491520

project daemon FD counts:
  codegraph serve --mcp --path <large-go-project>:        2901 numeric FDs
  codegraph serve --mcp --path <medium-flutter-project>:   599 numeric FDs
  codegraph serve --mcp --path <small-swift-project>:       96 numeric FDs

The largest daemon was almost entirely regular-file descriptors:

REG      2887
PIPE        4
DIR         3
KQUEUE      3
CHR         2
unix        2
numeric_fds 2901
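For anyone who wants to reproduce a breakdown like the one above, here is a minimal sketch of one way to do it. It assumes `lsof` is available and parses its field output (`-Fft`, which emits one `f<fd>` line and one `t<type>` line per open file). The function names are mine and are not part of CodeGraph.

```python
import subprocess
from collections import Counter
from typing import Tuple


def summarize_lsof_fields(field_output: str) -> Tuple[int, Counter]:
    """Return (numeric FD count, per-type counts) from `lsof -Fft` output.

    Only descriptors whose FD field is purely numeric are counted, so
    entries such as `cwd` and `txt` are skipped.
    """
    by_type: Counter = Counter()
    numeric_fds = 0
    current_is_numeric = False
    for line in field_output.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "f":
            # A new file entry starts; remember whether it is a numeric FD.
            current_is_numeric = value.isdigit()
            if current_is_numeric:
                numeric_fds += 1
        elif tag == "t" and current_is_numeric:
            by_type[value] += 1
            current_is_numeric = False
    return numeric_fds, by_type


def summarize_pid(pid: int) -> Tuple[int, Counter]:
    """Run lsof for one process and summarize its descriptors."""
    result = subprocess.run(
        ["lsof", "-p", str(pid), "-Fft"],
        capture_output=True,
        text=True,
        check=False,  # lsof may exit non-zero when some entries are unreadable
    )
    return summarize_lsof_fields(result.stdout)
```

Calling `summarize_pid(<daemon pid>)` and printing the counter gives a table in the same shape as the snapshot.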

The daemon log contained repeated errors like:

[CodeGraph] File watcher error {
  error: "Error: ENFILE: file table overflow, realpath '<workspace>/internal/...'"
}
[CodeGraph] File watcher error {
  error: "Error: ENFILE: file table overflow, lstat '<workspace>/internal/.../client_test.go'"
}
[CodeGraph] File watcher error {
  error: "Error: ENFILE: file table overflow, scandir '<workspace>/internal/...'"
}

Killing stale/high-FD project daemons released pressure immediately:

after killing two stale project daemons:
  kern.num_files: 12164

after killing the remaining project watcher:
  kern.num_files: 12042
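The kern.num_files figures above came from `sysctl`. A small sketch of how the same numbers could be collected and turned into a global budget percentage follows. It assumes the macOS `name: value` output format, and the helper names are hypothetical.

```python
import subprocess
from typing import Dict


def parse_sysctl(text: str) -> Dict[str, int]:
    """Parse macOS-style `sysctl` output lines of the form `name: value`."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, raw = line.partition(":")
        raw = raw.strip()
        if sep and raw.isdigit():
            values[name.strip()] = int(raw)
    return values


def file_table_utilization(values: Dict[str, int]) -> float:
    """Percentage of the global file table currently in use."""
    return 100.0 * values["kern.num_files"] / values["kern.maxfiles"]


def read_file_table() -> Dict[str, int]:
    """Query the live counters (macOS only)."""
    out = subprocess.run(
        ["sysctl", "kern.num_files", "kern.maxfiles"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_sysctl(out)
```

Sampling `read_file_table()` before and after stopping a daemon shows how many table entries that daemon was holding.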

--no-watch was an effective local mitigation:

codegraph serve --mcp --no-watch

Why this needs product-level guardrails

This should not require manual OS debugging. On macOS, exhausting the global file table causes failures in unrelated processes. The user sees shells, browsers, IDEs, Docker, and background services behaving badly, not a clear CodeGraph error.

Also, powerful hardware does not make the failure mode acceptable. A machine with a high global FD limit still experienced CodeGraph-originated ENFILE logs. Developers using agents often have many repos and sessions active; CodeGraph should degrade gracefully under that workload.

Suggested behavior

I think CodeGraph should fail safe by default:

  1. Budget-aware watcher startup

    • Estimate files/directories to watch before enabling the watcher.
    • Compare against OS limits (kern.maxfiles, kern.maxfilesperproc, current kern.num_files on macOS; inotify limits on Linux).
    • If the projected watcher/indexer cost is risky, refuse watch mode or auto-fallback to --no-watch with a clear warning.
  2. FD telemetry in status/debug output

    • Add something like codegraph status --resources or codegraph debug resources.
    • Report current process FD count, watcher count if available, path root, client count, idle-timeout state, and OS budget percentage.
  3. Default MCP install should be conservative

    • For stdio MCP installs on macOS, consider installing args = ["serve", "--mcp", "--no-watch"] until watcher FD usage is bounded.
    • Or ask during codegraph install: "Enable live watcher? This can be expensive on large workspaces."
  4. Release resources when idle

    • If clients=0, close watchers and open file handles.
    • Rehydrate watcher state only when a client attaches or a query requires sync.
  5. Hard cap and warning

    • A daemon should never be allowed to hold tens of thousands of file descriptors without a loud warning.
    • A configurable cap such as CODEGRAPH_MAX_OPEN_FDS / --max-open-fds would be better than letting the OS global table fail.
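To make items 1 and 5 concrete, here is a rough sketch of the startup check I have in mind. It is illustrative only: the names (`FdBudget`, `decide_watch_mode`, `safety_fraction`) are hypothetical, the one-descriptor-per-watched-entry estimate is an assumption about worst-case cost rather than a statement about how CodeGraph's watcher works, and the 50% threshold is an arbitrary placeholder.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FdBudget:
    per_process_limit: int  # e.g. kern.maxfilesperproc
    system_limit: int       # e.g. kern.maxfiles
    system_open: int        # e.g. kern.num_files
    process_open: int       # descriptors this daemon already holds


@dataclass(frozen=True)
class WatchDecision:
    enable_watch: bool
    reason: str


def estimate_watch_entries(root: str, stop_after: int) -> int:
    """Count files and directories under root, stopping early once
    stop_after is exceeded so huge trees are not walked in full."""
    count = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        count += len(dirnames) + len(filenames)
        if count > stop_after:
            break
    return count


def decide_watch_mode(
    estimated_watch_fds: int,
    budget: FdBudget,
    max_open_fds: Optional[int] = None,  # hypothetical CODEGRAPH_MAX_OPEN_FDS
    safety_fraction: float = 0.5,
) -> WatchDecision:
    """Refuse watch mode when the projected cost exceeds the cap or a
    fraction of the per-process or system-wide limit."""
    projected_process = budget.process_open + estimated_watch_fds
    projected_system = budget.system_open + estimated_watch_fds
    if max_open_fds is not None and projected_process > max_open_fds:
        return WatchDecision(
            False, f"projected {projected_process} FDs exceeds cap {max_open_fds}"
        )
    if projected_process > budget.per_process_limit * safety_fraction:
        return WatchDecision(False, "projected FDs too close to per-process limit")
    if projected_system > budget.system_limit * safety_fraction:
        return WatchDecision(False, "projected FDs too close to system-wide limit")
    return WatchDecision(True, "within budget")
```

When `enable_watch` is false, the daemon would start as if --no-watch had been passed and print `reason` as the warning.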

Expected outcome

Users who work across many projects should be able to install CodeGraph once and not periodically debug global OS file-table exhaustion. If watch mode is too expensive for a workspace, CodeGraph should detect that, explain it, and keep the index/query path usable in no-watch/manual-sync mode.
