If you run authenticated remote MCP servers with Claude Code, you've probably fought this.

Claude Code has an idle timeout that silently kills stdio MCP server processes. There's no way to configure it. When it fires, your MCP connection dies, your auth state vanishes, and you're back to manually reconnecting and re-authenticating. After weeks of this, I ripped out the stdio bridge entirely and replaced it with a persistent HTTP reverse proxy. Then I hit a second, more obscure bug that took another patch to fix.

The Setup: Why I Needed an Auth Proxy at All

I run a self-hosted MCP server behind OAuth: a task manager, memory system, and a handful of other tools that my Claude Code sessions connect to. The server lives on the public internet with proper authentication. OAuth 2.0 Device Flow for initial login, Bearer tokens for ongoing requests, automatic refresh when tokens expire.

Claude Code connects to MCP servers in one of two ways: stdio (launch a subprocess, talk over stdin/stdout) or HTTP (connect to a URL). The stdio approach is simpler to set up but comes with a hard architectural constraint: Claude Code owns the process lifecycle.

Act 1: Death by Idle Timeout

My first implementation was an stdio bridge. Claude Code would spawn it as a subprocess, and it would:

  1. Load cached OAuth tokens (or run device flow if none existed)
  2. Connect to the remote MCP server over SSE
  3. Relay JSON-RPC messages between Claude Code's stdin/stdout and the remote server
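Step 3 is the heart of the bridge. Stripped of auth and error handling, the relay loop looks roughly like this (a stdlib sketch with stand-in callables, not the real bridge code):

```python
import asyncio
import json


async def relay(readline, send_upstream) -> None:
    """Forward newline-delimited JSON-RPC messages until stdin closes.

    readline and send_upstream are hypothetical stand-ins for Claude Code's
    stdin and the SSE client, respectively.
    """
    while True:
        line = await readline()
        if not line:  # EOF: Claude Code closed the pipe (or killed us)
            break
        await send_upstream(json.loads(line))
```

The real bridge runs a symmetric loop in the other direction, relaying responses from the remote server back onto stdout.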

This worked, until it didn't.

Claude Code has an internal idle timeout for stdio MCP servers. When you stop using tools for a while — you're reading code, taking a break, whatever — it kills the subprocess. Silently. No warning, no callback.

The next time you try to use an MCP tool, Claude Code spawns a fresh process. Your SSE connection is gone. Your auth state might be stale. The device flow might need to run again interactively. Claude Code doesn't know any of this; it just sees the tool call fail and reports an error.

I tried to work around it:

  • Keepalive pings: Send periodic pings to detect dead upstream connections before Claude Code noticed. Helped with server-side drops, but did nothing about Claude Code killing the local process.
  • Read timeouts: Added explicit timeouts to the httpx client for long-lived SSE connections. Fixed one class of failure, same core problem.
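The keepalive attempt amounted to a background task like this (a simplified stdlib sketch; `ping` and `on_dead` are hypothetical stand-ins for the real ping RPC and reconnect handler):

```python
import asyncio


async def keepalive_loop(ping, on_dead, interval_s: float = 30.0) -> None:
    """Ping upstream periodically; report the first failure and stop."""
    while True:
        await asyncio.sleep(interval_s)
        try:
            await ping()
        except Exception as exc:  # upstream is gone; let the bridge reconnect
            on_dead(exc)
            return
```

This catches a dead upstream connection, but nothing in it can stop Claude Code from killing the process the loop runs in.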

There's no way to configure the idle timeout. Claude Code decides when your stdio process dies, and you can't tell it otherwise.

The Fix: Decouple Process Lifecycle from Connection Lifecycle

Stop letting Claude Code own the process. Instead of an stdio bridge that lives and dies at Claude Code's whim, run a persistent HTTP server that it connects to like any other URL.

Before:  Claude Code ──stdio──▶ proxy process (killed on idle) ──HTTPS──▶ remote MCP
After:   Claude Code ──HTTP──▶ localhost:4510 (always running) ──HTTPS+Bearer──▶ remote MCP

Claude Code's HTTP MCP transport doesn't have the same lifecycle problem. It makes HTTP requests. If the server is there, it works. If Claude Code goes idle and comes back, the server is still there.

Act 2: The Rewrite

The new proxy is a Starlette ASGI application, about 370 lines of Python. The code lives in a private monorepo and isn't open-sourced yet, but the key snippets are all below. It does three things:

1. Transparent Bearer Auth Injection

Every request from Claude Code arrives unauthenticated on 127.0.0.1:4510. The proxy strips hop-by-hop headers, injects the OAuth Bearer token, and forwards to the remote server:

from starlette.datastructures import Headers

_STRIP_REQUEST_HEADERS = {  # hop-by-hop plus host and the client's own auth (representative set)
    "host", "authorization", "connection", "keep-alive",
    "proxy-authorization", "te", "transfer-encoding", "upgrade",
}

def _forward_headers(request_headers: Headers, token: str) -> dict[str, str]:
    headers: dict[str, str] = {}
    for key, value in request_headers.items():
        if key.lower() not in _STRIP_REQUEST_HEADERS:
            headers[key] = value
    headers["Authorization"] = f"Bearer {token}"
    return headers

If the remote returns 401 or 403, the proxy automatically refreshes the token and retries once. No manual intervention:

if remote_resp.status_code in (401, 403):
    await remote_resp.aclose()
    token = await token_manager.force_refresh()
    headers["Authorization"] = f"Bearer {token}"
    remote_resp = await http_client.send(...)
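The retry policy is deliberately narrow: one refresh, one retry, then surface the failure. As a self-contained sketch (`send` and `refresh` are hypothetical stand-ins for the httpx call and the token manager's force refresh):

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeResponse:  # minimal stand-in for httpx.Response
    status_code: int


async def send_with_refresh(send, headers: dict[str, str], refresh):
    """One refresh, one retry, then give up."""
    resp = await send(headers)
    if resp.status_code in (401, 403):
        headers = {**headers, "Authorization": f"Bearer {await refresh()}"}
        resp = await send(headers)  # retry exactly once with the fresh token
    return resp
```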

2. Streaming Pass-Through

MCP uses streaming responses. The proxy forwards the raw byte stream without buffering, with handling for mid-stream disconnections:

from collections.abc import AsyncGenerator

async def stream_body() -> AsyncGenerator[bytes, None]:
    try:
        async for chunk in remote_resp.aiter_raw():
            yield chunk
    except httpx.RemoteProtocolError as exc:
        logger.warning("Remote closed connection mid-stream: %s", exc)
    finally:
        await remote_resp.aclose()

That RemoteProtocolError catch wasn't in the first version. It came from a follow-up patch after I saw incomplete chunked reads crash the ASGI handler in production. The remote server occasionally drops connections mid-stream (network blip, server restart), and the proxy needs to handle that instead of propagating an unhandled exception.

3. Token Lifecycle Management

The TokenManager class handles the full OAuth lifecycle: loading cached tokens from ~/.mcp-proxy/tokens/, refreshing expired tokens with a 60-second buffer, and falling back to interactive device flow when no valid token exists.

Concurrent token refresh is serialized with an asyncio.Lock to prevent duplicate OAuth requests when multiple MCP calls arrive simultaneously with an expired token:

async def force_refresh(self) -> str:
    async with self._refresh_lock:
        self.current_token = None
        self.storage.delete_token(self.server_url)
        return await self.ensure_valid_token()
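To see why the lock matters, here's a runnable toy version (hypothetical and much simpler than the real class): five concurrent callers with no cached token produce exactly one refresh.

```python
import asyncio


class ToyTokenManager:
    def __init__(self) -> None:
        self._refresh_lock = asyncio.Lock()
        self.current_token: str | None = None
        self.refresh_count = 0

    async def _refresh(self) -> str:
        self.refresh_count += 1   # one OAuth round-trip in the real proxy
        await asyncio.sleep(0.01)  # simulate network latency
        return f"token-{self.refresh_count}"

    async def ensure_valid_token(self) -> str:
        async with self._refresh_lock:
            if self.current_token is None:  # re-check after acquiring the lock
                self.current_token = await self._refresh()
            return self.current_token


async def demo() -> tuple[int, set[str]]:
    tm = ToyTokenManager()
    tokens = await asyncio.gather(*(tm.ensure_valid_token() for _ in range(5)))
    return tm.refresh_count, set(tokens)
```

Without the lock, all five callers would find `current_token` unset and fire five simultaneous OAuth requests.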

systemd for Always-On Operation

The proxy runs as a systemd user service. Starts on login, restarts on crash:

[Unit]
Description=MCP Auth Proxy (local HTTP to remote HTTPS)
After=network-online.target
Wants=network-online.target
StartLimitIntervalSec=120
StartLimitBurst=5

[Service]
Type=simple
ExecStart=/usr/bin/uv run --project %h/build/infra python %h/build/infra/scripts/mcp-auth-proxy.py
Restart=always
RestartSec=5

[Install]
WantedBy=default.target

Setup is a one-time authentication plus three commands:

# One-time: authenticate and cache token
uv run python scripts/mcp-auth-proxy.py --auth-only

# Install and start the service
cp scripts/mcp-auth-proxy.service ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user enable --now mcp-auth-proxy

Then point Claude Code at http://127.0.0.1:4510/mcp and forget about it.
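For a project-scoped setup, the corresponding `.mcp.json` entry looks something like this (the server name is illustrative; check your Claude Code version's documentation for the exact schema):

```json
{
  "mcpServers": {
    "my-tools": {
      "type": "http",
      "url": "http://127.0.0.1:4510/mcp"
    }
  }
}
```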

Act 3: The Well-Known Endpoint Crash

The rewrite worked immediately. No more idle disconnections, no more re-authentication. I shipped it and moved on.

Twenty minutes later, Claude Code stopped connecting.

The logs showed Claude Code making requests to two endpoints I hadn't defined:

GET /.well-known/oauth-authorization-server
GET /.well-known/oauth-protected-resource

These are RFC 8414 and RFC 9728 OAuth discovery endpoints. When Claude Code's MCP SDK connects to an HTTP MCP server, it probes these to check whether the server requires OAuth before sending any MCP requests.

The problem: Starlette returns a plain-text Not Found for undefined routes. The MCP SDK expects JSON from these endpoints, either a valid discovery document or a JSON error. Plain text makes the JSON parser throw, and the entire connection fails.
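The mismatch is easy to reproduce in isolation. A toy check (stdlib only, loosely mimicking the SDK's parse step) shows why Starlette's default body fails where the proxy's replacement succeeds:

```python
import json


def parses_as_json(body: bytes) -> bool:
    """Attempt to parse a discovery response body as JSON, like the SDK does."""
    try:
        json.loads(body)
        return True
    except json.JSONDecodeError:
        return False


# b"Not Found" is Starlette's default 404 body; b"{}" is the proxy's replacement.
```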

The fix was two routes that return empty JSON with a 404 status:

async def _well_known_not_found(request: Request) -> Response:
    return Response(content="{}", status_code=404, media_type="application/json")

This tells the SDK "no OAuth here" in a format it can parse. Four lines of actual logic.

The failure mode was opaque, though. No useful error from Claude Code, no indication the SDK was probing well-known endpoints at all. I only found it by adding --debug logging to the proxy and watching the request log. If you're building an authenticated MCP server and Claude Code silently refuses to connect over HTTP, check your well-known endpoint responses first.

What Ended Up Shipping

  1. stdio → HTTP reverse proxy + systemd. Decouples from Claude Code's process lifecycle entirely.
  2. JSON 404 for OAuth well-known endpoints. The MCP SDK crashes on plain-text 404s.
  3. Streaming resilience + timeouts. Handles mid-stream disconnections instead of crashing the ASGI handler.

It's just a URL.


This post is a companion to Building Secure Agentic Systems: The Six Layers. That post covers what to secure across the agent stack; this one covers the transport plumbing underneath — the MCP connectivity that has to work before any of those security layers come into play.