You built an MCP server. It passes compliance tests. It works in Claude Desktop with a local stdio connection. You deploy it as a remote server with OAuth for authentication. And then it breaks in production in ways you did not anticipate.
OAuth-based authentication is the number one “works in demo, breaks in production” failure mode for remote MCP servers. The MCP ecosystem has grown to over 5,800 servers, and as organizations move from local development to production deployments, auth is consistently where things fall apart.
Here is what goes wrong and how to fix it.
The auth landscape for MCP servers
MCP servers have three common authentication patterns:
- No auth - local stdio servers that only accept connections from the host machine. This is how most development happens.
- API key auth - a static bearer token in the request header. Simple, stateless, and well understood.
- OAuth 2.0 - the standard for production multi-tenant servers. Users authenticate through an identity provider, get tokens, and include them in MCP requests.
The first option does not work remotely. The second works but does not scale to multi-tenant or user-facing deployments. The third is the right answer for production, and the source of most production incidents.
Problem 1: Token refresh during long-running sessions
OAuth access tokens expire, typically after 15 minutes to 1 hour. MCP sessions, by contrast, can last for hours. An AI agent working through a complex task might maintain a single MCP session for the entire duration.
When the access token expires mid-session, the MCP client needs to refresh it without interrupting the session. Most MCP client implementations do not handle this gracefully. The common failure mode:
- Client establishes MCP session with a valid access token
- Token expires 30 minutes later
- Next tool call returns a 401
- Client does not have a refresh token, or the refresh fails
- Session is lost - all accumulated context is gone
- User has to re-authenticate and start over
The fix: your MCP server should accept refresh tokens alongside access tokens and handle the refresh transparently. If you are building the server, implement token refresh as middleware that intercepts 401s, refreshes the token, and retries the request before surfacing the error to the MCP protocol layer.
// Middleware pattern for transparent token refresh
async function withTokenRefresh(req, handler) {
  try {
    return await handler(req);
  } catch (err) {
    // Refresh and retry once, and only when a refresh token is available
    if (err.status === 401 && req.refreshToken) {
      const newToken = await refreshAccessToken(req.refreshToken);
      req.headers.authorization = `Bearer ${newToken}`;
      // A second failure propagates to the caller instead of looping
      return await handler(req);
    }
    throw err;
  }
}
Problem 2: Session persistence across reconnects
Network connections drop. Clients close and reopen. Laptops go to sleep. When the MCP client reconnects, should it get the same session state or start fresh?
The answer depends on whether your server is stateful. If it maintains conversation context, tool state, or cached data per session, losing that state on reconnect degrades the user experience significantly.
The MCP spec’s session ID mechanism helps here, but the implementation details matter. Our approach at the proxy layer: we correlate requests into sessions using a combination of the MCP session ID and a server-issued correlation token. When a client reconnects with a known session ID, the proxy routes to the same backend instance where the session state lives.
For servers behind a load balancer, this requires session affinity. We use Redis-compatible session routing (Valkey) to map session IDs to backend instances. Without this, reconnecting clients land on random backends and lose their state.
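The session-to-backend mapping described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual proxy code: an in-memory Map stands in for the Valkey store, and SessionRouter, routeFor, release, and the backend addresses are hypothetical names.

```javascript
// Sketch of session affinity: pin each MCP session ID to one backend.
// The Map stands in for a shared store such as Valkey (assumption);
// SessionRouter and its method names are hypothetical.
class SessionRouter {
  constructor(backends) {
    this.backends = backends;      // e.g. ["backend-a:8080", "backend-b:8080"]
    this.assignments = new Map();  // sessionId -> backend
    this.next = 0;                 // round-robin cursor for new sessions
  }

  // Return the backend that owns this session, assigning one on first sight.
  routeFor(sessionId) {
    const known = this.assignments.get(sessionId);
    if (known !== undefined) return known;
    const backend = this.backends[this.next % this.backends.length];
    this.next += 1;
    this.assignments.set(sessionId, backend);
    return backend;
  }

  // Forget a session once it ends so the map does not grow without bound.
  release(sessionId) {
    this.assignments.delete(sessionId);
  }
}

const router = new SessionRouter(["backend-a:8080", "backend-b:8080"]);
const first = router.routeFor("session-123");
const afterReconnect = router.routeFor("session-123");
console.log(first === afterReconnect); // true: the reconnect lands on the same backend
```

With more than one proxy instance, the map has to live in a shared store, and each entry would normally carry a TTL so abandoned sessions expire on their own.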
Problem 3: Multi-tenant isolation
A single MCP server often serves multiple users or organizations. Each tenant needs isolated data access. An OAuth token from Tenant A should never return data belonging to Tenant B.
This sounds obvious, but the failure modes are subtle:
- Shared caches. If your server caches tool results without scoping the cache key to the tenant, one tenant’s cached response leaks to another.
- Shared connection pools. Database connections authenticated as a service account can access all tenants’ data. Each request needs its own authorization context, enforced at the query layer, not just the API layer.
- Logging and analytics. Tool call logs that include request/response bodies can leak tenant data if the logging pipeline is not tenant-scoped.
The fix: extract the tenant identifier from the OAuth token (typically an org_id or tenant_id claim) at the earliest possible point and thread it through every layer - caching, database queries, logging, and analytics.
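As a minimal sketch of that threading for the cache layer, assuming the token has already been verified and its claims decoded into a plain object. The names tenantFromClaims and TenantScopedCache are hypothetical, and the same tenant ID would be passed to the query and logging layers in the same way.

```javascript
// Sketch: resolve the tenant once, then scope every cache key to it.
// Assumes `claims` come from an access token that was already verified;
// tenantFromClaims and TenantScopedCache are hypothetical names.
function tenantFromClaims(claims) {
  const tenantId = claims.org_id ?? claims.tenant_id;
  if (typeof tenantId !== "string" || tenantId.length === 0) {
    // Fail closed: a request with no tenant must never fall through to shared data.
    throw new Error("token carries no tenant identifier");
  }
  return tenantId;
}

class TenantScopedCache {
  constructor() {
    this.entries = new Map();
  }

  // Encoding the tenant into the key keeps identical tool calls from
  // different tenants in separate entries.
  key(tenantId, tool, args) {
    return JSON.stringify([tenantId, tool, args]);
  }

  get(tenantId, tool, args) {
    return this.entries.get(this.key(tenantId, tool, args));
  }

  set(tenantId, tool, args, value) {
    this.entries.set(this.key(tenantId, tool, args), value);
  }
}

const cache = new TenantScopedCache();
const tenantA = tenantFromClaims({ sub: "user-1", org_id: "tenant-a" });
const tenantB = tenantFromClaims({ sub: "user-2", org_id: "tenant-b" });
cache.set(tenantA, "list_invoices", { limit: 10 }, ["invoice-1"]);
console.log(cache.get(tenantB, "list_invoices", { limit: 10 })); // undefined
```

Making the tenant a required argument, rather than an optional prefix, means a caller cannot read or write the cache without stating whose data it is.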
Problem 4: API key rotation without downtime
Teams that start with API key auth eventually need to rotate those keys. A key rotation that invalidates the old key immediately breaks every client that has not yet updated. In MCP deployments where clients may be running Claude Code sessions for hours, immediate invalidation means disrupting active work.
The pattern that works: support two active keys simultaneously during a rotation window. Issue a new key, distribute it, then deactivate the old one after a grace period.
# API key rotation with grace period
# 1. Generate a new key (old key still works)
curl -X POST https://mcp.hosting/api/keys \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "production-v2", "expires_in": "30d"}'
# 2. Update clients to use the new key
# 3. After all clients are updated, revoke the old key
curl -X DELETE https://mcp.hosting/api/keys/production-v1 \
  -H "Authorization: Bearer $ADMIN_TOKEN"
On mcp.hosting, each server can have multiple active API keys with independent expiration dates. You create the new key, deploy the update, and revoke the old key when you are confident nothing still uses it.
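If you implement the rotation window in your own server, the validation side might look like the sketch below. It is an illustration only and does not describe how mcp.hosting works: ApiKeyStore, retire, and graceMs are hypothetical names, and a real store would hold hashed keys, not raw values.

```javascript
// Sketch of server-side key checks during a rotation window.
// ApiKeyStore and its fields are hypothetical; production code would
// store key hashes instead of the raw key values used here.
class ApiKeyStore {
  constructor(graceMs) {
    this.graceMs = graceMs;   // how long a retired key keeps working
    this.keys = new Map();    // key value -> { name, retiredAt }
  }

  add(name, value) {
    this.keys.set(value, { name, retiredAt: null });
  }

  // Mark a key as superseded; it stays valid until the grace period ends.
  retire(value, now = Date.now()) {
    const record = this.keys.get(value);
    if (record) record.retiredAt = now;
  }

  isValid(value, now = Date.now()) {
    const record = this.keys.get(value);
    if (!record) return false;                  // unknown key
    if (record.retiredAt === null) return true; // active key
    return now - record.retiredAt < this.graceMs;
  }
}

const HOUR = 60 * 60 * 1000;
const store = new ApiKeyStore(24 * HOUR);
const rotationStart = Date.now();
store.add("production-v1", "old-secret");
store.add("production-v2", "new-secret");
store.retire("old-secret", rotationStart);

console.log(store.isValid("old-secret", rotationStart + 1 * HOUR));  // true: inside the grace period
console.log(store.isValid("old-secret", rotationStart + 25 * HOUR)); // false: grace period over
console.log(store.isValid("new-secret", rotationStart + 25 * HOUR)); // true
```

Logging which key name served each request during the window gives you the evidence that nothing still depends on the old key before you remove it.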
Problem 5: Rate limiting per tenant
Without per-tenant rate limiting, a single aggressive client can consume all your server’s capacity. AI agents are particularly prone to this - an agent in a retry loop can generate hundreds of requests per second.
Rate limiting for MCP servers needs to happen at multiple levels:
- Per API key - limits total requests from a single key
- Per session - limits requests within a single MCP session to prevent runaway agents
- Per tool - limits calls to expensive tools independently from cheap ones
The proxy layer is the natural place to enforce limits because it sees all traffic before it reaches the backend, regardless of which backend instance handles the request.
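The three levels above can be combined in a single check. The sketch below uses fixed-window counters for brevity; the limits, window size, and the names MultiLevelLimiter and allow are illustrative assumptions, and a proxy with several instances would keep the counters in a shared store.

```javascript
// Sketch: fixed-window counters checked at three scopes per request.
// The limits and names (MultiLevelLimiter, allow) are illustrative assumptions.
class MultiLevelLimiter {
  constructor(limits, windowMs) {
    this.limits = limits;      // { key: n, session: n, tool: { toolName: n } }
    this.windowMs = windowMs;
    this.counters = new Map(); // scope string -> { windowStart, count }
  }

  // Counter for a scope, reset when its window has elapsed.
  counter(scope, now) {
    let c = this.counters.get(scope);
    if (!c || now - c.windowStart >= this.windowMs) {
      c = { windowStart: now, count: 0 };
      this.counters.set(scope, c);
    }
    return c;
  }

  allow({ apiKey, sessionId, tool }, now = Date.now()) {
    const checks = [
      [`key:${apiKey}`, this.limits.key],
      [`session:${sessionId}`, this.limits.session],
      // Tools without an explicit limit are bounded only by the other scopes.
      [`tool:${apiKey}:${tool}`, this.limits.tool[tool] ?? Infinity],
    ];
    const counters = checks.map(([scope]) => this.counter(scope, now));
    // Reject if any scope is already at its limit; count nothing in that case.
    if (checks.some(([, limit], i) => counters[i].count >= limit)) return false;
    counters.forEach((c) => { c.count += 1; });
    return true;
  }
}

const limiter = new MultiLevelLimiter(
  { key: 100, session: 50, tool: { run_report: 2 } },
  60 * 1000
);
const call = { apiKey: "key-1", sessionId: "session-1", tool: "run_report" };
console.log(limiter.allow(call)); // true
console.log(limiter.allow(call)); // true
console.log(limiter.allow(call)); // false: the expensive tool hit its own limit
console.log(limiter.allow({ ...call, tool: "get_status" })); // true: cheap tool is unaffected
```

Rejected requests are not counted, so an agent stuck in a retry loop against one expensive tool does not also burn through its key-level and session-level budgets.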
The proxy approach
If you are deploying MCP servers to production and hitting these auth problems, there are two paths:
- Build it yourself. Implement token refresh, session affinity, tenant isolation, key rotation, and rate limiting in your MCP server code. This is significant engineering work, and every MCP server you deploy needs the same infrastructure.
- Use a proxy. Put your MCP server behind a proxy that handles auth, sessions, rate limiting, and analytics at the infrastructure layer. Your server code handles business logic; the proxy handles production concerns.
Whether you run that proxy yourself or reach for an off-the-shelf API gateway, the shape is the same: API key management, session routing, rate limiting, and analytics live at the edge; your MCP server receives authenticated, rate-limited requests with tenant context already extracted. Before rolling anything to production, run mcp-compliance to catch spec violations the proxy will happily pass through.
Checklist for production MCP auth
Before shipping a remote MCP server:
- Token refresh - does your server handle expired tokens without dropping the session?
- Session persistence - do reconnecting clients get their session state back?
- Tenant isolation - are caches, connections, and logs scoped to the authenticated tenant?
- Key rotation - can you rotate API keys without downtime?
- Rate limiting - is each tenant limited independently?
- Audit logging - can you trace every tool call back to the authenticated user?
- SSRF protection - if your tools make outbound requests, are they validated against internal network ranges?
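For the SSRF item, a minimal sketch of the range check is shown below. It covers only IPv4 literals and is an illustration under that assumption; isInternalIPv4 and BLOCKED_RANGES are hypothetical names. A complete check would also resolve hostnames, validate the resolved address it actually connects to, and handle IPv6.

```javascript
// Sketch: reject outbound requests aimed at internal IPv4 ranges.
// Covers IPv4 literals only (assumption); names are hypothetical.
const BLOCKED_RANGES = [
  ["0.0.0.0", 8],      // "this network"
  ["10.0.0.0", 8],     // private
  ["127.0.0.0", 8],    // loopback
  ["169.254.0.0", 16], // link-local, including cloud metadata endpoints
  ["172.16.0.0", 12],  // private
  ["192.168.0.0", 16], // private
];

// Convert dotted-quad text to a number, or null if it is not a valid IPv4 address.
function ipv4ToInt(ip) {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

function isInternalIPv4(ip) {
  const addr = ipv4ToInt(ip);
  if (addr === null) return true; // fail closed on anything unparseable
  return BLOCKED_RANGES.some(([base, bits]) => {
    const start = ipv4ToInt(base);
    const size = 2 ** (32 - bits);
    return addr >= start && addr < start + size;
  });
}

console.log(isInternalIPv4("169.254.169.254")); // true
console.log(isInternalIPv4("172.31.255.255"));  // true
console.log(isInternalIPv4("8.8.8.8"));         // false
```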
Auth is the gap between “works locally” and “works in production.” Close it before your users find it.
Jeff Yaw, Yaw Labs. Follow along at tokenlimit.news for weekly notes on AI infrastructure.