If you wrote your first MCP server in late 2024, it almost certainly read JSON from stdin and wrote JSON to stdout. That was the prototype. The 2025-06-18 specification quietly retired that mental model. The transport you are supposed to ship now is HTTP, the auth model is OAuth 2.1 with mandatory resource indicators, and the registry of working remote servers has caught up with the spec faster than most teams have caught up with the registry.

Most teams treat this as cosmetic. It is not. The shift from stdio to Streamable HTTP changes how you deploy, observe, and authenticate an MCP server. It changes which security failures are possible. It also changes whether your server can be used at all by hosted clients that no longer want to spawn subprocesses on behalf of users.

Why stdio worked, and why the spec moved on

The original MCP transport choice was pragmatic. A desktop client like Claude Desktop or an IDE plugin could ship a single binary, run npx some-mcp-server, and pipe newline-delimited JSON-RPC over stdin and stdout. No network, no auth, no ports. The host process owned the lifetime of the server and crashed it on exit. For a protocol whose first goal was to standardize how local tools talk to a local model, stdio was the right primitive.

That model broke as soon as the clients went remote. A hosted assistant cannot fork a subprocess on someone else's laptop. A team that wants to expose a shared internal tool, say a Jira bridge or an observability query interface, does not want every employee to install and update a binary. The natural shape is a service. The early answer was the HTTP+SSE transport defined in protocol version 2024-11-05, which paired a POST endpoint for client-to-server messages with a long-lived SSE channel for server-to-client messages.

It mostly worked. The problem was that it required two endpoints, used a separate endpoint SSE event to advertise where to POST, and made resumption awkward. Load balancers had to keep affinity across two distinct paths. The 2025-03-26 spec replaced it with a single endpoint that speaks both directions: Streamable HTTP. The 2025-06-18 revision is the version most production servers are now targeting.

Streamable HTTP, in plain terms

A Streamable HTTP server exposes one URL, conventionally something like https://mcp.example.com/mcp, and accepts both POST and GET on it.

POST is how the client sends requests, notifications, and responses. The client must include Accept: application/json, text/event-stream on every POST, because the server gets to choose, per request, whether to reply with a single JSON document or open an SSE stream for that request. A short tool call comes back as a JSON body. A long-running call that wants to emit progress notifications upgrades to SSE for the duration of that one request and closes the stream when the response is sent.

GET is how the client invites the server to push messages it could not otherwise deliver. The client opens an SSE stream with a bare GET, and the server may send unsolicited notifications or requests on it. The server is allowed to return 405 if it has nothing server-initiated to say. That is a real choice, not an oversight. A purely request-response tool server has no reason to keep a persistent stream open per client.

Two HTTP headers carry the protocol state. Mcp-Session-Id is assigned by the server in the response to initialize, and the client echoes it on every subsequent request. MCP-Protocol-Version carries the negotiated version on every request after init. Both are mandatory in practice. A request without Mcp-Session-Id after init can be answered with 400. A request with a stale session can be answered with 404, at which point the client is expected to reinitialize.

Resumability is the part most server authors get wrong on the first try. The SSE specification already defines an id: field per event and a Last-Event-ID request header for resumption. MCP reuses those primitives, with one constraint: event IDs are scoped per stream. If a client reconnects with Last-Event-ID, the server replays only the messages that would have gone out on that exact stream, not on any stream in the session. Servers that store a single global cursor and replay everything will deliver duplicates and drop ordering guarantees on parallel streams.

 

Technical schematic showing two parallel horizontal SSE streams on grid.

The local-server footgun

Streamable HTTP is also the recommended transport for servers that happen to run on the same machine as the client. That is where the spec turns sharp. The security section is one short paragraph that says: validate the Origin header, bind to 127.0.0.1 instead of 0.0.0.0, and require authentication. The reason is DNS rebinding.

A locally bound MCP server on, say, port 7331 is reachable from any web page the user opens. If that server does not check Origin, a malicious site can rebind a domain it controls to 127.0.0.1 and start making requests with the user's cookies and full access to whatever tools the server exposes. Anyone who shipped a developer-tools server in 2024 has either fixed this or is about to read a Hacker News thread they will not enjoy. The fix is short: reject any request whose Origin is not on an allowlist, and bind the listener to loopback only. The spec calls these out because too many servers shipped without either.

The OAuth story, finally written down

Authentication was the part of MCP that everyone agreed needed fixing and nobody wanted to design. The 2025-06-18 authorization spec is the result. It is OAuth 2.1, but with a specific subset of RFCs declared mandatory, and that combination is the part worth understanding.

SpecRole in MCPLevel
OAuth 2.1 draft 13Authorization framework. PKCE required, implicit and password grants gone.MUST
RFC 9728 (Protected Resource Metadata)How an MCP server advertises which authorization servers it trusts.MUST
RFC 8414 (AS Metadata)How an authorization server advertises its endpoints and capabilities.MUST
RFC 8707 (Resource Indicators)Binds the issued token to a specific MCP server URI.MUST
RFC 7591 (Dynamic Client Registration)Lets clients register with previously-unknown authorization servers.SHOULD

Read that table once and most of the protocol design falls into place. The MCP server is an OAuth 2.1 resource server, not an authorization server. The client discovers the authorization server by hitting the MCP endpoint without a token, getting a 401 with a WWW-Authenticate header pointing at /.well-known/oauth-protected-resource, reading that metadata, and following the authorization_servers field to a separate authorization server's metadata. The mental model is the same one Microsoft Graph and AWS STS already use. MCP did not invent anything here. It picked an opinion.

The opinionated part is RFC 8707. Resource indicators are not new, but they are not widely adopted, and the MCP spec made them non-optional. Every authorization request and every token request must include a resource parameter that names the canonical URI of the MCP server. Every server must validate that the token in front of it was issued for it. That single requirement closes a category of bugs.

Why the audience requirement matters

Picture a Slack-bridge MCP server. The user authenticates, the server gets an OAuth token that grants access to the user's Slack workspace, and the server stores it. The same server is also bridged to a notes tool. A naive implementation might accept any OAuth token whose signature checks out and route it forward. That is the confused deputy: the server has been given a token, and it does not stop to ask whether the token was meant for it.

Token passthrough makes this worse. If the MCP server takes the inbound token from the client and uses it directly when calling Slack's API, two boundaries collapse. The downstream API has no way to know whether the token was vetted by the MCP server or smuggled in by a third party who got the user to install an attacker-controlled client. The 2025-06-18 spec forbids this explicitly. The MCP server must validate the audience on the inbound token, and if it needs to call an upstream API, it must use a separate token it obtained itself as an OAuth client to that upstream service.

The practical consequence: every production MCP server that bridges to a third-party API is now a small OAuth client in addition to being an OAuth resource server. It manages its own credentials with the upstream service. It does not pretend the client's token is the same as its own.

Dynamic Client Registration is the other non-obvious requirement. Without RFC 7591, every MCP client would need a manually issued client ID per authorization server it might ever see. That is fine for a handful of enterprise integrations and unworkable for the open ecosystem MCP is aimed at. With it, a new client connecting to a previously unseen MCP server can POST to /register, get a client ID, and continue. The spec acknowledges that not every authorization server will support it and falls back to hardcoded or user-provided IDs, but the direction is clear.

What the wire actually looks like

The full handshake, end to end, takes more steps than it sounds like. The minimum sequence for a remote, authenticated tool call:

  • The client POSTs the MCP endpoint without a token and gets a 401 with WWW-Authenticate: Bearer resource_metadata=....
  • The client fetches /.well-known/oauth-protected-resource on the MCP server, parses authorization_servers, and picks one.
  • The client fetches /.well-known/oauth-authorization-server on that authorization server to learn the auth, token, and registration endpoints.
  • If the client has never registered, it POSTs to the registration endpoint, gets a client ID, and stores it locally.
  • The client generates a PKCE pair and opens the authorization endpoint in a browser, including the resource parameter set to the MCP server URI.
  • On callback, the client exchanges the code for an access token, again sending the resource parameter.
  • The client retries the MCP request with Authorization: Bearer <token>, and the server validates the audience claim matches its own canonical URI before processing the JSON-RPC payload.

Each of those steps is independently familiar to anyone who has implemented OAuth. The novelty is that the MCP spec is opinionated enough about every choice that two compliant implementations actually interoperate. That has not been true for most OAuth profiles in the wild.

Operational implications most teams miss

Three things change when you treat MCP as a remote protocol instead of a desktop one.

Sticky sessions are now a load balancer problem

A long SSE stream binds a client to one server replica. Round-robin load balancing without session affinity will drop streams across replica restarts. Teams running on Cloudflare Workers or other isolate-based platforms hit this immediately. The durable execution model that Cloudflare shipped for Workflows is one answer, because the durable handle outlives the request, but most teams will solve it with consistent-hash routing on Mcp-Session-Id. Either way, this is a deployment concern that did not exist with stdio.

Token caching is not optional

OAuth 2.1 token endpoints expect short-lived access tokens with refresh-token rotation for public clients. A client that calls a tool ten times in a minute will do ten authorized requests, not ten token exchanges. The client side has to cache. The server side has to expect that the same access token will arrive on many requests and that audience validation has to be fast. JWT validation with cached JWKS is the path of least pain. Opaque tokens with introspection on every call will not survive contact with a real workload.

Observability requires explicit work

Streamable HTTP, by design, mixes request-scoped messages with session-scoped messages on the same connection. A request that emits five progress notifications and one response is six log lines if you log per JSON-RPC message, or one if you log per HTTP request. Neither is wrong, but they answer different questions. Most production deployments end up logging both, correlated by Mcp-Session-Id and JSON-RPC request id. Tracing across the stream is harder than tracing across a normal request-response API, and that is before you add a tool that itself calls out to an LLM.

This technical diagram features a vertical glass cylinder containing horizontal amber and indigo layers.

Where the protocol still has rough edges

The 2025-06-18 spec is the first version that an honest reader can call coherent, but it is not finished. Three gaps come up in real deployments.

Server-to-client requests over GET-opened SSE streams are underspecified in terms of when the server should open them. The spec says a client MAY issue the GET. A server that needs to send a request to the client, for instance a sampling request, has no good answer if the client never opened the listening stream. Some clients keep the GET open permanently. Some open it lazily. Interop is rougher than the request-response path.

Per-stream event ID scoping is correct but operationally painful. If the same session opens parallel streams, each must carry its own cursor. Storing this in Redis or another shared store, keyed by (session_id, stream_id), is straightforward but easy to get wrong. The reference implementations are still catching up on this.

Authorization server discovery assumes the client can talk to the authorization server directly. In a network where the MCP server sits behind a private gateway and the authorization server is on a different domain, the client has to reach both. That is mostly a corporate-network problem, but it is one the spec does not address. Expect proxy patterns and middleware here, similar to what WebMCP does for browser-mediated calls.

What this means for the next year of servers

The teams that shipped stdio-only servers in 2024 are not going to be left behind. The spec preserves stdio as a first-class transport because there is still a real use case for it: local-only tools, sandboxes, anything that wants subprocess lifecycle semantics. What changed is the default. New servers that want a hosted client to pick them up will ship Streamable HTTP first and treat stdio as a deployment option for power users who run their own.

The auth story is the harder migration. A server that was happy reading a GITHUB_TOKEN from its environment under stdio cannot keep that contract under HTTP. It has to advertise an authorization server, validate tokens, and probably broker its own credentials with whatever it is bridging to. That is more code, more failure modes, and more decisions to make. It is also what every other production HTTP API has done for a decade.

The interesting question is whether the registry effects compound. The spec is now opinionated enough that a generic client can talk to a new server it has never seen, walk the metadata, and get a working token. If that holds in practice across a few hundred servers, MCP becomes the first protocol in a long time where a hosted assistant can meaningfully extend itself at runtime without bespoke integration work. The transport rewrite was the precondition. The next year is the test.