
Using Microsoft Entra ID To Authenticate With MCP Servers Via Sessions

Because what you’re about to read deals with authentication and authorization, I want to call out that this blog post is intended for demo purposes. You will need to make some changes to make the code you see here production-ready.

Just a week ago I was talking about an approach to authenticating into MCP servers with Entra ID. While that approach was OK as a prototype, it had some aspects that might or might not work depending on the context - chiefly, it depended on the MCP server acting as a public client on behalf of the MCP client. Alas, such is the current implementation of the spec in the TypeScript SDK. But what if we could improve this a bit?

The key is in the session #

The entire flow that I described earlier hinged on passing an Entra ID access token to “unenlightened” (Entra ID-unaware) clients. But what if we didn’t? What if, instead of the Entra ID access token, we issued a “session token” - still a JWT designed to be consumed by the client, but one that carries no references to Entra ID? This was alluded to in the spec:

6. MCP server generates its own access token bound to the third-party session
7. MCP server completes original OAuth flow with MCP client

If this rings a bell, that’s because it’s very similar to the Backend-for-Frontend (BFF) pattern and how it handles authentication tokens. Because the client is completely in the dark about our auth infrastructure, we can’t have it complete the end-to-end auth flow, so we delegate that to the server. But we also don’t want the server to act as a public client, because it isn’t one - we want it to be a confidential client. And confidential clients must never forward their tokens back to the client.

That’s a long way of explaining that we want to keep access tokens on the server, and give the client the ability to access protected resources through some kind of session indicator. And because we don’t know that the MCP client is running in a web browser, we can’t quite rely on cookies, as one typically would for the BFF implementation.
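To make this concrete, here is a minimal sketch of what each side ends up holding. The type and variable names are mine, not taken from the prototype: the client only ever carries a signed session JWT (sent as a bearer token in lieu of a cookie), while the server keeps the mapping from session ID to the actual Entra ID tokens.

// Hypothetical shapes for illustration - names are not taken from the prototype.

// What the MCP client holds and sends as "Authorization: Bearer <JWT>".
// It references a session, but contains no Entra ID material.
interface SessionTokenClaims {
  sessionId: string;   // opaque pointer to server-side state
  clientId: string;    // which registered MCP client the session belongs to
  iat: number;
  exp: number;
}

// What the MCP server keeps in its (demo-only, in-memory) store, keyed by
// session ID. The Entra ID access token never leaves this map.
interface ServerSideSession {
  entraAccessToken: string;
  entraRefreshToken?: string;
  expiresOn: Date;
}

const sessionsToTokens = new Map<string, ServerSideSession>();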

Essentially, if we think in sequence diagrams, the flow we’d get is this:

%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1f1f1f', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#ffffff', 'lineColor': '#ffffff' }}}%%
sequenceDiagram
    participant UA as Web Browser
    participant Client as MCP Client
    participant BFF as MCP Server
    participant Auth as Identity Provider
    participant Backend as Protected API
    Client->>BFF: Initial request
    BFF-->>Client: 401 Unauthorized
    Client->>BFF: /authorize
    BFF->>BFF: Generate session ID
    BFF-->>Client: Redirect to IdP authorization endpoint
    Client-->>UA: Open authorization endpoint
    UA->>BFF: Redirect after authenticating the user
    BFF->>Auth: Exchange the authorization code for token
    Auth-->>BFF: Access token
    BFF->>BFF: Cache token along with session ID
    BFF-->>Client: Return JWT with session and metadata
    Client->>Client: Cache session token
    Client->>BFF: Tool request with session as bearer token
    BFF->>BFF: Validate token
    BFF->>Backend: Request data with Entra ID token (bound to session)
    Backend-->>BFF: Return data
    BFF-->>Client: Return data

Simple, right? To go along with this flow, I built another prototype that demonstrates a potential implementation, available on GitHub.

Check out the code

The demo is effectively a “clone” of the previous implementation, except that I am no longer using any public client logic. You’ll notice that the MCP server authentication provider is quite different: it runs its own PKCE exchange against Entra ID, in addition to checking the PKCE codes it obtains from the client.

Mole thinking.

Wait… Hold on a second… Two PKCE checks. Why is that even necessary here? If the server is doing everything on its own, why bother with two?

Great question! When the client talks to the MCP server’s /authorize endpoint, it does so with the standard PKCE components, such as the code challenge. Those components are irrelevant to what we’re doing server-side, because our auth flow against Entra ID is detached from the client. When the confidential client constructs its authorization URL, it uses its own challenge, so that when Entra ID returns the authorization code to the server (remember - in our case, the Entra ID token never goes back to the client), the server can redeem that code with its own verifier rather than the one from the MCP client. The MCP client is not the one doing that token exchange.

At the same time, the client is talking to the MCP server as if it were an OAuth provider in its own right (even though the server talks to Entra ID behind the scenes), so before we hand the requesting client a token, we need to match its verifier against the challenge it gave us earlier.
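For reference, the PKCE math behind both of those checks is the same. Here is a hypothetical sketch of the two helpers involved - a generatePkce() that the server uses when building its own Entra ID authorization request, and a check that compares a verifier against a previously stored challenge. The names and shapes are illustrative, not copied from the repo:

import * as crypto from 'crypto';

// Server-side PKCE pair, used when the MCP server (acting as a confidential
// client) builds the Entra ID authorization URL.
function generatePkce(): { verifier: string; challenge: string } {
    const verifier = crypto.randomBytes(32).toString('base64url');
    const challenge = crypto.createHash('sha256').update(verifier).digest('base64url');
    return { verifier, challenge };
}

// Client-facing PKCE check, used later when the MCP client calls /token:
// recompute S256(verifier) and compare it with the challenge captured in /authorize.
function verifyClientPkce(codeVerifier: string, storedChallenge: string): boolean {
    const computed = crypto.createHash('sha256').update(codeVerifier).digest('base64url');
    if (computed.length !== storedChallenge.length) {
        return false;
    }
    return crypto.timingSafeEqual(Buffer.from(computed), Buffer.from(storedChallenge));
}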

A quick peek at the implementation #

To give what I explained above a bit more color, let’s take a look at the new and improved implementation for authorize():

/**
 * Authorizes a client and redirects to Entra ID login
 * @param client - Client information
 * @param params - Authorization parameters
 * @param res - Express response object
 */
async authorize(client: OAuthClientInformationFull, params: AuthorizationParams, res: Response): Promise<void> {
    console.log("Authorizing client ", client.client_id);

    try {
        const redirectUri = client.redirect_uris[0] as string;

        // Generate our own PKCE values instead of using client's
        const pkce = this.generatePkce();
        const codeChallengeMethod = 'S256';

        // Generate a secure random state parameter
        const state = crypto.randomBytes(32).toString('hex');

        // Store both the client's original state and our generated state
        const sessionData: SessionData = {
            clientId: client.client_id,
            state: state,
            codeVerifier: pkce.verifier,  // Store our verifier for later
            redirectUri: redirectUri,
            originalState: params.state as string,  // Store client's original state
            clientCodeChallenge: params.codeChallenge as string,
            clientCodeChallengeMethod: 'S256'
        };

        await this._storeSessionData(state, sessionData);

        const authCodeUrlParameters = {
            scopes: ['User.Read'],
            redirectUri: 'http://localhost:3001/auth/callback',
            codeChallenge: pkce.challenge,  // Use our challenge
            codeChallengeMethod: codeChallengeMethod,
            state: state,
            prompt: 'select_account'
        };

        const authUrl = await this._confidentialClient.getAuthCodeUrl(authCodeUrlParameters);

        res.redirect(authUrl);

    } catch (error) {
        console.error("Authorization setup error:", error);
        res.status(500).send("Failed to initialize authentication: " + error);
    }
}

Not only do we redirect the user to Entra ID for authentication, but we also track the session from the moment it is initialized - this will help us later. Notice that the redirect URI I am using here no longer points at the client; it points at the server, where I have a special callback responsible for processing the auth code and state that Entra ID will return:

app.get(
  "/auth/callback", 
  (req: Request, res: Response, next: NextFunction): void => {
    const { code, state } = req.query;
    
    if (!code || !state || Array.isArray(code) || Array.isArray(state)) {
      res.status(400).send("Invalid request parameters");
      return;
    }
    
    provider.handleCallback(code as string, state as string)
      .then((result) => {
        if (result.success) {
          res.redirect(result.redirectUrl);
        } else {
          res.status(400).send(result.error || "Unknown error");
        }
      })
      .catch((error) => {
        console.error("Error in callback handler:", error);
        res.status(500).send("Server error during authentication callback");
        next(error);
      });
  }
);

This, in turn, lands in the handleCallback implementation on the provider side, which takes the returned artifacts (the authorization code and state, matched against the stored verifier) and exchanges them for an access token.
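The real implementation is in the repository, but a stripped-down sketch of what handleCallback has to do looks roughly like this. The helper names (_getSessionData, _tokenCache, _storeAuthorizationCode) are assumptions on my part, and error handling is kept minimal:

// Illustrative sketch - see the repository for the actual implementation.
async handleCallback(code: string, state: string): Promise<{ success: boolean; redirectUrl?: string; error?: string }> {
    // 1. Look up the session created in authorize(), keyed by our own state value.
    const session = await this._getSessionData(state);
    if (!session) {
        return { success: false, error: "Unknown or expired state" };
    }

    // 2. Redeem the Entra ID authorization code using *our* PKCE verifier.
    const tokenResponse = await this._confidentialClient.acquireTokenByCode({
        code: code,
        scopes: ['User.Read'],
        redirectUri: 'http://localhost:3001/auth/callback',
        codeVerifier: session.codeVerifier
    });

    // 3. Cache the Entra ID tokens server-side, bound to this session.
    this._tokenCache.set(state, tokenResponse);

    // 4. Mint a temporary authorization code of our own and send the client back
    //    to its original redirect URI with its original state.
    const ourAuthCode = crypto.randomBytes(32).toString('hex');
    await this._storeAuthorizationCode(ourAuthCode, state);

    const redirectUrl = new URL(session.redirectUri);
    redirectUrl.searchParams.set('code', ourAuthCode);
    redirectUrl.searchParams.set('state', session.originalState);

    return { success: true, redirectUrl: redirectUrl.toString() };
}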

My code uses in-memory caching for both sessions and tokens. This is, clearly, not production-ready - you should never use default unencrypted in-memory caching for production scenarios in multi-user environments.

We then return a temporary authorization code and state to the client (after all, we’re a mini OAuth provider on the server). The client uses those for the session token exchange, handled by the /token endpoint, which in turn invokes exchangeAuthorizationCode. In my implementation, that function verifies the authorization code, gets the session ID, wraps it in a JWT “session token”, and returns it to the client.

The JWT here can carry any information we want, including client data, which lets us verify both the origin of the request and the session assignment. We can add more claims if needed, but for demo purposes what I have is enough.
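Put together, exchangeAuthorizationCode might look something like the sketch below. The parameter list and helper names are assumptions (check the repository for the real thing), and I’m using jsonwebtoken purely for illustration. The important part is that the client’s PKCE verifier is matched against the challenge captured in /authorize, and that the resulting JWT points at a session rather than at any Entra ID artifact:

import jwt from 'jsonwebtoken';

// Illustrative sketch - helper names and the exact signature are assumed.
async exchangeAuthorizationCode(client: OAuthClientInformationFull, authorizationCode: string, codeVerifier?: string) {
    // 1. Resolve our temporary authorization code back to its session.
    const sessionId = await this._consumeAuthorizationCode(authorizationCode);
    const session = await this._getSessionData(sessionId);
    if (!session || session.clientId !== client.client_id) {
        throw new Error("Invalid authorization code");
    }

    // 2. Match the MCP client's verifier against the challenge from /authorize
    //    (verifyClientPkce is the helper sketched earlier).
    if (!codeVerifier || !verifyClientPkce(codeVerifier, session.clientCodeChallenge)) {
        throw new Error("PKCE verification failed");
    }

    // 3. Mint the session JWT. It references the session and the client,
    //    but carries no Entra ID material whatsoever.
    const sessionToken = jwt.sign(
        { sessionId: sessionId, clientId: client.client_id },
        this._jwtSigningSecret,
        { expiresIn: '1h', issuer: 'http://localhost:3001' }
    );

    return {
        access_token: sessionToken,
        token_type: 'Bearer',
        expires_in: 3600
    };
}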

With the implementation above, the client never has access to any of the Entra ID access tokens, and the only way to get to one is via a properly assigned session, which on the MCP server side would be associated with a user and client combination.

As an added bonus, I no longer do on-behalf-of within the tool code, because we have a proper access token that can be obtained from the session-to-token map.
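Inside a tool, that boils down to something like the sketch below, which reuses the sessionsToTokens map and JWT claims sketched earlier. How the bearer token is surfaced to the tool (extra.authInfo here) depends on the SDK version, so treat the property names as assumptions:

// Illustrative sketch of a tool that uses the session-bound Entra ID token.
server.tool("get-my-profile", "Gets the signed-in user's profile", {}, async (_args, extra) => {
    // The bearer token presented by the MCP client is our session JWT,
    // not an Entra ID token.
    const sessionToken = extra.authInfo?.token as string;
    const { sessionId } = jwt.verify(sessionToken, JWT_SIGNING_SECRET) as { sessionId: string };

    // Look up the cached Entra ID access token bound to this session.
    const cached = sessionsToTokens.get(sessionId);
    if (!cached) {
        throw new Error("No Entra ID token for this session - re-authenticate");
    }

    // Call the protected API directly with the cached token - no on-behalf-of flow needed.
    const response = await fetch("https://graph.microsoft.com/v1.0/me", {
        headers: { Authorization: `Bearer ${cached.entraAccessToken}` }
    });

    return { content: [{ type: "text", text: JSON.stringify(await response.json()) }] };
});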

Connecting to an authenticated MCP server through MCP Inspector.

Looking a bit forward #

All of this is a bit confusing, and you can see concepts start to bleed between client and server. That’s because the server is treated as a full OAuth provider, proxy or not. If the client performed the end-to-end authentication flow directly with the identity provider (using the same PKCE mechanism), we wouldn’t even need this dual hop!

Mole looking through a lens.

But what would that approach be? If the client is “unenlightened” and has no concept of the identity provider, what would the right approach be?

One alternative, suggested by my colleague Will Bartlett, is to have clients support WWW-Authenticate header parsing. In this context, a protected MCP server may still return a proper 401 Unauthorized response, but with a WWW-Authenticate header in tow that points to the realm or the authority. From there, the location of the OIDC discovery document can be constructed using one of the commonly adopted patterns, such as /.well-known/openid-configuration, which in the Entra ID case points the client directly to its OIDC discovery metadata document.

Armed with this data, the client can now perform the full, local, browser-based, PKCE-supported public client authentication flow without ever touching the MCP server. All the server can ever do is tell whether a token is needed (a request came in without an Authorization header) or whether the current token is not suitable (e.g., it needs different claims, is expired, or is otherwise invalid).
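A rough sketch of what that could look like on the wire is below. The authorization_uri parameter name is just one possible convention (the MCP spec doesn’t prescribe it today), and TENANT_ID is a placeholder:

// Server side: reject unauthenticated requests and advertise the authority.
app.use((req: Request, res: Response, next: NextFunction): void => {
    if (!req.headers.authorization) {
        res.status(401)
            .set("WWW-Authenticate", `Bearer realm="mcp", authorization_uri="https://login.microsoftonline.com/${TENANT_ID}/v2.0"`)
            .send("Authentication required");
        return;
    }
    next();
});

// Client side: derive the OIDC discovery document from the advertised authority
// and use it to drive a local, browser-based PKCE flow.
async function discoverFromChallenge(wwwAuthenticate: string) {
    const authority = wwwAuthenticate.match(/authorization_uri="([^"]+)"/)?.[1];
    const response = await fetch(`${authority}/.well-known/openid-configuration`);
    return response.json(); // exposes authorization_endpoint, token_endpoint, etc.
}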

Conclusion #

Authentication with MCP servers is still in the early design stages - all of this prototyping might be irrelevant by next week, but it’s a fun domain to explore and see where there is potential for tighter integration. Entra ID works remarkably well with existing OAuth standards, so getting it to work out of the box is not especially complicated.