Blocking Unauthorized MCP Tools: From Runtime Detection to Deterministic Enforcement

Learn how runtime detection and deterministic guardrails address unauthorized MCP tool calls, shadow MCP servers, and AI agent privilege escalation in 2026.

Obsidian Editorial Team

Security Research

Obsidian Security

May 26, 2026

June 1, 2026

Key Takeaways

MCP tools are only fully visible at runtime. Configuration files and static inventories cannot tell you what a server actually exposes when an agent connects.
Unauthorized MCP tool calls represent a form of machine insider risk: agents acting with effective authority that exceeds what any human reviewer approved.
Sanctioned versus unsanctioned classification of MCP servers is the prerequisite step before any enforcement conversation can begin.
Runtime observation, not theoretical configuration, is the current state of the art for detecting unauthorized tool usage across enterprise agent platforms.
Deterministic guardrails that restrict which tools an agent can call are the target enforcement state. They are on the near-term roadmap for leading agentic AI security platforms, beginning with Microsoft Copilot, but are not yet generally available.

Why Unauthorized MCP Tools Are a Distinct Security Risk

Security teams managing AI agents face a problem that traditional IAM was never designed to solve. When a human user accesses an unauthorized resource, there is a clear identity event: a login attempt, an OAuth grant, an access request. When an AI agent calls an unauthorized MCP tool, there is often no equivalent signal. The agent already has a token. The MCP server already has a connection. The tool call happens inside a session that looks, from the outside, entirely legitimate.

This is machine insider risk applied to tool access. AI agents are non-human identities (NHIs) that hold credentials, make decisions, and take actions across SaaS environments. Non-human identities already outnumber human identities by 25 to 50 times in modern enterprises, and that ratio grows with every new agent deployment. None of those agents is covered by a conventional insider risk program. None of their tool calls is reviewed by a human approver in real time.

MCP (Model Context Protocol) is the emerging open standard that connects AI agents to external tools, data sources, and services. Its adoption is accelerating. At some enterprises, MCP server counts are doubling quarterly, so the inventory problem grows as fast as deployment does. So does the shadow MCP problem: servers stood up without security review, tools registered without an inventory record, and agents connecting to both without any team knowing it happened. For a deeper look at how unsanctioned AI connections create similar exposure, see shadow AI security.

The risk is not theoretical. It is structural.

The Black Box Problem: Tools Only Visible at Runtime

Configuration is not runtime truth. That phrase captures the central limitation of every posture-based approach to MCP security.

A security team can review an MCP server's configuration file and see a list of declared tools. What that file cannot tell them is which tools the server actually exposes when an agent connects, whether those tools have changed since the last review, or whether a new tool has been added by a developer who did not update the documentation. The tools inside an MCP server are a black box until an agent opens a live session.

This is not a gap that better documentation closes. It is an architectural property of how MCP works. Tools are registered and surfaced dynamically. An agent that connects to a server at runtime may encounter a tool set that no static analysis captured. Ghost chasing, which means reviewing configuration signals with no runtime evidence of what actually happened, is the inevitable result of relying on posture-only visibility for MCP security.

The practical implication is direct: any inventory of MCP servers that does not include runtime observation is an incomplete inventory. Security teams asking "what tools does this MCP server have?" cannot answer that question from a config file alone. They need to see what the server surfaces when an agent connects. This is the foundation of AI agent visibility that goes beyond configuration snapshots.
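
The gap between declared and exposed tools can be made concrete with a small comparison. The sketch below is illustrative only: the tool names, the ToolDrift type, and the diff_tools function are hypothetical, and it assumes the runtime tool list has already been captured from a live session by some observation layer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDrift:
    undeclared: frozenset[str]  # seen at runtime, absent from the config
    missing: frozenset[str]     # declared in the config, not seen at runtime


def diff_tools(declared: set[str], observed: set[str]) -> ToolDrift:
    """Compare a config's declared tools with tools observed in a live session."""
    return ToolDrift(
        undeclared=frozenset(observed - declared),
        missing=frozenset(declared - observed),
    )


# Hypothetical data: what the config file lists vs. what a session surfaced.
declared_tools = {"read_file", "search_docs"}
observed_tools = {"read_file", "search_docs", "run_shell"}

drift = diff_tools(declared_tools, observed_tools)
print(sorted(drift.undeclared))  # ['run_shell']
```

Any non-empty undeclared set is a tool that a configuration review alone would not have surfaced.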

Action Chaining and the Blast Radius Problem

Unauthorized tool access becomes far more dangerous when agents chain actions across multiple tools and platforms. Action chaining is the mechanism: an agent takes one action, uses the output to trigger a second action, and continues across tool boundaries until it reaches data or systems far outside the scope of the original task.

Consider a concrete scenario. An agent connects to an MCP server that exposes a file-read tool. The agent reads a file containing API credentials. It then calls a second tool, using those credentials to authenticate against a CRM. It queries customer records. Each individual step may look authorized in isolation. The chain, taken together, represents a privilege escalation that no single access control review would have caught.

This is why blast radius matters as a planning concept, not just an incident response metric. The blast radius of a misconfigured or unauthorized MCP tool connection is not limited to the tool itself. It extends to every downstream action the agent can take once that tool call succeeds. Agents move data at a rate that human users cannot match. The window between an unauthorized tool call and a meaningful data exposure can be very short.
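
One way to reason about blast radius during planning is to treat it as reachability in a graph of "what a successful call makes possible next." The sketch below models the file-read scenario above. The graph contents and the blast_radius function are hypothetical; building such a map for a real environment would depend on runtime observation data.

```python
from __future__ import annotations

from collections import deque

# Hypothetical map: each node lists what becomes reachable once it succeeds.
REACHABLE = {
    "file_read": ["api_credentials"],
    "api_credentials": ["crm_login"],
    "crm_login": ["customer_records"],
    "customer_records": [],
    "calendar_lookup": [],
}


def blast_radius(start: str, graph: dict[str, list[str]]) -> set[str]:
    """Return every node reachable downstream of `start`, excluding `start`."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(blast_radius("file_read", REACHABLE)))
# ['api_credentials', 'crm_login', 'customer_records']
```

In this toy graph the file-read tool looks harmless as a single step, yet its downstream set includes customer records, which is the point the scenario makes.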

Maker mode compounds this risk. When an agent is built using the creator's credentials, any user who invokes that agent operates with the creator's effective authority, regardless of their own permission level. Add an unauthorized MCP tool to that configuration and the blast radius expands further, because the invoking user now has the creator's permissions across every tool the agent can reach.
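
The maker-mode gap can be expressed as a set difference between the permissions the agent acts with and the permissions the invoking user holds. This is a minimal sketch under assumed names: the Agent type, the permission strings, and both functions are hypothetical, and real platforms represent permissions in their own ways.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    maker_permissions: frozenset[str]
    uses_maker_credentials: bool  # True models "maker mode"


def effective_permissions(agent: Agent, runner_permissions: frozenset[str]) -> frozenset[str]:
    """Permissions the agent actually acts with for this invocation."""
    if agent.uses_maker_credentials:
        return agent.maker_permissions
    return runner_permissions


def excess_authority(agent: Agent, runner_permissions: frozenset[str]) -> frozenset[str]:
    """Permissions the runner gains only by invoking the agent."""
    return effective_permissions(agent, runner_permissions) - runner_permissions


# Hypothetical agent built with its creator's credentials.
agent = Agent(
    name="sales-helper",
    maker_permissions=frozenset({"crm:read", "crm:export", "files:read"}),
    uses_maker_credentials=True,
)
runner = frozenset({"files:read"})

print(sorted(excess_authority(agent, runner)))  # ['crm:export', 'crm:read']
```

A non-empty result means the invoking user is operating above their own permission level for as long as the agent runs.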

How Runtime Detection Works Today

The present-tense capability for governing unauthorized MCP tools is detection, inventory, and flagging. Enforcement at the tool-call level is the target state (covered in the next section). Understanding what detection actually means in practice is essential before evaluating any vendor claim.

Runtime detection works by observing what agents actually do when they connect to MCP servers, not what configuration says they should do. This requires integration with the AI platforms where agents run: native API connections, webhooks, and direct integrations with platforms like Copilot Studio, Agentforce, Bedrock, Vertex AI, and ChatGPT Enterprise. The observation layer captures tool calls as they occur, correlates them against a sanctioned inventory, and flags calls to tools or servers that fall outside approved boundaries.

The sanctioned versus unsanctioned classification is the operational core of this capability. A sanctioned MCP server is one that security has reviewed, approved, and added to the inventory. An unsanctioned server is anything else: a server a developer stood up without review, a third-party integration that connected without IT knowledge, or a shadow MCP that appeared in the environment without a corresponding approval record. The MCP server inventory is the prerequisite. Without it, the classification has no baseline to work from.

What runtime detection produces today:

A live record of which agents connected to which MCP servers during a given session
A list of tool calls made during that session, including tools not present in the static configuration
Flags for tool calls that match unsanctioned server records or exceed the agent's declared scope
Correlation between the invoking identity (the runner) and the credentials the agent used (the maker's effective authority)

This is runtime truth: evidence of what actually happened, not a theoretical configuration risk. One enterprise security team described their prior approach as reviewing configuration with no runtime evidence of what happened. Runtime detection replaces that with an audit trail that shows what the agent did, what tools it called, and what data it accessed. For a broader view of how this fits a security program, see AI agent governance.
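
The flagging logic described above can be sketched as a classification of each observed tool call against the sanctioned inventory. Everything named here (the ToolCall record, the inventory dictionaries, the flag labels) is a hypothetical illustration, not the schema of any particular product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    agent: str
    server: str
    tool: str
    runner: str  # identity that invoked the agent
    maker: str   # identity whose credentials the agent used


# Hypothetical inventories: approved tools per server, declared tools per agent.
SANCTIONED = {"docs-mcp": {"search_docs", "read_file"}}
AGENT_SCOPE = {"helpdesk-bot": {"search_docs"}}


def flags_for(call: ToolCall) -> list[str]:
    """Return the flags a single observed tool call would raise."""
    flags: list[str] = []
    approved = SANCTIONED.get(call.server)
    if approved is None:
        flags.append("unsanctioned_server")
    elif call.tool not in approved:
        flags.append("unapproved_tool")
    if call.tool not in AGENT_SCOPE.get(call.agent, set()):
        flags.append("outside_agent_scope")
    if call.runner != call.maker:
        flags.append("runner_maker_mismatch")
    return flags


shadow_call = ToolCall("helpdesk-bot", "shadow-mcp", "run_shell", "bob", "alice")
print(flags_for(shadow_call))
# ['unsanctioned_server', 'outside_agent_scope', 'runner_maker_mismatch']
```

A call that returns an empty list is baseline behavior. Anything else becomes an entry in the audit trail for investigation.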

From Detection to Deterministic Enforcement: The Target State

Detection answers the question "what happened?" Enforcement answers the question "what do we prevent from happening?" The industry is actively building toward the second answer in 2026, and security teams evaluating platforms need to understand the distinction between what is available today and what is on the roadmap.

Deterministic guardrails for AI agents are the enforcement model that matches the risk profile. Probabilistic agents, by design, do not follow fixed paths. They make decisions based on context, instructions, and tool availability. The access controls governing those agents cannot be probabilistic in the same way. They need to be fixed, predictable rules that apply regardless of what the agent decides to do.

The target state for blocking unauthorized MCP tools looks like this: when an agent attempts to call a tool that falls outside its approved tool set, the call is intercepted and rejected before it completes. The enforcement happens at the platform level, not at the network layer. It does not require an inline proxy sitting between the agent and the MCP server. It requires hooks deep enough into the AI platform to observe and act on tool calls before they execute.
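
Conceptually, a deterministic guardrail is a fixed allowlist check that runs before the tool handler does. The sketch below shows only that idea. The Guardrail class, the exception, and the handler are hypothetical, and it does not represent the API of any AI platform or the way a platform-level hook would actually be wired in.

```python
from __future__ import annotations

from typing import Any, Callable, Mapping


class ToolCallRejected(Exception):
    """Raised when an agent requests a tool outside its approved set."""


class Guardrail:
    def __init__(self, allowlist: Mapping[str, frozenset[str]]) -> None:
        self._allowlist = allowlist  # agent name -> approved tool names

    def invoke(self, agent: str, tool: str, handler: Callable[..., Any], **kwargs: Any) -> Any:
        """Run `handler` only if `tool` is approved for `agent`."""
        approved = self._allowlist.get(agent, frozenset())
        if tool not in approved:
            # The rule is fixed: the handler is never reached for unapproved tools.
            raise ToolCallRejected(f"{agent} may not call {tool}")
        return handler(**kwargs)


executed: list[str] = []


def read_file(path: str) -> str:
    executed.append(path)  # records that the tool body actually ran
    return f"contents of {path}"


guard = Guardrail({"helpdesk-bot": frozenset({"search_docs"})})
try:
    guard.invoke("helpdesk-bot", "read_file", read_file, path="/etc/secrets")
except ToolCallRejected as err:
    print(f"blocked: {err}")  # blocked: helpdesk-bot may not call read_file
```

The design choice that matters is that the decision depends only on the agent and the tool name, never on what the model decided or why, which is what makes the outcome predictable.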

This enforcement capability is on the near-term roadmap, with Microsoft Copilot targeted as one of the earliest supported platforms (late Q1 2026) and additional platforms following during Q2 2026. Timing and platform sequencing are subject to change and should be confirmed with the vendor's product team before making any commitment that depends on them. Security teams should evaluate vendors on the architecture of their enforcement approach, not just the presence of a "block" feature in a marketing brief.

The architectural requirement matters. Enforcement that works by sitting inline as a network proxy introduces latency, creates a single point of failure, and requires routing all agent traffic through a chokepoint. The correct architecture hooks into the AI platform's native execution layer, applies deterministic rules against the sanctioned tool inventory, and rejects unauthorized calls without becoming a dependency in the agent's communication path. Obsidian is a control plane and identity graph, not an MCP proxy or inline traffic filter. It enforces at the tool-call level through native platform integrations, leaving the agent's communication path intact.

For context on how the broader AI agent risk landscape maps to enforcement requirements, the AI agent risk assessment provides a useful starting framework.

Building the Foundation: What Security Teams Need Now

The path from "we have no MCP visibility" to "we enforce deterministic guardrails on every tool call" runs through a specific sequence. Skipping steps in that sequence produces enforcement that has no reliable baseline to enforce against.

Step 1: Build the MCP server inventory. Identify every MCP server active in the environment. Classify each as sanctioned or unsanctioned. This is not a one-time exercise. MCP server counts are doubling quarterly at some enterprises, and the inventory needs continuous updates as new servers come online.

Step 2: Map effective authority, not theoretical configuration. For each agent connected to an MCP server, determine what tools that server actually exposes at runtime, what credentials the agent uses to make those calls, and whether the invoking user's identity aligns with the access level the agent operates at. This is the effective authority question: not what the config says, but what the agent can actually do.

Step 3: Classify tool calls against the sanctioned inventory. Once runtime observation is in place, every tool call becomes a data point. Calls to sanctioned tools with expected parameters are baseline behavior. Calls to unsanctioned tools, calls with unexpected parameters, or calls that chain into downstream systems outside the agent's declared scope are flags that require investigation.

Step 4: Define the enforcement policy before enforcement is available. Security teams that wait until enforcement capability ships to define their policies will spend the first weeks of enforcement tuning rules reactively. The better approach is to define the tool allowlist for each agent class now, using the runtime observation data already available. When enforcement rolls out, the policy is ready to apply.
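
A draft allowlist can be derived from observation data before any enforcement exists. The sketch below assumes observed calls are available as (agent class, tool) pairs and that a set of sanctioned tool names has been agreed. Both inputs and the propose_allowlists function are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def propose_allowlists(
    observed_calls: Iterable[tuple[str, str]],
    sanctioned_tools: set[str],
) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Split observed (agent_class, tool) pairs into a draft allowlist
    and a set of tools that need review before any policy includes them."""
    proposed: dict[str, set[str]] = defaultdict(set)
    needs_review: dict[str, set[str]] = defaultdict(set)
    for agent_class, tool in observed_calls:
        target = proposed if tool in sanctioned_tools else needs_review
        target[agent_class].add(tool)
    return dict(proposed), dict(needs_review)


# Hypothetical observation data gathered during the detection phase.
observed = [
    ("helpdesk", "search_docs"),
    ("helpdesk", "run_shell"),
    ("finance", "read_file"),
    ("helpdesk", "search_docs"),
]
proposed, needs_review = propose_allowlists(observed, {"search_docs", "read_file"})
print(proposed)      # {'helpdesk': {'search_docs'}, 'finance': {'read_file'}}
print(needs_review)  # {'helpdesk': {'run_shell'}}
```

The draft is a starting point for human review, not a policy in itself: observed behavior shows what agents do, not what they should be permitted to do.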

The single pane of glass across all agent platforms, all MCP servers, and all tool calls is the operational requirement that makes this sequence possible. Without it, each platform becomes its own investigation, and the correlation between an agent's runner identity, maker credentials, and tool call history requires manual work that does not scale. For a broader view of how this connects to the platform, see AI agent security.

Frequently Asked Questions

What makes an MCP tool "unauthorized" in an enterprise context?

An unauthorized MCP tool is any tool exposed by an MCP server that has not been reviewed and approved by the security team, or any tool call that exceeds the scope the agent was configured to perform. This includes tools on unsanctioned servers, tools added to a sanctioned server without a change review, and tools called with credentials that exceed the invoking user's own permission level.

Can static code analysis or configuration review replace runtime MCP tool monitoring?

No. MCP tools are surfaced dynamically at runtime. A configuration file or static review captures what a server declares, not what it actually exposes when an agent connects. Runtime observation is required to build an accurate tool inventory.

What is the difference between detecting unauthorized tool calls and blocking them?

Detection identifies and flags unauthorized tool calls after they occur or as they occur, producing an audit trail and alert. Blocking prevents the tool call from completing. Detection is the current capability. Blocking via deterministic guardrails at the platform level is the target enforcement state, on the near-term roadmap beginning with Microsoft Copilot.

Why is maker mode a specific risk factor for MCP tool access?

In maker mode, an agent uses the creator's credentials for all tool calls, regardless of who invokes the agent. A user with limited permissions who invokes a maker-mode agent gains effective access at the creator's privilege level. If that agent connects to an MCP server with broad tool access, the invoking user can reach tools and data they were never authorized to access directly.

Does blocking unauthorized MCP tools require routing agent traffic through a central proxy?

The correct enforcement architecture does not require an inline network proxy. Enforcement that hooks into the AI platform's native execution layer can apply deterministic rules at the tool-call level without becoming a dependency in the agent's communication path. An inline proxy approach introduces latency and creates a single point of failure that a native control plane approach avoids.

How does action chaining increase the risk of a single unauthorized tool call?

A single unauthorized tool call may produce limited exposure on its own. But agents can use the output of one tool call as input to the next, chaining actions across tools and platforms. Each step in the chain can expand the blast radius: reading credentials, authenticating against a downstream system, querying records, and moving data. The risk of one unauthorized tool call is therefore the risk of every action that call enables downstream.