Model Context Protocol Security Risks: The MCP Black Box Problem

MCP tool calls happen at runtime and leave no configuration trail. Learn why the MCP black box problem makes log review and config inspection insufficient for AI agent security.

Obsidian Editorial Team

Security Research

Obsidian Security

May 14, 2026

June 1, 2026

Key Takeaways

Every AI agent your organization deploys connects to MCP servers at runtime, calls tools that were invisible at setup, and moves data across systems in ways no configuration file records.
Your security team cannot govern what it cannot see, and right now it cannot see the layer where those agents actually operate.
That is the model context protocol security risk your current tools are not designed to address.
The tools inside an MCP server resolve at runtime.
The calls an agent makes leave no reconstructable configuration trail.
Every posture-based control your team already owns stops at the boundary where the actual risk lives.

What MCP Actually Is and Why It Changes the Security Equation

Security teams asking what their AI agents can access are asking the right question. The answer increasingly runs through MCP servers, and most teams do not yet have a working model for what that means operationally.

MCP (Model Context Protocol) is an open standard that connects AI agents to external tools, data sources, and services. MCP servers act as the connective tissue between an agent's reasoning layer and the real-world actions it can take. An agent without MCP connections is a chatbot. An agent with MCP connections can read your CRM, write to your code repository, query your data warehouse, and send messages on behalf of your users.

That capability shift is significant. It is also why the security risks of AI agents cannot be addressed with the same approaches used for traditional SaaS security. MCP connections are not static integrations you can audit in a settings panel. They are live, runtime relationships that resolve dynamically every time an agent runs.

The tools an MCP server exposes are declared at runtime. A security team reviewing agent configuration on Monday morning cannot tell you what tools that agent will have access to by Monday afternoon if the MCP server's tool list changes. That is not a theoretical edge case. It is how the protocol works by design.

This is the foundation of the model context protocol security risk: the thing you need to govern does not exist in a form you can review before the agent acts on it.
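One way to make tool-list drift concrete is to compare two captures of an MCP server's `tools/list` result. The sketch below is illustrative only. It assumes you can record that result at two points in time; the `tools` array with `name`, `description`, and `inputSchema` fields follows the MCP specification, while the tool names, the two snapshot variables, and the helper functions are hypothetical.

```python
from typing import Any


def tool_index(tools_list_result: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Index the `tools` array of a tools/list result by tool name."""
    return {tool["name"]: tool for tool in tools_list_result.get("tools", [])}


def diff_tool_lists(before: dict[str, Any], after: dict[str, Any]) -> dict[str, list[str]]:
    """Report tools that were added, removed, or redefined between two captures."""
    old, new = tool_index(before), tool_index(after)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }


# Hypothetical captures of the same server on the same day.
monday_morning = {
    "tools": [
        {"name": "search_records", "description": "Read-only CRM search",
         "inputSchema": {"type": "object"}},
    ]
}
monday_afternoon = {
    "tools": [
        {"name": "search_records", "description": "Read-only CRM search",
         "inputSchema": {"type": "object"}},
        {"name": "export_records", "description": "Bulk export to CSV",
         "inputSchema": {"type": "object"}},
    ]
}

drift = diff_tool_lists(monday_morning, monday_afternoon)
print(drift)
```

In this example the agent's configuration is identical in both captures, yet `drift["added"]` contains a bulk export tool that no Monday-morning review could have listed.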

The MCP Black Box Problem: Why Configuration Stops Short

Security teams trying to govern MCP connections face a structural problem: the tools inside an MCP server are only visible at runtime. That is the MCP black box problem. No configuration snapshot can show you what those tools actually do when called.

Here is the precise sequence that creates the black box:

1. An agent is configured to connect to an MCP server.
2. The MCP server exposes a list of available tools to the agent at connection time.
3. The agent selects and calls tools based on its current task and the instructions it receives.
4. Each tool call may trigger downstream actions inside SaaS applications, databases, or external APIs.

None of steps 2 through 4 appear in the agent's configuration file.

What your configuration review shows you: the agent connects to this MCP server. What it does not show you: which tools that server exposed, which tools the agent called, what data those calls returned, and what downstream actions followed.

This is the gap between theoretical configuration and effective authority. Theoretical configuration is what the agent is set up to do on paper. Effective authority is what the agent can actually execute inside each connected system after all entitlements resolve. Every posture-based tool stops at theoretical configuration. The model context protocol security risk lives in the space between those two things.
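The gap can be sketched as a comparison between what a configuration review records (a set of server names) and what runtime observation records (individual tool calls). Everything in this example is an assumption for illustration: the `ObservedCall` record, the `authority_gap` helper, and the server and tool names. Observed calls are also only a lower bound on effective authority, since an agent may hold permissions it has not yet exercised.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedCall:
    """One tool call seen at runtime (hypothetical record shape)."""
    agent_id: str
    server: str
    tool: str


def authority_gap(configured_servers: set[str], calls: list[ObservedCall]) -> dict[str, list[str]]:
    """Compare configured servers against the calls actually observed."""
    return {
        # Tools on reviewed servers: the config names the server, never the tool.
        "tools_invisible_to_config": sorted(
            {f"{c.server}/{c.tool}" for c in calls if c.server in configured_servers}
        ),
        # Servers that were called but never appeared in the reviewed config.
        "servers_missing_from_config": sorted(
            {c.server for c in calls if c.server not in configured_servers}
        ),
    }


configured = {"crm-mcp"}  # all that a config review shows
calls = [
    ObservedCall("agent-7", "crm-mcp", "search_records"),
    ObservedCall("agent-7", "crm-mcp", "export_records"),
    ObservedCall("agent-7", "files-mcp", "read_file"),
]

gap = authority_gap(configured, calls)
print(gap)
```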

One enterprise discovered over 2,500 agents already running in their environment before any inventory existed. The agents were not the surprise. The surprise was that no one could answer what any of those agents had actually done, which MCP servers they had connected to, or what tools they had called. That is ghost chasing: reviewing configuration signals with no runtime evidence of what actually happened.

AI agents are non-human identities. They hold tokens, inherit credentials, and make decisions autonomously. Unlike service accounts, their blast radius expands dynamically every time they connect to a new MCP server.

Why Retroactive Log Review Cannot Reconstruct What Happened

Security teams that have tried to solve the MCP black box problem with native platform logs already know the result: retroactive log review cannot reconstruct what tool calls an agent made.

The reasons are structural.

Logs are siloed per platform. An agent running in one platform may call an MCP server that writes to a SaaS application governed by a different platform. The action appears in neither log in a way that connects the two events. Correlating them requires manual effort that does not scale past a handful of agents.

Tool call payloads are not logged by default. Even where logs exist, they typically record that a tool was called, not what data the tool returned or what the agent did with that data. The blast radius of a given tool call is invisible in the log record.

MCP server tool lists are not static. A log from last week tells you what tools were available last week. It tells you nothing about what tools are available today. Shadow MCP servers, those deployed without security oversight, may never appear in any log at all.

Action chaining compounds invisibility. When an agent calls Tool A, which triggers a call to Tool B, which writes to a third system, the chain of actions is rarely captured as a single traceable event. Each step may appear in a different log, in a different format, with no shared identifier connecting them.

This is why lateral movement inside SaaS environments is difficult to detect when agents are involved. The movement happens across tool calls, not across login events. Traditional detection logic does not see it.
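To illustrate what a shared identifier would buy, the sketch below groups tool call events from different platforms by a common trace ID and orders them into chains. It assumes an identifier and a sequence number are attached when each event is captured, which native logs typically do not provide. The `ToolCallEvent` fields and all platform, tool, and system names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCallEvent:
    """A normalized tool call event (hypothetical shape)."""
    trace_id: str       # assumed shared identifier for one agent task
    sequence: int       # assumed position of the call within the task
    platform: str
    tool: str
    target_system: str


def reconstruct_chains(events: list[ToolCallEvent]) -> dict[str, list[str]]:
    """Group events by trace ID and return each chain in call order."""
    grouped: dict[str, list[ToolCallEvent]] = defaultdict(list)
    for event in events:
        grouped[event.trace_id].append(event)
    return {
        trace_id: [
            f"{e.platform}:{e.tool}->{e.target_system}"
            for e in sorted(chain, key=lambda e: e.sequence)
        ]
        for trace_id, chain in grouped.items()
    }


# Events arrive out of order and from different platforms.
events = [
    ToolCallEvent("t-1", 2, "platform_b", "update_record", "crm"),
    ToolCallEvent("t-1", 1, "platform_a", "read_file", "file_store"),
    ToolCallEvent("t-2", 1, "platform_a", "search", "wiki"),
]

chains = reconstruct_chains(events)
print(chains["t-1"])
```

Without the `trace_id` field, the two `t-1` events would be unrelated rows in two separate logs, which is the situation described above.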

Obsidian detects model context protocol security risks by integrating directly with AI platforms via native APIs and capturing tool call events as they occur, with no connectors to individual SaaS applications required.

How the Visibility Gap Enables Real Attack Patterns

The MCP black box problem is not an abstract governance concern. The visibility gap it creates enables specific, high-severity attack patterns that security teams are already encountering.

Maker Mode Privilege Escalation

An agent built in maker mode uses the creator's credentials for every tool call, regardless of who invokes the agent. A user without Salesforce access can invoke an agent whose creator has Salesforce admin credentials. The agent calls its MCP-connected Salesforce tool using the creator's token. The user receives data they have no right to see. The agent did exactly what it was configured to do. IAM controls were bypassed without a single policy violation.

This is the machine insider scenario. The agent acts like an insider because it holds credentials and makes decisions autonomously. No insider risk program covers it, because those programs are built for humans. The agent's token grants access on possession alone, with no verification of who triggered the call.
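The maker mode pattern can be expressed as a simple check over runtime records: flag any call where the credential owner differs from the invoker and the invoker has no entitlement of their own to the target system. This is a sketch under stated assumptions, not a description of any product's detection logic. The `ToolCall` record, the `entitlements` mapping, and the user names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One observed tool call (hypothetical record shape)."""
    agent_id: str
    invoker: str            # who triggered the agent
    credential_owner: str   # whose token the tool call used
    target_system: str


def maker_mode_violations(
    calls: list[ToolCall], entitlements: dict[str, set[str]]
) -> list[ToolCall]:
    """Return calls that ran on someone else's credentials against a
    system the invoker is not entitled to access directly."""
    return [
        call
        for call in calls
        if call.invoker != call.credential_owner
        and call.target_system not in entitlements.get(call.invoker, set())
    ]


# Hypothetical entitlements: alice has Salesforce access, bob does not.
entitlements = {"alice": {"salesforce"}, "bob": set()}
calls = [
    ToolCall("agent-1", invoker="bob", credential_owner="alice", target_system="salesforce"),
    ToolCall("agent-1", invoker="alice", credential_owner="alice", target_system="salesforce"),
]

flagged = maker_mode_violations(calls, entitlements)
print(flagged)
```

Only the first call is flagged. Both calls look identical to the target system, because both present the same valid token.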

Action Chaining Across MCP Connections

A single agent task can trigger a sequence of tool calls across multiple MCP servers. Each call expands the blast radius. An agent asked to summarize a report may call a file-access tool, then a search tool, then a write tool, then an email tool. Each step is individually authorized. The combined sequence can move sensitive data in a way no single-step policy would catch.
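A sequence-level rule can catch what a per-step policy misses. The sketch below marks a chain as risky when a sensitive read is later followed by an outbound send within the same task. The tool names and the two category sets are hypothetical, and a real rule set would need far more nuance than this.

```python
# Hypothetical tool categories for illustration.
SENSITIVE_READS = {"file_access"}
EXTERNAL_SENDS = {"send_email"}


def chain_moves_sensitive_data(chain: list[str]) -> bool:
    """Return True if any sensitive read precedes an external send."""
    read_sensitive = False
    for tool in chain:
        if tool in SENSITIVE_READS:
            read_sensitive = True
        elif tool in EXTERNAL_SENDS and read_sensitive:
            return True
    return False


# The report-summary example: every step is individually authorized.
summary_task = ["file_access", "search", "write_file", "send_email"]
harmless_task = ["search", "send_email"]

print(chain_moves_sensitive_data(summary_task))
print(chain_moves_sensitive_data(harmless_task))
```

The rule only works when the calls are available as one ordered chain, which is why chain reconstruction has to come first.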

Orphaned Agent Persistence

An agent whose creator account has been disabled continues running with the creator's inherited credentials. The MCP connections that agent holds remain active. The tool calls it makes remain authorized. No alert fires because no authentication event occurs. The orphaned agent is a machine insider with no human owner and no active monitoring.
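Because no authentication event fires, finding orphaned agents means joining two data sets that rarely meet: identity status and recent tool call activity. A minimal sketch, assuming both are available as simple mappings. The function name, the inputs, and the one-day window are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone


def find_orphaned_agents(
    agent_owner: dict[str, str],
    disabled_accounts: set[str],
    last_tool_call: dict[str, datetime],
    now: datetime,
    window: timedelta = timedelta(days=1),
) -> list[str]:
    """Return agents that are still calling tools although their owner is disabled."""
    orphaned = []
    for agent_id, owner in agent_owner.items():
        last_seen = last_tool_call.get(agent_id)
        still_active = last_seen is not None and now - last_seen <= window
        if owner in disabled_accounts and still_active:
            orphaned.append(agent_id)
    return sorted(orphaned)


now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
agent_owner = {"agent-a": "carol", "agent-b": "dave", "agent-c": "carol"}
disabled_accounts = {"carol"}
last_tool_call = {
    "agent-a": now - timedelta(hours=2),   # owner disabled, still running
    "agent-b": now - timedelta(hours=1),   # owner active
    "agent-c": now - timedelta(days=30),   # owner disabled, long idle
}

orphans = find_orphaned_agents(agent_owner, disabled_accounts, last_tool_call, now)
print(orphans)
```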

Runtime Truth: The Mechanism That Closes the Black Box

The answer to the MCP black box problem is not better configuration review. Configuration review is exactly what fails. The answer is runtime truth: visibility into what agents actually do at the moment they do it, not what their configuration says they should do.

Runtime truth requires integrating directly with AI platforms via native APIs and webhooks, not reviewing configuration files after the fact. It means capturing tool call events as they happen, correlating them with the identity of the agent, the identity of the invoker, and the entitlements of the connected SaaS application. It means producing a living map of effective authority, not a static snapshot of theoretical configuration.

This distinction matters because probabilistic agents require deterministic guardrails. An AI agent does not follow a fixed script. It reasons over its available tools and selects actions based on context. That means the same agent can take different actions in response to similar inputs. Configuration-based controls cannot anticipate that variability. Only runtime controls, applied at the moment of action, can enforce fixed boundaries on dynamic behavior.
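The idea of a deterministic guardrail around a probabilistic agent can be sketched as a fixed allowlist checked at the moment of each call, whatever tool the model happens to select. This is a generic illustration of the principle and not a description of how any particular product enforces it. The `guarded_call` helper, the `ALLOWED` mapping, and the stand-in tools are all hypothetical.

```python
from typing import Any, Callable


class ToolCallDenied(Exception):
    """Raised when an agent requests a tool outside its fixed boundary."""


def guarded_call(
    allowed_tools: dict[str, frozenset[str]],
    agent_id: str,
    tool: str,
    handler: Callable[..., Any],
    **arguments: Any,
) -> Any:
    """Run the tool only if it is on the agent's allowlist."""
    if tool not in allowed_tools.get(agent_id, frozenset()):
        raise ToolCallDenied(f"{agent_id} may not call {tool}")
    return handler(**arguments)


# The boundary is fixed data, independent of the agent's reasoning.
ALLOWED = {"report-agent": frozenset({"read_file", "search"})}


def read_file(path: str) -> str:
    return f"contents of {path}"  # stand-in for a real tool


def send_email(to: str, body: str) -> str:
    return f"sent to {to}"  # stand-in for a real tool


print(guarded_call(ALLOWED, "report-agent", "read_file", read_file, path="q3.txt"))

try:
    guarded_call(ALLOWED, "report-agent", "send_email", send_email,
                 to="someone@example.com", body="summary")
except ToolCallDenied as denied:
    print(f"blocked: {denied}")
```

The agent may choose a different tool on every run, but the check gives the same answer for the same agent and tool every time.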

The SaaS supply chain risk that MCP connections introduce mirrors the third-party integration risks that have driven major breaches. The mechanism is the same: a trusted connection used to move data across trust boundaries without triggering authentication alerts. Runtime visibility is the control that closes that gap.

Runtime truth is not a monitoring philosophy. It is a specific technical requirement: integrate with the platform at the API layer, capture each tool call as it fires, correlate it against identity context and effective entitlements, and surface the gap between what configuration permits and what the agent actually did.

Building a Single Pane of Glass Across Your MCP Environment

Security teams cannot govern what they cannot see. That principle is not new. What is new is that the thing they cannot see now includes every MCP server in their environment, every tool those servers expose, every tool call those agents make, and every downstream action those calls trigger.

A single pane of glass for MCP security requires four capabilities working together:

| Capability | What It Provides | Why Configuration Review Cannot Substitute |
| --- | --- | --- |
| MCP server inventory | Complete list of sanctioned and shadow MCP servers | Shadow servers never appear in configuration |
| Tool call visibility | Real-time record of every tool an agent calls | Tool lists resolve at runtime, not at config time |
| Identity correlation | Links the invoker's identity to the agent's effective authority | Maker mode breaks the invoker-to-permission chain |
| Blast radius mapping | Shows the downstream reach of each tool call chain | Action chaining spans multiple systems and logs |

The operational goal is AI agent security monitoring that answers four questions for every agent in the environment:

What MCP servers is this agent connected to?
What tools did it call in the last 24 hours?
Whose credentials did it use to make those calls?
What data did those calls access?

Without answers to those four questions, security teams are ghost chasing. They have configuration signals that tell them what could happen. They do not have runtime evidence of what did happen.
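Given a store of runtime tool call records, the four questions reduce to a small query. The sketch below assumes such records exist and carry the fields shown. The `ToolCallRecord` shape, the `agent_report` helper, and all sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ToolCallRecord:
    """One captured tool call (hypothetical record shape)."""
    agent_id: str
    server: str
    tool: str
    credential_owner: str
    resource: str
    timestamp: datetime


def agent_report(records: list[ToolCallRecord], agent_id: str, now: datetime) -> dict[str, list[str]]:
    """Answer the four questions for one agent from its runtime records."""
    mine = [r for r in records if r.agent_id == agent_id]
    recent = [r for r in mine if now - r.timestamp <= timedelta(hours=24)]
    return {
        "mcp_servers": sorted({r.server for r in mine}),
        "tools_last_24h": sorted({r.tool for r in recent}),
        "credentials_used": sorted({r.credential_owner for r in recent}),
        "data_accessed": sorted({r.resource for r in recent}),
    }


now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
records = [
    ToolCallRecord("agent-7", "crm-mcp", "search_records", "alice",
                   "crm/accounts", now - timedelta(hours=3)),
    ToolCallRecord("agent-7", "files-mcp", "read_file", "alice",
                   "files/q3.pdf", now - timedelta(days=3)),
    ToolCallRecord("agent-9", "crm-mcp", "export_records", "bob",
                   "crm/contacts", now - timedelta(hours=1)),
]

report = agent_report(records, "agent-7", now)
print(report)
```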

Shadow SaaS and shadow MCP servers share the same governance failure mode: both exist outside the visibility of security teams, both carry real access to real data, and both are invisible to every tool that relies on configuration review rather than runtime observation.

Obsidian surfaces this inventory across Salesforce Agentforce, Microsoft Copilot Studio, Amazon Bedrock, Microsoft Azure AI Foundry, ChatGPT Enterprise, and n8n, without requiring a connector for every downstream SaaS application in the stack. The visibility comes from native platform integration, not traffic inspection.

Frequently Asked Questions

What is the MCP black box problem?

The MCP black box problem is the visibility gap created by the fact that tools inside an MCP server are only visible at runtime. Security teams reviewing agent configuration cannot see which tools an MCP server exposes, which tools an agent calls, or what data those calls return. Retroactive log review and configuration inspection are both insufficient for governing AI agent behavior through MCP connections.

Why can't traditional security tools see MCP tool calls?

Traditional security tools are designed around static configuration review and human authentication events. MCP tool calls happen at runtime, resolve dynamically, and span multiple systems without a shared audit trail. The tool list available to an agent can change between reviews, and action chaining across multiple tool calls creates a blast radius that no single log captures.

What is the difference between theoretical configuration and effective authority?

Theoretical configuration is what an agent is set up to do on paper. Effective authority is what the agent can actually execute inside each connected system after all entitlements resolve. An agent may appear to have limited configuration on paper while holding maker mode credentials that grant it admin-level access inside a connected SaaS application. Only runtime visibility reveals the difference.

What is maker mode and why does it matter for model context protocol security?

Maker mode is a configuration where an agent uses the creator's credentials for every tool call, regardless of who invokes the agent. A user without access to a connected system can invoke a maker mode agent and receive data at the creator's privilege level. This bypasses IAM controls entirely and is one of the highest-severity risk patterns in MCP-connected agent environments.

What is an orphaned agent?

An orphaned agent is an agent that continues running with inherited credentials after its creator or owner account has been disabled. The MCP connections that agent holds remain active. Because no authentication event occurs, no alert fires. Orphaned agents are machine insiders with no human owner and no active governance.

What does runtime truth mean for MCP security?

Runtime truth means visibility into what agents actually do at the moment they do it, captured via direct integration with AI platforms rather than configuration review. It answers four questions for every agent: what MCP servers is it connected to, what tools did it call, whose credentials did it use, and what data did those calls access. Configuration-based tools cannot produce runtime truth.

Does Obsidian sit inline as an MCP gateway or proxy?

No. Obsidian integrates with AI agent platforms via native APIs and webhooks. It observes agent and tool activity at the application layer without sitting inline as a gateway or proxy between agents and the MCP servers they connect to.