Manual MCP Analysis Limits: Why Runtime Visibility Wins at Scale

Obsidian Editorial Team

Security Research

Obsidian Security

May 21, 2026

June 1, 2026

Key Takeaways

Ninety percent of AI agents running in enterprise environments today hold more permissions than their workflows actually require.
What a point-in-time permission review cannot tell you is what those agents did with those permissions between the last manual review and right now.
That gap is where the limits of manual MCP analysis show.
Point-in-time configuration review tells security teams what an MCP server is set up to do.
It says nothing about what tools fired at 2:14 AM, what data paths those tool calls traversed, or whether the agent invoking them had any business doing so.
And in 2026, with enterprises running hundreds of AI agents across multiple platforms simultaneously, the gap between configuration and runtime truth has become the most consequential visibility problem in enterprise security.

What Manual MCP Analysis Actually Covers (And What It Misses)

Security teams approaching MCP server governance for the first time typically start the same way: pull the configuration files, review the tool descriptions, check the declared permissions, and document what they find. This process feels thorough. It produces a spreadsheet. It satisfies an audit request. It does not produce security.

Manual MCP analysis covers the theoretical configuration layer. It shows which tools an MCP server declares, what permissions the server requests at registration, and what the tool descriptions say those tools are supposed to do. That is useful context. It is not runtime truth.

What manual analysis cannot see:

- Which tools actually fire during a live agent session, as opposed to which tools are registered
- What arguments get passed to each tool call, including any sensitive data embedded in those arguments
- Which downstream systems the agent reaches through chained tool calls
- Who invoked the agent and whether that invoker's identity aligns with the permissions the agent exercises
- Whether tool descriptions match tool behavior at runtime, a gap that only surfaces during execution

This last point deserves attention. Tool descriptions in MCP servers are text. They are written by developers, often quickly, and they describe intended behavior. Intended behavior and actual runtime behavior diverge. A tool described as reading calendar availability may, at runtime, also traverse organizational directory data depending on how the underlying API resolves the request. Manual review of the description tells you nothing about that traversal. Security researchers now treat that divergence as a systemic risk category, not an edge case.

To understand how AI agent security addresses these layered risks, the mechanism matters more than the label.

Why Manual MCP Analysis Breaks at Scale

One enterprise security team can manually review ten MCP servers with reasonable thoroughness. At thirty servers, the process becomes a part-time job. At two hundred servers, it becomes theater.

The limits of manual MCP analysis are not just a matter of time. They are combinatorial. Each MCP server can expose multiple tools. Each tool can be invoked by multiple agents. Each agent can chain multiple tool calls in a single session. The number of possible runtime execution paths grows faster than any team can track manually.

Consider what enterprises actually face in 2026:

| Environment Size | MCP Servers | Tools per Server | Possible Tool-Call Paths |
| --- | --- | --- | --- |
| Small (20 agents) | 15-30 | 5-15 each | Thousands |
| Mid-size (100 agents) | 80-150 | 10-20 each | Hundreds of thousands |
| Enterprise (500+ agents) | 300+ | 15-30 each | Millions |

No spreadsheet governs millions of possible execution paths. No quarterly review cycle catches a privilege escalation that happened on Tuesday.
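
The combinatorial growth described above can be made concrete with a small calculation. This is a minimal sketch under a deliberately simple assumption: any tool may follow any other tool, and tools may repeat within a chain. That model is an illustrative upper bound, not the method behind the figures in the table, and the environment size used here is hypothetical.

```python
def count_tool_call_paths(num_tools: int, max_chain_length: int) -> int:
    """Count ordered tool-call sequences of length 1..max_chain_length.

    Assumes any tool may follow any other and tools may repeat. This is an
    upper bound chosen for illustration, not a measured figure.
    """
    return sum(num_tools ** length for length in range(1, max_chain_length + 1))


# Hypothetical environment: 20 MCP servers exposing 10 tools each.
total_tools = 20 * 10

for chain_length in (1, 2, 3):
    print(chain_length, count_tool_call_paths(total_tools, chain_length))
```

With 200 tools, allowing chains of up to three calls already yields 8,040,200 possible sequences. Adding one more server raises every term in the sum, which is why review effort cannot scale linearly with server count.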

The scale problem compounds further because modern AI platforms auto-generate MCP tools from existing APIs and documentation. A developer connects a new data source, and the platform creates tool definitions automatically. Security teams receive no notification. The manual review queue grows without any human decision triggering it. By the time a review occurs, the tool has already been invoked thousands of times in production.

Through an assessment, one enterprise discovered more than 370 AI agents running in its environment. It had no prior inventory. Every one of those agents potentially connected to MCP servers that no security team had reviewed. That is not a gap in process. That is a structural impossibility for manual analysis.

The machine identity sprawl underlying this problem follows the same pattern. Non-human identities now outnumber human identities by a factor of 25 to 50 in modern enterprises, and that ratio grows with every new agent deployment.

The Ghost Chasing Problem: Configuration Signals Without Runtime Evidence

Security teams reviewing MCP configurations without runtime data are ghost chasing. They are chasing theoretical risks that may never materialize, while the actual risks, the ones that already happened, leave no trace in a configuration file.

Ghost chasing looks productive. It produces findings. It generates remediation tickets. It creates the appearance of a security program. What it does not produce is evidence of what actually occurred.

The question security teams need to answer is not "What could this MCP server do?" The question is: What did it do? Who triggered it? What data did it touch? Did the invoker have any right to that data?

Configuration review answers none of those questions. It answers a different question entirely: what is this MCP server set up to do? That question matters at deployment time. It stops mattering the moment the agent goes live and starts making decisions based on runtime context, user input, and action chaining that no configuration document anticipated.

The AI agent security problem is fundamentally a runtime problem because agents are probabilistic systems. They do not follow a fixed script. They respond to context. A probabilistic agent given a broad set of MCP tools will use different tools in different sequences depending on what the user asks, what the previous tool call returned, and what the model decides to do next. No static analysis predicts that sequence. Only runtime observation records it.

This is why security teams that rely on manual MCP analysis limit their own visibility to the pre-deployment moment, which is the moment of least risk. The blast radius of an agent grows after deployment, not before. Reviewing configuration at deployment and then moving on is the equivalent of checking a car's fuel level before a trip and assuming it never changes.

Runtime Truth: What Continuous Observation Actually Shows

Runtime visibility into MCP tool calls produces a fundamentally different class of security intelligence. Instead of a snapshot of declared capabilities, it produces a continuous record of actual behavior.

Tool invocation patterns. Which tools fire most frequently, which fire rarely, and which never fire at all. A tool that never fires in production but holds broad permissions is a dormant blast radius waiting for a trigger.
- What this control enforces: accurate least-privilege scoping based on observed usage, not declared capability
- Why configuration review fails: it cannot distinguish a tool that is theoretically accessible from one that is actively used against sensitive data

Argument-level data. What values get passed into each tool call, including sensitive data embedded in those arguments. An agent passing customer records as an argument to a search tool is not visible in any configuration file. It is only visible at runtime.
- What this control enforces: detection of sensitive data traversal paths that only materialize during execution
- Why configuration review fails: tool descriptions do not capture the data payloads that pass through them at runtime
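
One way to picture argument-level inspection is a scanner that walks the arguments of a captured tool call and reports which sensitive patterns appear. The sketch below assumes arguments arrive as nested dicts and lists, and it uses two simplistic regular expressions purely for illustration. Production detection would need far broader and more careful coverage.

```python
import re

# Illustrative patterns only; not a complete sensitive-data detector.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def flatten_values(value):
    """Yield every scalar inside nested dicts and lists as a string."""
    if isinstance(value, dict):
        for item in value.values():
            yield from flatten_values(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from flatten_values(item)
    else:
        yield str(value)


def scan_arguments(arguments):
    """Return the sorted names of sensitive patterns found in tool arguments."""
    found = set()
    for text in flatten_values(arguments):
        for name, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(text):
                found.add(name)
    return sorted(found)


# Hypothetical arguments captured from a single search-tool call.
call_arguments = {
    "query": "renewal status",
    "context": [{"customer": "jane@example.com", "tax_id": "123-45-6789"}],
}
print(scan_arguments(call_arguments))
```

The tool's description would say only "search". The customer record embedded in `context` is visible solely in the captured call.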

Cross-platform action chains. When an agent calls Tool A, receives a result, then calls Tool B in a different system using that result, the chain only exists as a runtime artifact. Configuration documents do not describe action chains because chains are emergent behaviors of the agent's decision-making, not declared workflows.
- What this control enforces: end-to-end sequence visibility that spans platforms and reveals compound data movement
- Why per-platform log review fails: each platform sees its leg of the chain; no platform sees the full sequence
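
Reconstructing a chain amounts to grouping events from different platforms by session and ordering them in time. The sketch below assumes every platform's events carry a shared `session_id` and a comparable timestamp `ts`. That correlation key is an assumption of this example and is often the hard part in practice.

```python
from collections import defaultdict


def reconstruct_chains(events):
    """Group events by session and order each group by timestamp."""
    sessions = defaultdict(list)
    for event in events:
        sessions[event["session_id"]].append(event)
    return {
        session_id: [
            (e["platform"], e["tool"])
            for e in sorted(group, key=lambda e: e["ts"])
        ]
        for session_id, group in sessions.items()
    }


def cross_platform_sessions(chains):
    """Return session IDs whose chain touches more than one platform."""
    return sorted(
        sid
        for sid, steps in chains.items()
        if len({platform for platform, _ in steps}) > 1
    )


# Hypothetical events collected from two platforms, arriving out of order.
events = [
    {"session_id": "s1", "ts": 2, "platform": "storage", "tool": "file.upload"},
    {"session_id": "s1", "ts": 1, "platform": "crm", "tool": "records.search"},
    {"session_id": "s2", "ts": 1, "platform": "crm", "tool": "records.search"},
]

chains = reconstruct_chains(events)
print(chains["s1"])
print(cross_platform_sessions(chains))
```

Session `s1` reads from the CRM and then uploads to storage. Neither platform's log alone shows that the two steps belong together.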

Identity correlation. Who invoked the agent, what identity the agent executed under, and whether those two identities align with authorized access patterns. This is the core of machine insider risk detection. An agent executing with an admin-level service account on behalf of a user who has no admin rights represents a privilege escalation. That escalation is invisible to configuration review and visible only at runtime.
- What this control enforces: the correlation between invoker identity and agent effective authority that reveals confused deputy and maker mode attacks
- Why static review fails: the invoker's identity is a runtime variable that no configuration file records
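
The identity correlation above can be sketched as a comparison between the privilege of the person invoking an agent and the privilege of the credential the agent runs under. The three-level privilege ranking, the field names, and the rule that unknown invokers are treated as lowest privilege are all assumptions made for this example.

```python
# Hypothetical privilege ordering; real environments have richer models.
PRIVILEGE_RANK = {"read": 1, "write": 2, "admin": 3}


def find_escalations(invocations, user_privilege):
    """Flag invocations where the agent's credential outranks the invoker."""
    flagged = []
    for inv in invocations:
        # Assumption: an invoker missing from the directory gets lowest privilege.
        invoker_level = user_privilege.get(inv["invoker"], "read")
        agent_level = inv["agent_privilege"]
        if PRIVILEGE_RANK[agent_level] > PRIVILEGE_RANK[invoker_level]:
            flagged.append((inv["invoker"], inv["agent"]))
    return flagged


# Hypothetical directory data and runtime invocations.
user_privilege = {"alice": "admin", "bob": "read"}
invocations = [
    {"invoker": "alice", "agent": "crm-agent", "agent_privilege": "admin"},
    {"invoker": "bob", "agent": "crm-agent", "agent_privilege": "admin"},
]

print(find_escalations(invocations, user_privilege))
```

Only the second invocation is flagged: the same agent and the same configuration, but a different invoker. That is why the signal exists only at runtime.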

Orphaned agent activity. Agents whose creator accounts are disabled continue running with inherited credentials. Manual review of a configuration file shows the agent exists. Runtime observation shows it is still active, still making tool calls, and still operating with credentials tied to an account that no longer has a human owner.
- What this control enforces: active detection of credential usage tied to disabled accounts
- Why configuration-only review fails: an agent can appear correctly configured while its owner account has been deprovisioned, leaving no flag in the policy file

Runtime observation also surfaces the toxic combinations that make individual risk factors critical. A shadow agent with org-wide access and an active connection to a sensitive data source is a critical-priority risk. Each factor alone is medium severity. The combination, visible only when all three signals appear simultaneously in runtime data, is the signal that actually warrants immediate response.
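
The toxic-combination logic can be expressed as a small rule: no single factor raises priority to critical, but all three together do. The sketch below follows the severities stated above for one factor and for three. Treating two co-occurring factors as medium is an assumption of this example, as are the field names.

```python
def risk_priority(agent):
    """Return 'critical' only when all three risk factors co-occur."""
    factors = [
        agent["is_shadow"],
        agent["org_wide_access"],
        agent["touches_sensitive_data"],
    ]
    hits = sum(factors)  # True counts as 1, False as 0
    if hits == 3:
        return "critical"
    if hits >= 1:
        # Assumption: anything short of all three stays at medium.
        return "medium"
    return "low"


# Hypothetical agents described by signals observed at runtime.
known_agent = {
    "is_shadow": False,
    "org_wide_access": True,
    "touches_sensitive_data": False,
}
shadow_agent = {
    "is_shadow": True,
    "org_wide_access": True,
    "touches_sensitive_data": True,
}

print(risk_priority(known_agent), risk_priority(shadow_agent))
```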

From Visibility to Effective Authority: The Difference That Matters

Most tools that address AI agent risk show theoretical configuration: what an agent is set up to do. Effective authority is different. Effective authority is what the agent can actually execute inside each connected application after all entitlements resolve.

The difference between theoretical configuration and effective authority is the difference between a policy document and a live access audit. A Salesforce agent configured to read opportunity records may, in practice, hold effective authority over the entire CRM database because the service account it runs under holds administrator-level permissions. The configuration says read opportunity records. The effective authority says full CRM access. Manual review of the configuration produces the wrong answer.
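
The CRM example can be reduced to a set comparison. In this minimal sketch, the agent's effective authority is taken to be whatever its underlying service account is entitled to do, and the excess is everything beyond the declared scope. The scope strings are hypothetical and do not correspond to any real platform's permission names.

```python
def effective_authority(declared_scopes, service_account_entitlements):
    """Return (effective, excess) permission sets for an agent.

    Assumes the agent can do whatever its service account can do, so the
    entitlements themselves are the effective authority.
    """
    effective = set(service_account_entitlements)
    excess = effective - set(declared_scopes)
    return effective, excess


# Hypothetical declared configuration versus resolved entitlements.
declared = {"opportunity:read"}
entitlements = {"opportunity:read", "account:write", "user:admin"}

effective, excess = effective_authority(declared, entitlements)
print(sorted(excess))
```

A review that looks only at `declared` reports a read-only agent. The non-empty `excess` set is the gap between the policy document and the live access audit.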

Closing this gap requires correlating agent configuration with actual SaaS entitlements, identity context, MCP infrastructure, and real-time behavior into a single picture. That picture cannot be assembled from configuration files alone. It requires a living map that updates as agents act, as credentials change, and as new MCP connections form.

This is what a single pane of glass for non-human access actually means in practice: not a dashboard that aggregates configuration data from multiple platforms, but a correlated view of what every agent can actually do, on whose behalf, with what downstream reach, updated continuously as runtime behavior unfolds.

The AI agent governance programs that get this right start from effective authority, not theoretical configuration.

What Runtime Visibility Solves That Manual Analysis Cannot

The argument for runtime visibility over manual MCP analysis is not that manual review is worthless. Pre-deployment review has a role. The argument is that manual review stops being sufficient the moment an agent goes live, and enterprises need a continuous observation layer that operates at machine speed across every agent, every MCP server, and every tool call simultaneously.

Runtime visibility solves five specific problems that manual MCP analysis cannot address:

1. Shadow MCP server discovery. MCP servers that appear without security team awareness are invisible to any review process that requires knowing what to review. Runtime observation catches connections as they form, not after a quarterly audit surfaces them.

2. Action chain reconstruction. When an incident occurs, security teams need to know exactly what the agent did, in what sequence, with what data. Runtime logs provide that reconstruction. Configuration files provide a starting hypothesis that may bear no resemblance to what actually happened.

3. Maker mode privilege escalation detection. When an agent runs under its creator's credentials and a lower-privilege user invokes it, the privilege escalation happens at runtime. The configuration may show the maker mode connection. Only runtime correlation shows that the invoker had no right to the data the agent retrieved.

4. Orphaned agent identification. An agent whose owner account is disabled continues operating. Runtime activity from a credential tied to a disabled account is a detectable signal. A configuration review that checks owner status without runtime activity data cannot tell whether the orphaned agent is dormant or actively moving data.

5. Least privilege enforcement targeting. Deterministic guardrails for probabilistic agents require knowing what agents actually do before restricting what they can do. Runtime observation produces the behavioral evidence that makes least privilege enforcement accurate rather than arbitrary.
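
The first item in the list above, shadow MCP server discovery, can be sketched as a difference between observed connections and the reviewed inventory. The connection records, hostnames, and integer timestamps below are hypothetical. The sketch assumes some observation layer already emits one record per connection.

```python
def find_shadow_servers(observed_connections, reviewed_inventory):
    """Map each unreviewed server to the earliest time it was observed."""
    first_seen = {}
    for conn in observed_connections:
        server = conn["server"]
        if server not in reviewed_inventory:
            earliest = first_seen.get(server, conn["ts"])
            first_seen[server] = min(conn["ts"], earliest)
    return first_seen


# Hypothetical inventory and observed connections.
reviewed_inventory = {"mcp.crm.internal", "mcp.calendar.internal"}
observed_connections = [
    {"server": "mcp.crm.internal", "ts": 100},
    {"server": "mcp.files.unknown", "ts": 140},
    {"server": "mcp.files.unknown", "ts": 120},
]

print(find_shadow_servers(observed_connections, reviewed_inventory))
```

The unreviewed server surfaces with the time it first appeared, without waiting for a review cycle that depends on already knowing the server exists.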

Security teams that want to move from ghost chasing to genuine AI agent monitoring need runtime observation as the foundation. Everything else (inventory, risk scoring, access control, audit trails) builds on top of knowing what agents actually did.

Boards and CISOs increasingly demand this evidence. Reporting on AI exposure with configuration snapshots does not answer the question they are asking. Runtime truth does.

Frequently Asked Questions

What is the core limitation of manual MCP analysis?

Manual MCP analysis reviews configuration state at a single point in time. It shows what an MCP server is set up to do, not what it actually does during live agent sessions. Runtime tool-call behavior, argument-level data, and cross-platform action chains are invisible to any static review process.

Why do manual MCP analysis limits become critical at enterprise scale?

Each MCP server exposes multiple tools. Each tool can be invoked by multiple agents in different sequences depending on runtime context. The number of possible execution paths grows combinatorially. Manual review of hundreds of MCP servers produces false confidence because humans cannot predict actual runtime tool-use distributions from configuration files alone.

What is ghost chasing in the context of MCP security?

Ghost chasing describes the practice of reviewing theoretical configuration risks without any runtime evidence of what actually occurred. Security teams chase signals that indicate what could happen while missing the evidence of what already happened. Runtime observation replaces ghost chasing with a factual record of agent behavior.

What is effective authority and why does it differ from theoretical configuration?

Effective authority is what an agent can actually execute inside a connected SaaS application after all entitlements resolve. Theoretical configuration is what the agent's setup document says it should be able to do. An agent configured for limited read access may hold effective authority over an entire database if its underlying service account carries admin-level permissions. Only runtime correlation of agent behavior with actual SaaS entitlements reveals the difference.

What specific risks does runtime visibility detect that manual review misses?

Runtime visibility detects shadow MCP server connections, maker mode privilege escalation, orphaned agent activity, argument-level sensitive data exposure, and cross-platform action chains. None of these appear in configuration files because all of them are emergent behaviors that only exist during agent execution.

How does runtime visibility support deterministic guardrails for AI agents?

Deterministic guardrails enforce fixed, predictable rules on probabilistic agents. Building accurate guardrails requires knowing what agents actually do in production, not what their configuration suggests they might do. Runtime observation produces the behavioral evidence that makes guardrail targeting precise. Without it, access restrictions are arbitrary and likely to block legitimate workflows while missing actual risks.