Runtime Truth

Threat Explainer

Malicious MCP Tools Detection: Finding Hidden Threats at Runtime

Static code analysis cannot catch hidden MCP tool threats. Learn how runtime observation reveals malicious MCP tools, action chaining, and privilege escalation before damage occurs.

Obsidian Editorial Team

Security Research

Obsidian Security

May 14, 2026

June 1, 2026

Key Takeaways

That pattern repeats across scans of public MCP server populations: a meaningful share contain toxic data flows, and many of those servers score well on conventional security metrics.
Malicious MCP tools detection cannot rely on what a server looks like at setup.
It requires observing what the server actually does at runtime, against real data, under real user credentials.

Why MCP Servers Remain a Black Box for Security Teams

Security teams cannot govern what they cannot see. Most teams today have no complete inventory of which MCP servers their agents connect to, which tools those servers expose, or what those tools are actually authorized to do inside downstream SaaS applications.

The Model Context Protocol is an open standard that lets AI agents connect to external tools, data sources, and services. That openness is its strength for developers. For security teams, it creates a visibility gap that no existing tool was designed to close.

When an agent connects to an MCP server, it receives a list of tool descriptions. Those descriptions tell the agent what each tool does. The agent then decides which tools to call based on the task at hand. In most enterprise environments, the security team sees none of this. They see the agent. They do not see the tools the agent invoked, the data those tools accessed, or whether any of that activity was policy-aligned.

This is the MCP black box problem: an entire layer of AI activity operating between the agent and the SaaS applications it touches, invisible to every security control already in place.

One enterprise security team discovered more than 370 AI agents in their environment during a single assessment. They had no idea those agents existed before the assessment. They had even less visibility into which MCP servers those agents were using. That is the starting condition for malicious MCP tools detection: most teams do not yet know what they need to detect.

How Malicious Tools Hide Inside MCP Servers

Understanding the threat mechanism matters before discussing detection. Malicious tools inside MCP servers do not always announce themselves. They hide in three primary ways.

Tool description poisoning. An MCP server embeds hidden instructions inside a tool's description field. When an AI agent reads that description, it receives not just a functional explanation but a covert directive. The agent, operating probabilistically, may follow that directive without any visible signal to the user or the security team. The attack requires no code modification on the agent side. The payload lives in the server's tool metadata.

Delayed activation. A server can serve clean, benign tool descriptions during initial review or testing. After approval, the server updates its tool descriptions to include malicious instructions. Static code review captured the clean version. The production environment runs the poisoned one. This mirrors traditional SaaS supply chain attacks: the threat enters after trust is established.

Shadow MCP servers. Developers and business users connect agents to unvetted MCP servers outside any approval process. These shadow MCP servers have no security review, often lack authentication, and may expose tools with credentials hardcoded into their configurations. Most enterprise security teams have no awareness that shadow MCP servers exist in their environment at all.

The common thread across all three patterns: the threat is not visible in the tool's source code at the moment of review. It is visible only in what the tool does when an agent calls it.

Why Static Code Analysis Cannot Solve This Problem

Static code analysis reads source files. It identifies known vulnerability patterns, hardcoded secrets, and suspicious function calls. For traditional application security, that approach has genuine value.

For malicious MCP tools detection, static analysis has a structural limitation. The behavior of an MCP tool at runtime depends on inputs, external data sources, and server-side logic that no static scan can fully reconstruct. A tool that reads from an external database can return safe data in a test environment and sensitive data in production. A tool description that looks benign in a code file can carry hidden instructions served dynamically from a remote endpoint.

Static analysis frameworks for MCP servers represent meaningful progress. They still cannot capture what happens when a tool executes against live production data, with a real user's credentials, inside a real SaaS application.

Security teams that rely on static review are ghost chasing. They are reviewing theoretical configuration risks that tell them what could happen, with no evidence of what did happen. The question that matters is not "does this tool look dangerous in code?" The questions that matter are: What did this tool actually access? Whose credentials did it use? What data moved?

Answering those questions requires runtime observation, not code review.

Obsidian detects malicious MCP tool activity by integrating directly with AI agent platforms via native APIs, capturing tool call events as they occur without sitting inline as a proxy or gateway between agents and servers.

Runtime Truth: What Malicious MCP Tools Detection Actually Requires

Runtime truth means observing what agents and tools actually do as they execute, not what configuration says they should do. For MCP tool security, that means capturing tool call activity at the moment it occurs and correlating it against identity context, permission scope, and data sensitivity.

Effective malicious MCP tools detection at runtime looks like this:

Detection SignalWhat It RevealsWhy Static Analysis Misses ItTool call frequency and timingUnusual call patterns suggesting automated data movementNot visible in codeData volume per tool callLarge data movements inconsistent with task scopeDepends on runtime inputsIdentity correlationWhether the invoking user has rights to the data the tool accessedRequires live entitlement resolutionCross-application data pathsTool accessing SaaS app data the agent was never explicitly authorized to reachRequires runtime SaaS contextCredential usageWhether the tool used maker mode credentials to escalate privilegeEmbedded in runtime execution, not config

The critical capability here is correlating the tool's action with the invoking identity's actual permissions inside the downstream SaaS application. A tool call that looks routine in isolation becomes a high-severity event when the invoking user has no legitimate access to the data the tool returned.

This is the effective authority problem. Most tools see theoretical configuration: what the MCP server is set up to do. Runtime truth shows what the tool actually did, on whose behalf, and whether that action was within the bounds of what the invoking identity was authorized to access.

For teams managing n8n workflows or Microsoft Copilot agents, this distinction is not academic. It is the difference between knowing an MCP connection exists and knowing whether that connection moved data it had no business touching.

The Blast Radius of a Compromised MCP Tool

A single compromised MCP tool does not stay contained. AI agents are designed to chain actions. One tool call leads to another. Data retrieved from one application becomes input for an action in a second application. This is action chaining, and it is how a small initial compromise becomes a large data exposure event.

Consider the sequence. An agent invokes a poisoned MCP tool. The tool returns instructions directing the agent to retrieve records from a connected CRM. The agent, operating as a probabilistic system following the instructions it received, calls the CRM tool. The CRM tool executes using maker mode credentials: fixed credentials belonging to the agent's creator, who has admin-level access. The invoking user has no CRM access at all. The agent does not check. It has no mechanism to check.

The data moves. The blast radius expands with every step in the chain.

This is the machine insider risk that traditional insider risk programs do not cover. The agent holds credentials. The agent takes actions. The agent moves data. No human made the decision to move it. No alert fired on the user's account because the user did nothing wrong by their own access policy. The bearer token the agent used was legitimate. The action was not.

Stopping data movement threats early requires visibility at the tool call layer, not just at the application layer. By the time a data loss prevention tool sees the data leaving, the action chain is already complete.

Building Effective Authority Over Your MCP Environment

Effective authority over MCP tools requires three capabilities working together. Inventory comes first. You cannot govern what you cannot see.

MCP server inventory. Build a complete picture of every MCP server your agents connect to, sanctioned and unsanctioned. This includes servers connected through low-code platforms, developer tools, and business-user-built agents. Shadow MCP servers are the most dangerous precisely because they have no security review. A single pane of glass across all agent platforms is the prerequisite for every other security conversation.

Tool call visibility at runtime. Capture what tools are invoked, when, by which agent, on behalf of which identity, and what data those tools accessed. Retroactive log analysis cannot reconstruct MCP tool call sequences with the identity and entitlement context required to assess whether an action was authorized. Runtime observation captures that context as it occurs.

Toxic combination detection. Individual risk factors are medium severity in isolation. An MCP server without authentication is a problem. An MCP server without authentication connected to an agent running in maker mode with admin credentials, accessible to all users in the organization, is a critical-severity toxic combination. Malicious MCP tools detection at scale requires prioritizing these compounding risk patterns, not treating every signal with equal weight.

For teams managing Salesforce Agentforce deployments, this means correlating the agent's tool calls against the Salesforce identity it uses, not just the Salesforce identity of the user who invoked it. Those two identities are frequently different. The gap between them is where privilege escalation lives.

Deterministic guardrails close that gap for probabilistic agents. Probabilistic agents decide what to do based on context and instructions. Deterministic guardrails enforce fixed rules regardless of what the agent decides. The agent cannot call a tool that exceeds the invoking user's permission scope. The rule does not negotiate with the agent's reasoning. That is the control architecture that runtime truth makes possible.

Obsidian detects malicious MCP tools across Salesforce Agentforce, Microsoft Copilot Studio, Amazon Bedrock, Microsoft Azure AI Foundry, ChatGPT Enterprise, and n8n without requiring a connector for each downstream SaaS application. The detection comes from native platform integration, not inline traffic inspection.

Frequently Asked Questions

What is tool poisoning in the context of MCP servers?

Tool poisoning occurs when an MCP server embeds hidden instructions inside a tool's description field. When an AI agent reads that description, it receives covert directives alongside the functional explanation. The agent may execute those directives without any visible signal to the user or security team. The attack exploits the trust the agent places in tool descriptions provided by the server.

Why can't security teams just review MCP server code before allowing connections?

Code review captures a snapshot of the server at one point in time. MCP servers can update their tool descriptions dynamically after initial approval. A server that passes code review can serve poisoned tool descriptions in production. Code review also cannot reveal what data a tool will access when called against live production systems with real user credentials.

What is the difference between a shadow MCP server and a sanctioned MCP server?

A sanctioned MCP server has been reviewed and approved by the security team before agents connect to it. A shadow MCP server is one that developers or business users connected to agents outside any approval process. Shadow MCP servers often lack authentication, may contain hardcoded credentials, and have no security review. They represent the highest-risk category of MCP infrastructure because their tool behavior is entirely unknown to the security team.

How does maker mode amplify the risk of a compromised MCP tool?

In maker mode, an agent uses the fixed credentials of its creator rather than the credentials of the user invoking it. If the creator has admin-level access, every user who invokes the agent effectively operates at admin privilege level, regardless of their own access rights. A compromised MCP tool that triggers action chaining in this context can access data at the creator's full permission scope, not the invoking user's scope.

What does effective authority mean for MCP tool security?

Effective authority is what an agent or tool can actually access inside a SaaS application after all entitlements resolve. It is distinct from theoretical configuration, which is what the agent or tool is set up to do on paper. A tool may appear to have limited scope in its configuration but carry effective authority far beyond that scope through inherited credentials, maker mode connections, or misconfigured OAuth grants.

Is runtime monitoring of MCP tools the same as a network proxy or traffic inspection tool?

No. Runtime monitoring at the tool call layer works by integrating directly with AI agent platforms via native APIs and webhooks. It observes agent and tool activity at the application layer, correlating identity context and entitlement data from connected SaaS applications. It does not sit inline as a network proxy or inspect raw traffic. A network-layer tool cannot see the identity and permission context required to determine whether a tool call was authorized.