Static code analysis cannot catch hidden MCP tool threats. Learn how runtime observation reveals malicious MCP tools, action chaining, and privilege escalation before damage occurs.
Security teams cannot govern what they cannot see. Most teams today have no complete inventory of which MCP servers their agents connect to, which tools those servers expose, or what those tools are actually authorized to do inside downstream SaaS applications.
The Model Context Protocol is an open standard that lets AI agents connect to external tools, data sources, and services. That openness is its strength for developers. For security teams, it creates a visibility gap that no existing tool was designed to close.
When an agent connects to an MCP server, it receives a list of tool descriptions. Those descriptions tell the agent what each tool does. The agent then decides which tools to call based on the task at hand. In most enterprise environments, the security team sees none of this. They see the agent. They do not see the tools the agent invoked, the data those tools accessed, or whether any of that activity was policy-aligned.
This is the MCP black box problem: an entire layer of AI activity operating between the agent and the SaaS applications it touches, invisible to every security control already in place.
One enterprise security team discovered more than 370 AI agents in their environment during a single assessment. They had no idea those agents existed before the assessment. They had even less visibility into which MCP servers those agents were using. That is the starting condition for malicious MCP tools detection: most teams do not yet know what they need to detect.
Understanding the threat mechanism matters before discussing detection. Malicious tools inside MCP servers do not always announce themselves. They hide in three primary ways.
Tool description poisoning. An MCP server embeds hidden instructions inside a tool's description field. When an AI agent reads that description, it receives not just a functional explanation but a covert directive. The agent, operating probabilistically, may follow that directive without any visible signal to the user or the security team. The attack requires no code modification on the agent side. The payload lives in the server's tool metadata.
Delayed activation. A server can serve clean, benign tool descriptions during initial review or testing. After approval, the server updates its tool descriptions to include malicious instructions. Static code review captured the clean version. The production environment runs the poisoned one. This mirrors traditional SaaS supply chain attacks: the threat enters after trust is established.
Shadow MCP servers. Developers and business users connect agents to unvetted MCP servers outside any approval process. These shadow MCP servers have no security review, often lack authentication, and may expose tools with credentials hardcoded into their configurations. Most enterprise security teams have no awareness that shadow MCP servers exist in their environment at all.
The common thread across all three patterns: the threat is not visible in the tool's source code at the moment of review. It is visible only in what the tool does when an agent calls it.
Static code analysis reads source files. It identifies known vulnerability patterns, hardcoded secrets, and suspicious function calls. For traditional application security, that approach has genuine value.
For malicious MCP tools detection, static analysis has a structural limitation. The behavior of an MCP tool at runtime depends on inputs, external data sources, and server-side logic that no static scan can fully reconstruct. A tool that reads from an external database can return safe data in a test environment and sensitive data in production. A tool description that looks benign in a code file can carry hidden instructions served dynamically from a remote endpoint.
Static analysis frameworks for MCP servers represent meaningful progress. They still cannot capture what happens when a tool executes against live production data, with a real user's credentials, inside a real SaaS application.
Security teams that rely on static review are ghost chasing. They are reviewing theoretical configuration risks that tell them what could happen, with no evidence of what did happen. The question that matters is not "does this tool look dangerous in code?" The questions that matter are: What did this tool actually access? Whose credentials did it use? What data moved?
Answering those questions requires runtime observation, not code review.
Obsidian detects malicious MCP tool activity by integrating directly with AI agent platforms via native APIs, capturing tool call events as they occur without sitting inline as a proxy or gateway between agents and servers.
Runtime truth means observing what agents and tools actually do as they execute, not what configuration says they should do. For MCP tool security, that means capturing tool call activity at the moment it occurs and correlating it against identity context, permission scope, and data sensitivity.
Effective malicious MCP tools detection at runtime looks like this:
Detection SignalWhat It RevealsWhy Static Analysis Misses ItTool call frequency and timingUnusual call patterns suggesting automated data movementNot visible in codeData volume per tool callLarge data movements inconsistent with task scopeDepends on runtime inputsIdentity correlationWhether the invoking user has rights to the data the tool accessedRequires live entitlement resolutionCross-application data pathsTool accessing SaaS app data the agent was never explicitly authorized to reachRequires runtime SaaS contextCredential usageWhether the tool used maker mode credentials to escalate privilegeEmbedded in runtime execution, not config
The critical capability here is correlating the tool's action with the invoking identity's actual permissions inside the downstream SaaS application. A tool call that looks routine in isolation becomes a high-severity event when the invoking user has no legitimate access to the data the tool returned.
This is the effective authority problem. Most tools see theoretical configuration: what the MCP server is set up to do. Runtime truth shows what the tool actually did, on whose behalf, and whether that action was within the bounds of what the invoking identity was authorized to access.
For teams managing n8n workflows or Microsoft Copilot agents, this distinction is not academic. It is the difference between knowing an MCP connection exists and knowing whether that connection moved data it had no business touching.
A single compromised MCP tool does not stay contained. AI agents are designed to chain actions. One tool call leads to another. Data retrieved from one application becomes input for an action in a second application. This is action chaining, and it is how a small initial compromise becomes a large data exposure event.
Consider the sequence. An agent invokes a poisoned MCP tool. The tool returns instructions directing the agent to retrieve records from a connected CRM. The agent, operating as a probabilistic system following the instructions it received, calls the CRM tool. The CRM tool executes using maker mode credentials: fixed credentials belonging to the agent's creator, who has admin-level access. The invoking user has no CRM access at all. The agent does not check. It has no mechanism to check.
The data moves. The blast radius expands with every step in the chain.
This is the machine insider risk that traditional insider risk programs do not cover. The agent holds credentials. The agent takes actions. The agent moves data. No human made the decision to move it. No alert fired on the user's account because the user did nothing wrong by their own access policy. The bearer token the agent used was legitimate. The action was not.
Stopping data movement threats early requires visibility at the tool call layer, not just at the application layer. By the time a data loss prevention tool sees the data leaving, the action chain is already complete.
Effective authority over MCP tools requires three capabilities working together. Inventory comes first. You cannot govern what you cannot see.
MCP server inventory. Build a complete picture of every MCP server your agents connect to, sanctioned and unsanctioned. This includes servers connected through low-code platforms, developer tools, and business-user-built agents. Shadow MCP servers are the most dangerous precisely because they have no security review. A single pane of glass across all agent platforms is the prerequisite for every other security conversation.
Tool call visibility at runtime. Capture what tools are invoked, when, by which agent, on behalf of which identity, and what data those tools accessed. Retroactive log analysis cannot reconstruct MCP tool call sequences with the identity and entitlement context required to assess whether an action was authorized. Runtime observation captures that context as it occurs.
Toxic combination detection. Individual risk factors are medium severity in isolation. An MCP server without authentication is a problem. An MCP server without authentication connected to an agent running in maker mode with admin credentials, accessible to all users in the organization, is a critical-severity toxic combination. Malicious MCP tools detection at scale requires prioritizing these compounding risk patterns, not treating every signal with equal weight.
For teams managing Salesforce Agentforce deployments, this means correlating the agent's tool calls against the Salesforce identity it uses, not just the Salesforce identity of the user who invoked it. Those two identities are frequently different. The gap between them is where privilege escalation lives.
Deterministic guardrails close that gap for probabilistic agents. Probabilistic agents decide what to do based on context and instructions. Deterministic guardrails enforce fixed rules regardless of what the agent decides. The agent cannot call a tool that exceeds the invoking user's permission scope. The rule does not negotiate with the agent's reasoning. That is the control architecture that runtime truth makes possible.
Obsidian detects malicious MCP tools across Salesforce Agentforce, Microsoft Copilot Studio, Amazon Bedrock, Microsoft Azure AI Foundry, ChatGPT Enterprise, and n8n without requiring a connector for each downstream SaaS application. The detection comes from native platform integration, not inline traffic inspection.