
Monitor AI Agent Tool Calls in Real Time: MCP Tool Call Monitoring

MCP tool call monitoring captures every agent action at runtime: which tool was called, what data returned, and on whose behalf. Stop ghost chasing configuration signals.

Obsidian Editorial Team, Security Research · Obsidian Security · May 27, 2026 · May 28, 2026

Key Takeaways
  • Configuration reviews tell you what an agent is set up to do.
  • Runtime tool call monitoring tells you what it actually did.
  • Those two pictures rarely match, and the gap between them is where your highest-severity AI agent risks live.
  • Ninety percent of AI agents running in enterprise SaaS environments today hold more permissions than their workflows require.
  • Yet most security teams cannot tell you which tools those agents called last week, what data came back, or on whose behalf the calls were made.
  • That is the visibility gap that monitoring AI agent tool calls in real time exists to close.

Why MCP Tool Calls Are the New Security Blind Spot

Security teams know they have an agent problem. One enterprise discovered 377 Copilot agents through an assessment it did not commission. Another found 2,500 agents already running before any inventory existed. A third watched its MCP server count in a single coding tool double quarter over quarter with no corresponding security review.

The Model Context Protocol (MCP) is the open standard that connects AI agents to external tools, data sources, and services. When an agent needs to query a database, write to a CRM record, call an API, or read a file, it does so through a tool call routed via MCP. Each of those calls is a discrete runtime event. Each one carries real inputs, returns real data, and executes under a real identity's authority.
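For concreteness, a tool call on the wire is a JSON-RPC 2.0 request that uses MCP's tools/call method. The sketch below builds one such request and a matching response as Python dictionaries. The helper name build_tool_call, the tool name, the arguments, and the returned text are all hypothetical and chosen only for illustration.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for MCP's `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(1, "get_opportunity_records", {"account_tier": "enterprise"})

# Shape of a successful response: the result carries a list of content items.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "3 opportunity records (illustrative)"}],
        "isError": False,
    },
}

print(json.dumps(request, indent=2))
```

Nothing in this request body names the invoking user, so the identity context generally has to be captured alongside the call rather than read out of it.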

The problem is straightforward: most security tools see the MCP connection, not the calls flowing through it. They can tell you that an agent has a connector to Salesforce. They cannot tell you that the agent called the get_opportunity_records tool 47 times yesterday, returned pipeline data for every enterprise account, and did so on behalf of a user who has no Salesforce license.

That is the difference between theoretical configuration and effective authority. Monitoring AI agent tool calls in real time starts with closing that gap: moving from knowing a connection exists to knowing what flowed through it.

What MCP Tool Call Monitoring Actually Captures

MCP tool call monitoring is not the same as knowing an MCP server exists. It is per-call visibility into the live execution layer of every agent action. Each monitored call produces a structured record containing four critical data points:

| Data Point | What It Reveals |
| Tool invoked | Which specific capability the agent exercised (e.g., search_crm, send_email, read_file) |
| Input parameters | What the agent sent to the tool, including any sensitive query terms or data identifiers |
| Data returned | What the tool sent back, including records, files, or API responses |
| Invoking identity | Which user or agent identity triggered the call and under what credential context |

Together, these four fields build an audit trail that no configuration review can produce retroactively. If an agent called a file-read tool and returned a document tagged with a sensitive information protection label, that event is recorded. If the invoking identity is a user without access to that document class, the record surfaces a privilege escalation that the configuration never flagged as possible.
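One way to picture such a record is as a small typed structure with one field per data point. The following is a minimal sketch, not Obsidian's schema: the class name ToolCallRecord, the ALLOWED_LABELS entitlement table, the sensitivity_label key, and the example identities are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ToolCallRecord:
    tool_invoked: str          # which capability the agent exercised
    input_parameters: dict     # what the agent sent to the tool
    data_returned: dict        # what the tool sent back (summarized here)
    invoking_identity: str     # who triggered the call
    credential_context: str    # which credential the call ran under
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# Hypothetical entitlement table: sensitivity labels each identity may read.
ALLOWED_LABELS = {
    "alice@example.com": {"public", "internal", "confidential"},
    "bob@example.com": {"public"},
}


def exceeds_entitlement(record: ToolCallRecord) -> bool:
    """True when the returned data carries a label the invoking identity may not read."""
    label = record.data_returned.get("sensitivity_label", "public")
    allowed = ALLOWED_LABELS.get(record.invoking_identity, set())
    return label not in allowed


record = ToolCallRecord(
    tool_invoked="read_file",
    input_parameters={"path": "finance/forecast.xlsx"},
    data_returned={"sensitivity_label": "confidential", "bytes": 48213},
    invoking_identity="bob@example.com",
    credential_context="connector:service-account",
)
print(exceeds_entitlement(record))  # True
```

The check only works because the record keeps the invoking identity and the returned data side by side; a configuration snapshot holds neither.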

This is runtime truth. The agent's configuration may show a standard connector with medium-risk OAuth scopes. The tool call log shows it returned hundreds of records of restricted financial data in a single session.

For teams managing Microsoft Copilot or Salesforce Agentforce deployments, this per-call record is the difference between knowing an agent exists and knowing what it has done.

Obsidian captures this per-call record across Salesforce Agentforce, Microsoft Copilot Studio, Amazon Bedrock, Microsoft Azure AI Foundry, ChatGPT Enterprise, and n8n by integrating via native APIs, with no connector required for each downstream SaaS application.

Tool Call Monitoring vs. Broader AI Runtime Monitoring

These two concepts are related but distinct. Security teams need both, and conflating them creates coverage gaps.

Broader AI runtime monitoring answers the macro question: which agents are running, on which platforms, with what configurations, and what is their overall risk posture? It produces a single pane of glass across platforms. It surfaces orphaned agents whose owners have been disabled, shadow agents deployed without IT oversight, and agents with org-wide access that should be restricted.

MCP tool call monitoring answers the micro question: within a given agent session, what specific actions did the agent take, in what sequence, with what data? It operates at the individual call level, not the agent configuration level.

Think of it this way: runtime monitoring tells you a car exists, who owns it, and whether the registration is current. Tool call monitoring tells you where it drove, how fast, and who was in the passenger seat.

Both layers are necessary. An agent that looks clean at the configuration level can still execute a sequence of tool calls that chains across systems to reach data it should never touch. That sequence, called action chaining, is invisible without per-call visibility. Each individual call may look benign. The chain reveals the blast radius.
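A simple way to see why the chain matters more than any single call is to scan each session's calls in order. The sketch below flags sessions in which a sensitive read is later followed by a call that sends data out. The tool classifications and the (session_id, sequence_number, tool_name) tuple layout are assumptions for illustration; a real deployment would derive them from its own inventory and log format.

```python
from collections import defaultdict

# Hypothetical classification of tools, for illustration only.
SENSITIVE_READS = {"get_account_data", "read_file"}
EGRESS_TOOLS = {"send_email", "post_webhook"}


def find_action_chains(calls):
    """Return session ids in which a sensitive read is later followed by an egress call.

    `calls` is an iterable of (session_id, sequence_number, tool_name) tuples.
    """
    by_session = defaultdict(list)
    for session_id, seq, tool in calls:
        by_session[session_id].append((seq, tool))

    flagged = []
    for session_id, events in by_session.items():
        seen_sensitive_read = False
        for _, tool in sorted(events):  # replay the session in call order
            if tool in SENSITIVE_READS:
                seen_sensitive_read = True
            elif tool in EGRESS_TOOLS and seen_sensitive_read:
                flagged.append(session_id)
                break
    return flagged


calls = [
    ("s1", 1, "search_crm"),
    ("s1", 2, "get_account_data"),
    ("s1", 3, "send_email"),
    ("s2", 1, "send_email"),
    ("s2", 2, "read_file"),
]
print(find_action_chains(calls))  # ['s1']
```

Session s2 contains the same kinds of calls as s1 but in the opposite order, so it is not flagged. The ordering only becomes visible with per-call records.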

AI agent governance frameworks require both layers to produce the audit-ready evidence that compliance and security programs demand. Runtime monitoring provides the inventory and posture picture. Tool call monitoring provides the behavioral record.

The Machine Insider Risk Hidden in Every Tool Call

Every MCP tool call executes under an identity. That identity is not always the user who triggered the agent. This is where machine insider risk becomes concrete.

Consider maker mode: an agent built by a Salesforce administrator using the administrator's credentials as the fixed connector authentication. Any user who invokes that agent, regardless of their own Salesforce permissions, executes tool calls under the administrator's authority. A user with no CRM access can invoke the agent, trigger a get_account_data tool call, and receive records they were never provisioned to see. The agent did exactly what it was configured to do. IAM controls were bypassed completely.

Without tool call monitoring, this event is invisible. The configuration shows a valid connector. The IAM logs show the user never accessed Salesforce directly. The tool call log is the only record that correlates the invoking user identity, the agent's maker mode credentials, and the data returned in that specific call.
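The correlation described above can be sketched as a join between the tool call log and an entitlement table. Everything named here is hypothetical: the USER_ENTITLEMENTS table, the invoking_user and credential_owner field names, and the example addresses are assumptions, not a real log format.

```python
# Hypothetical entitlements: which users are provisioned for which SaaS application.
USER_ENTITLEMENTS = {
    "admin@example.com": {"salesforce"},
    "intern@example.com": set(),
}


def maker_mode_findings(call_log, target_app="salesforce"):
    """Return calls where the invoking user ran under another identity's credential
    to reach an application the user is not provisioned for."""
    findings = []
    for call in call_log:
        user = call["invoking_user"]
        owner = call["credential_owner"]
        borrowed = user != owner
        unprovisioned = target_app not in USER_ENTITLEMENTS.get(user, set())
        if borrowed and unprovisioned:
            findings.append(call)
    return findings


call_log = [
    {"tool": "get_account_data", "invoking_user": "intern@example.com",
     "credential_owner": "admin@example.com"},
    {"tool": "get_account_data", "invoking_user": "admin@example.com",
     "credential_owner": "admin@example.com"},
]
findings = maker_mode_findings(call_log)
print([f["invoking_user"] for f in findings])  # ['intern@example.com']
```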

This is the machine insider problem. Agents hold tokens and credentials like human insiders. They execute actions at machine speed. They move data at 16 times the volume of human users. They are covered by no existing insider risk program, because insider risk programs are built for humans.

The non-human identity security challenge surfaces in every tool call log for every agent running in maker mode today.

A second escalation scenario compounds the risk: agent-to-agent communication. Agent A has limited permissions. Agent A calls Agent B through an MCP tool. Agent B has broad permissions. Agent B returns data that Agent A, and the user behind Agent A, should never reach. No single-platform visibility tool sees this cross-agent chain. Tool call monitoring that captures the full call sequence, including the downstream agent invocation, is the only mechanism that surfaces it.
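The cross-agent chain can be reasoned about as reachability in a call graph: an agent's effective authority is the union of its own scopes and the scopes of every agent it can invoke, directly or indirectly. The sketch below assumes a hypothetical AGENT_SCOPES table and an invocations map reconstructed from observed tool calls. The scope strings are invented for illustration.

```python
# Hypothetical scope assignments per agent.
AGENT_SCOPES = {
    "agent_a": {"crm:read_own"},
    "agent_b": {"crm:read_all", "files:read"},
}


def effective_scopes(agent, invocations):
    """Union of scopes for `agent` and every agent reachable through observed invocations.

    `invocations` maps a caller agent to the list of agents it was seen calling.
    """
    seen, stack, scopes = set(), [agent], set()
    while stack:
        current = stack.pop()
        if current in seen:  # guard against cycles in the call graph
            continue
        seen.add(current)
        scopes |= AGENT_SCOPES.get(current, set())
        stack.extend(invocations.get(current, []))
    return scopes


invocations = {"agent_a": ["agent_b"]}
escalated = effective_scopes("agent_a", invocations) - AGENT_SCOPES["agent_a"]
print(sorted(escalated))  # ['crm:read_all', 'files:read']
```

The difference between effective and assigned scopes is the escalation that a single-agent review would miss.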

For teams managing Amazon Bedrock multi-agent architectures, where supervisor agents forward full conversation context to sub-agents, this is a live risk today.

Why Ghost Chasing Configuration Signals Falls Short

Security teams are not ignoring this problem. They are trying to solve it with the tools they have. Those tools consistently fall short.

The pattern looks like this: a security engineer pulls the MCP server configuration for a high-risk agent. The configuration shows the expected connectors, reasonable OAuth scopes, and no obvious misconfigurations. The engineer marks the agent as reviewed and moves on. Three weeks later, an incident surfaces data that the agent should not have accessed.

What happened? The configuration was accurate, but it did not predict the tool calls. The agent called a tool with inputs that the configuration never restricted. The tool returned data the scope technically permitted but the workflow never required. No alert fired because no system was watching the calls.

This is ghost chasing: reviewing theoretical risks with no runtime evidence of what actually happened. The problem is not that security engineers are doing poor work. The problem is that configuration signals cannot tell you what an agent actually did, what it returned, and whether the invoking identity had any business making that request. Only the tool call record answers those questions.

The toxic risk combinations that create critical-severity alerts are almost always invisible at the configuration level. They become visible only when tool call data is layered against identity context and SaaS entitlements. NIST AI RMF governance controls and OWASP's agentic AI risk guidance both point toward runtime behavioral evidence as a core requirement for AI accountability. Configuration snapshots do not satisfy that requirement.

From Monitoring to Deterministic Guardrails

Monitoring is the prerequisite. It is not the destination.

Security teams managing probabilistic agents need deterministic guardrails: fixed, predictable enforcement rules that do not bend to the agent's next inference step. An agent that decides, based on a user's prompt, to call a tool it was not intended to use is behaving probabilistically. A rule that blocks any tool call outside the agent's approved tool set is deterministic.

The path from monitoring to enforcement runs through the tool call record. You cannot write a guardrail for behavior you have never observed. Tool call monitoring builds the evidence base: which tools are called in normal operation, which inputs are typical, which identity patterns are expected. That evidence base is what makes deterministic rules precise rather than overbroad.
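As a rough illustration of that path, the sketch below derives a per-agent approved tool set from observed calls and then applies it as a fixed rule. It is not a description of any vendor's enforcement feature. The function names, the min_count threshold, and the agent and tool names are assumptions, and a baseline derived this way would still need human review before anyone enforced it.

```python
from collections import Counter


def build_allowlist(observed_calls, min_count=2):
    """Derive an approved tool set per agent from tools seen at least `min_count` times.

    `observed_calls` is a list of (agent_id, tool_name) pairs from the tool call log.
    """
    counts = Counter(observed_calls)
    allowlist = {}
    for (agent_id, tool), n in counts.items():
        if n >= min_count:
            allowlist.setdefault(agent_id, set()).add(tool)
    return allowlist


def is_allowed(allowlist, agent_id, tool):
    """Deterministic check: the same inputs always give the same answer."""
    return tool in allowlist.get(agent_id, set())


observed = [
    ("sales_agent", "search_crm"),
    ("sales_agent", "search_crm"),
    ("sales_agent", "get_account_data"),
    ("sales_agent", "get_account_data"),
    ("sales_agent", "send_email"),  # seen once, so it stays outside the baseline
]
allowlist = build_allowlist(observed)
print(is_allowed(allowlist, "sales_agent", "search_crm"))  # True
print(is_allowed(allowlist, "sales_agent", "send_email"))  # False
```

The rule never consults the agent's prompt or reasoning, which is what makes it deterministic. The observed log is what keeps it from being overbroad.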

Monitoring MCP tool calls at runtime and building that audit trail is the present capability. Runtime enforcement, the ability to block a tool call before it completes, is on Obsidian's roadmap. Guardrails for Microsoft Copilot are targeted for late Q1 2026. Guardrails for additional platforms are targeted for Q2 2026. Security teams evaluating vendors today should ask any platform claiming to govern AI agents whether it provides per-call visibility. Without that, the platform is governing configuration, not behavior.

For security teams ready to move beyond ghost chasing, the starting point is establishing the tool call audit trail that makes every subsequent security decision evidence-based rather than theoretical.
