Security teams have spent two years building controls around individual AI agents, and most still think of those agents as discrete, isolated processes. That model is already obsolete.
Modern enterprise AI deployments use multi-agent architectures, where one agent orchestrates others to complete complex, multi-step tasks. A supervisor agent receives a user request, breaks it into subtasks, and delegates each subtask to a specialized sub-agent. The sub-agents may run on entirely different platforms, use different identity contexts, and connect to different SaaS applications. The user who initiated the request has no visibility into what happens after the first handoff.
The supervisor/sub-agent model is the dominant architecture for enterprise agentic workflows today. A supervisor agent, often built in Microsoft Copilot Studio or Amazon Bedrock, holds the orchestration logic. It decides which sub-agents to invoke, what context to pass, and how to assemble the final response. Sub-agents handle execution: querying a CRM, writing to a database, calling an external API, or invoking another agent.
The security problem is structural. The supervisor agent was provisioned with certain permissions. Each sub-agent was provisioned separately, often by a different team, with its own set of credentials and OAuth grants. When the supervisor calls a sub-agent, it passes context. That context frequently includes data the sub-agent was never independently authorized to receive.
The industry is formalizing agent-to-agent communication through emerging protocols. Google's Agent2Agent (A2A) protocol, for example, defines standard mechanisms for agents to discover each other, exchange tasks, and return results. These protocols enable interoperability across platforms. They also create standardized pathways for context, credentials, and instructions to flow between agents without any human in the loop.
Standardization accelerates adoption. It also standardizes the attack surface. When agents communicate through a defined protocol, the protocol itself becomes a target. A sub-agent that trusts any message arriving via an A2A-compliant channel will execute instructions from any sender that can reach it through that channel.
In multi-agent architectures, one agent's tool connections can bridge to another agent's tool connections, creating chains of tool access that extend far beyond what any single agent's configuration describes. The tools available to an agent are only fully visible at runtime, not from configuration alone. When agents bridge to each other across platforms, the blast radius of any single misconfigured agent expands to include every downstream connection.
Understanding the full AI agent attack surface requires mapping these chains, not just cataloging individual agents.
Security teams ask why their existing controls do not catch multi-agent risks. The answer is that multi-agent architectures produce five structural failures that no single-platform tool was designed to detect.
Context bleed occurs when a supervisor agent forwards its full conversation context to a sub-agent. Amazon Bedrock's supervisor agent pattern, for example, can forward the entire conversation history, including sensitive data from earlier turns, to every sub-agent it invokes. The sub-agent receives information it was never independently authorized to access. Nothing in the platform flags this as a violation because, from the platform's perspective, a legitimate agent made a legitimate call.
Credential forwarding occurs when tokens and OAuth grants travel through agent chains. An agent provisioned with admin-level Salesforce access passes its token downstream to a sub-agent handling a narrow data lookup task. The sub-agent now operates with credentials scoped far beyond its workflow requirements. This is the machine insider risk equivalent of a contractor receiving a master key because the person who hired them had one.
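A minimal sketch of how over-scoped forwarding could be detected at a single hop, assuming the scopes on the forwarded token and the scopes the sub-agent's task needs are both known. The `Handoff` structure and the scope names are hypothetical and do not correspond to any platform's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    """One hop in an agent chain (hypothetical structure for illustration)."""
    caller: str
    callee: str
    forwarded_scopes: frozenset[str]  # scopes on the token the caller passes down
    required_scopes: frozenset[str]   # scopes the callee's task actually needs


def excess_scopes(handoff: Handoff) -> frozenset[str]:
    """Return the scopes the callee receives but does not need."""
    return handoff.forwarded_scopes - handoff.required_scopes


# Illustrative values only: an admin-level token handed to a narrow lookup task.
handoff = Handoff(
    caller="supervisor",
    callee="lookup-agent",
    forwarded_scopes=frozenset({"crm.read", "crm.write", "crm.admin"}),
    required_scopes=frozenset({"crm.read"}),
)

if excess_scopes(handoff):
    print(f"{handoff.callee} is over-scoped: {sorted(excess_scopes(handoff))}")
```

The check is a plain set difference. The hard part in practice is knowing `required_scopes` for each sub-agent, which this sketch simply assumes.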
Trust transitivity is the logical consequence of how agents establish trust. If Agent A trusts Agent B, and Agent B trusts Agent C, Agent A implicitly trusts Agent C through the chain. No explicit trust decision was made about Agent C. No security team reviewed that relationship. The trust was inherited through the architecture. This mirrors the confused deputy problem, now operating at the identity layer across SaaS platforms.
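The A, B, C example can be made concrete with a small graph walk. The sketch below assumes trust relationships are available as a mapping from each agent to the agents it explicitly trusts; that mapping and the agent names are illustrative assumptions.

```python
from __future__ import annotations

from collections import deque


def implicit_trust(explicit: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Return (truster, trusted) pairs that are reachable through the chain
    but were never declared explicitly."""
    implied: set[tuple[str, str]] = set()
    for start, direct in explicit.items():
        seen: set[str] = set()
        queue = deque(direct)
        while queue:
            agent = queue.popleft()
            if agent in seen or agent == start:
                continue
            seen.add(agent)
            queue.extend(explicit.get(agent, ()))
        # Anything reachable that is not a direct trust edge was inherited.
        implied.update((start, agent) for agent in seen if agent not in direct)
    return implied


# Agent A trusts B, and B trusts C. Nobody decided that A trusts C.
trust = {"A": {"B"}, "B": {"C"}, "C": set()}
print(implicit_trust(trust))
```

Every pair this function returns is a trust relationship that exists in effect but that no one reviewed.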
Identity loss happens at handoffs. When a supervisor agent invokes a sub-agent, the original invoker's identity, the human who started the workflow, frequently does not propagate. The sub-agent sees the supervisor agent's identity as the caller. Downstream systems cannot determine whether the original request came from an authorized user or from an automated process operating outside any human's awareness.
Audit fragmentation is the operational consequence of all four failures above. Each platform logs its own events. Microsoft logs what Copilot Studio did. Salesforce logs what Agentforce did. Bedrock logs what its agents did. No platform logs the chain. Incident response teams trying to reconstruct what happened must manually correlate logs across systems that use different identity formats, different timestamps, and different event schemas.
| Failure | Mechanism | Detection Difficulty |
|---|---|---|
| Context bleed | Supervisor forwards full conversation history to sub-agent | High: no platform flags authorized-but-excessive data sharing |
| Credential forwarding | OAuth tokens and service account credentials passed through chains | High: token usage looks legitimate at each hop |
| Trust transitivity | Implicit trust inherited through agent-to-agent relationships | Very High: no explicit trust decision is logged |
| Identity loss | Original invoker identity drops at agent handoffs | Very High: sub-agents see calling agent, not original human |
| Audit fragmentation | Each platform logs its own slice; no cross-chain correlation | Extreme: requires manual correlation across incompatible log formats |
These failures do not require an attacker. They occur in correctly functioning multi-agent systems by design.
The trust chain problem in multi-agent communication security is not analogous to network lateral movement. It is more dangerous, because it operates at the identity layer with legitimate credentials.
Every agent in a chain inherits the trust decisions made by the agents above it. A sub-agent does not independently verify whether the supervisor agent's request is policy-aligned. It executes the instruction because the instruction arrived from a trusted caller. This is the same assumption that makes bearer tokens dangerous: possession implies authorization.
When an enterprise deploys a supervisor agent with broad SaaS access to orchestrate ten specialized sub-agents, each sub-agent operates under the implicit assumption that any instruction from the supervisor is legitimate. A compromised upstream agent can exploit that assumption, and a misconfigured one can trigger the same outcome without any attacker involved. The sub-agent has no mechanism to distinguish a legitimate orchestration request from a manipulated one.
The blast radius calculation for multi-agent architectures is multiplicative, not additive. A single compromised agent at the supervisor level exposes every sub-agent's capabilities and every data source those sub-agents can reach. A single misconfigured sub-agent with excessive permissions extends that blast radius to data the supervisor was never supposed to access.
The supply chain breach pattern demonstrates this dynamic at scale: a single compromised integration point can propagate access across hundreds of organizations. Multi-agent architectures create the same propagation dynamic inside a single enterprise, across platforms, at machine speed.
The risk amplifies when agents span platforms. A Copilot Studio supervisor invoking a Bedrock sub-agent crosses two identity systems, two permission models, and two audit logs. The effective authority of the combined chain is not the intersection of each agent's permissions. It is the union. Whatever any agent in the chain can access becomes reachable through the chain.
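The difference between intersection and union is easy to see with sets. The following sketch uses invented agent names and permission strings; it assumes each agent's permissions can be enumerated, which in practice may only be possible at runtime.

```python
from __future__ import annotations

# Hypothetical permissions for a two-agent, cross-platform chain.
chain_permissions: dict[str, set[str]] = {
    "copilot-supervisor": {"sharepoint.read", "teams.read"},
    "bedrock-subagent": {"s3.read", "salesforce.read", "salesforce.write"},
}


def effective_authority(perms: dict[str, set[str]]) -> set[str]:
    """Union: everything reachable through the chain."""
    return set().union(*perms.values())


def common_authority(perms: dict[str, set[str]]) -> set[str]:
    """Intersection: what every agent in the chain is individually allowed."""
    sets = list(perms.values())
    return set.intersection(*sets) if sets else set()


print("reachable through chain:", sorted(effective_authority(chain_permissions)))
print("shared by every agent:  ", sorted(common_authority(chain_permissions)))
```

In this example no permission is shared by both agents, yet five are reachable through the chain. A review that looks at each agent in isolation would never see that combined set.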
This is why agentic AI security requires cross-platform visibility. Per-platform tools see their own agent's behavior. They do not see what that agent enables downstream.
Security teams know something is wrong when they try to answer a basic question: what did this agent actually do, and who authorized it?
Every major AI platform generates logs. Microsoft Copilot Studio logs agent invocations. Amazon Bedrock logs model calls. Salesforce Agentforce logs record accesses. Each log is accurate within its own boundary. None of them captures the chain.
Reconstructing a multi-agent workflow from per-platform logs requires correlating events across systems that do not share a common identifier for the chain. The supervisor agent's session ID in Copilot Studio does not appear in Bedrock's logs. The sub-agent's action in Salesforce does not reference the original user who triggered the workflow. Security teams end up ghost chasing: assembling a picture of what might have happened from fragments that do not connect.
The identity problem compounds the log problem. When an agent handoff occurs, the original invoker's identity frequently does not propagate. A security analyst reviewing Salesforce logs sees the agent's service account as the actor. They cannot determine whether a human triggered the workflow, which human triggered it, or whether that human had any authorization to access the data the agent retrieved.
This is the visibility gap that makes agent-to-agent communication security fundamentally different from traditional access control problems. Traditional IAM assumes a human identity is always traceable. Multi-agent architectures break that assumption at every handoff.
The tools enterprises already own, including native platform dashboards, SIEM integrations, and identity governance platforms, were built for human-centric access models. They log individual events. They do not correlate cross-platform agent chains into a single authority map.
The AI agent governance problem is not a data problem. Logs exist. The problem is correlation: connecting the supervisor's invocation, the sub-agent's execution, the credential that was used, the data that was accessed, and the original human identity that started the chain into a single picture of what actually happened.
Solving agent-to-agent communication security requires four operational controls. These controls do not require replacing existing platforms. They require adding a layer that operates across platforms, at runtime, with deterministic rules.
Every agent handoff must carry the original invoker's identity forward. This is not a default behavior in any current multi-agent platform. It requires explicit design. Security teams should require that any agent architecture passing tasks between agents includes a signed identity claim from the original human invoker. Downstream agents must validate that claim before executing instructions.
Without identity propagation, every sub-agent operates as an anonymous executor. With it, every action in the chain is attributable to a specific human identity that can be checked against authorization policies.
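One way to picture a signed identity claim is sketched below using only the Python standard library. It assumes a shared HMAC secret between agents purely to keep the example short; a real deployment would more likely use asymmetric signatures and an established token format. The function names, claim fields, and secret are all hypothetical.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import time

# Assumption for illustration only: a secret shared by all agents in the chain.
SECRET = b"demo-shared-secret"


def sign_claim(user_id: str, workflow_id: str, ttl_seconds: int = 300,
               now: float | None = None) -> str:
    """Issue a signed claim naming the original human invoker."""
    issued = time.time() if now is None else now
    payload = json.dumps(
        {"sub": user_id, "wf": workflow_id, "exp": issued + ttl_seconds},
        sort_keys=True,
    ).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(sig).decode())


def verify_claim(token: str, now: float | None = None) -> dict:
    """Validate signature and expiry; raise ValueError if either check fails."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError as exc:  # wrong part count or invalid base64
        raise ValueError("malformed claim") from exc
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    claim = json.loads(payload)
    if claim["exp"] < (time.time() if now is None else now):
        raise ValueError("claim expired")
    return claim


# The supervisor attaches the token to every handoff; each sub-agent verifies
# it before acting and attributes its actions to claim["sub"].
token = sign_claim("alice@example.com", "wf-1")
print(verify_claim(token)["sub"])
```

The point of the sketch is the shape of the control: the claim travels with the task, and a downstream agent refuses to act when the claim is missing, altered, or expired.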
Supervisor agents must pass only the context a sub-agent needs to complete its specific task. Full conversation forwarding is the default in many orchestration frameworks. It is the wrong default. Context minimization limits context bleed and reduces the data exposed if a sub-agent is compromised or misconfigured.
This mirrors the principle of least privilege applied to data flow rather than access grants. The least-privilege framework for AI agents applies at the permission layer and at the context layer simultaneously.
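Context minimization can be as simple as an allowlist applied before each handoff. The sketch below assumes the supervisor holds its context as a dictionary and that each sub-agent has a declared set of fields it needs; the agent names and field names are hypothetical.

```python
from __future__ import annotations

# Hypothetical per-sub-agent allowlists of context fields.
CONTEXT_ALLOWLIST: dict[str, set[str]] = {
    "crm-lookup": {"account_id"},
    "report-writer": {"account_id", "summary"},
}


def minimize_context(sub_agent: str, context: dict[str, object]) -> dict[str, object]:
    """Return only the fields the named sub-agent is allowed to receive.
    Sub-agents without an allowlist entry receive nothing."""
    allowed = CONTEXT_ALLOWLIST.get(sub_agent, set())
    return {key: value for key, value in context.items() if key in allowed}


full_context = {
    "account_id": "ACME-42",
    "summary": "Customer asked about renewal terms.",
    "conversation_history": ["earlier turn containing salary data"],
    "payment_card": "4111-XXXX",
}

print(minimize_context("crm-lookup", full_context))
```

Denying by default matters here: a newly added sub-agent gets no context until someone decides what it needs, which reverses the full-forwarding default.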
A complete audit trail for a multi-agent workflow requires correlating events across every platform the chain touches. This cannot be done manually at scale. It requires a layer that ingests events from each platform and reconstructs the chain using a common identifier, the original session or workflow ID, that persists across handoffs.
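A rough sketch of what that correlation layer does, assuming every platform's events already carry a workflow ID that survives handoffs (which, as noted above, is not the case by default). The platform names and raw field names are invented for illustration and are not the real log schemas of any product.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw events in two incompatible formats.
RAW_EVENTS = [
    {"platform": "platform_b", "epoch": 1714557605, "workflow": "wf-1",
     "agent": "lookup", "action": "read record"},
    {"platform": "platform_a", "ts": "2024-05-01T10:00:00+00:00", "corr": "wf-1",
     "who": "supervisor", "what": "invoke sub-agent"},
    {"platform": "platform_b", "epoch": 1714557700, "workflow": "wf-2",
     "agent": "lookup", "action": "read record"},
]


def normalize(raw: dict) -> dict:
    """Map a platform-specific event onto one common schema with UTC times."""
    if raw["platform"] == "platform_a":
        return {"workflow_id": raw["corr"], "platform": "platform_a",
                "time": datetime.fromisoformat(raw["ts"]),
                "actor": raw["who"], "action": raw["what"]}
    if raw["platform"] == "platform_b":
        return {"workflow_id": raw["workflow"], "platform": "platform_b",
                "time": datetime.fromtimestamp(raw["epoch"], tz=timezone.utc),
                "actor": raw["agent"], "action": raw["action"]}
    raise ValueError(f"unknown platform: {raw['platform']}")


def reconstruct_chains(raw_events: list[dict]) -> dict[str, list[dict]]:
    """Group normalized events by workflow ID and order each chain by time."""
    chains: dict[str, list[dict]] = defaultdict(list)
    for raw in raw_events:
        event = normalize(raw)
        chains[event["workflow_id"]].append(event)
    for events in chains.values():
        events.sort(key=lambda e: e["time"])
    return dict(chains)


for workflow_id, events in reconstruct_chains(RAW_EVENTS).items():
    print(workflow_id, [(e["platform"], e["actor"], e["action"]) for e in events])
```

Normalizing timestamps and identifiers is the mechanical part. The sketch only works because the workflow ID is present in every event, which is exactly the property that has to be designed into the chain.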
Obsidian's Identity Graph provides this correlation layer. By mapping relationships between agents, identities, applications, and actions across platforms, it produces a single authority map showing what each agent in a chain actually did, what data it accessed, and whether the original invoker had authorization for those actions. This is runtime truth, not theoretical configuration.
Probabilistic agents require deterministic guardrails. Each agent in a chain should operate within fixed, enforceable boundaries: specific data sources it can query, specific actions it can take, specific downstream agents it can invoke. These boundaries must be enforced at runtime, not defined in configuration and trusted to hold.
Deterministic guardrails break the trust transitivity problem. If Agent B cannot invoke Agent C regardless of what Agent A instructs, the chain terminates at the boundary. The blast radius of any single compromised or misconfigured agent becomes bounded rather than unlimited.
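The Agent A, B, C case can be illustrated with a fixed boundary table checked on every call. This is a minimal sketch, assuming a central policy that each runtime consults before an agent acts; the `Boundary` structure, agent names, and resource names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Boundary:
    """Fixed limits for one agent (hypothetical structure)."""
    data_sources: frozenset[str] = frozenset()
    actions: frozenset[str] = frozenset()
    may_invoke: frozenset[str] = frozenset()


class BoundaryViolation(Exception):
    """Raised when an agent attempts something outside its boundary."""


BOUNDARIES: dict[str, Boundary] = {
    "agent-a": Boundary(may_invoke=frozenset({"agent-b"})),
    "agent-b": Boundary(data_sources=frozenset({"crm"}),
                        actions=frozenset({"read"})),
    # agent-b has no may_invoke entries, so it cannot call agent-c.
}


def authorize_invoke(caller: str, callee: str) -> None:
    """Allow the call only if the caller's boundary names the callee."""
    boundary = BOUNDARIES.get(caller)
    if boundary is None or callee not in boundary.may_invoke:
        raise BoundaryViolation(f"{caller} may not invoke {callee}")


def authorize_action(agent: str, action: str, data_source: str) -> None:
    """Allow the action only if both action and data source are in bounds."""
    boundary = BOUNDARIES.get(agent)
    if (boundary is None or action not in boundary.actions
            or data_source not in boundary.data_sources):
        raise BoundaryViolation(f"{agent} may not {action} {data_source}")


authorize_invoke("agent-a", "agent-b")  # allowed
try:
    authorize_invoke("agent-b", "agent-c")  # blocked regardless of instructions
except BoundaryViolation as err:
    print("blocked:", err)
```

The decision depends only on the boundary table, never on the content of the instruction, which is what makes the guardrail deterministic.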
Security teams building multi-agent governance programs should start with a risk assessment that maps every existing agent chain, identifies cross-platform handoffs, and flags toxic combinations where excessive permissions meet public accessibility or orphaned ownership.
See how Obsidian maps cross-platform agent chains and enforces deterministic boundaries at runtime.