Most AI agents running in enterprise SaaS environments hold more permissions than their workflows require. That single finding rates as medium severity. Add an orphaned creator account and a public-facing deployment, and you no longer have a medium-severity finding. You have a critical incident waiting for a trigger. That is the core problem with how most teams score <a href="https://www.obsidiansecurity.com/blog/ai-agent-security-risks">AI agent risk</a> today: they evaluate each factor in isolation and miss the compound severity that combinations create.
Security teams inherit CVSS-style scoring from vulnerability management. That model works for discrete software flaws. A buffer overflow has a severity score. A misconfigured S3 bucket has a severity score. Each finding is evaluated on its own technical merit.
Agentic AI risk does not work that way.
When a security team asks "how risky is this agent," the answer is never a single number derived from a single property. The agent's risk profile is the product of every factor present simultaneously: its access scope, its credential model, its ownership state, its deployment surface, and its connection topology. Remove any one factor and the severity drops. Add one factor and it can jump two severity levels.
This is the difference between additive risk and compound risk. Traditional scoring adds factors: medium plus medium equals medium-high. Compound risk multiplies them: medium times medium equals critical, because each factor enables the next one to cause harm.
Consider a concrete example. An agent configured as org-wide accessible rates as medium severity. An agent running on maker mode credentials, meaning it executes with its creator's permissions rather than the invoker's, rates as medium severity. But an agent that is both org-wide accessible and running on maker mode admin credentials is not a medium-high finding. It is a critical finding, because any user in the organization can now invoke an agent that executes with administrator-level permissions they were never granted. Your IAM controls are bypassed by design.
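The org-wide plus maker mode example can be sketched in a few lines of Python. This is a minimal illustration, not a published scoring model: the factor names, the numeric scale, and the `TOXIC_PAIRS` table are all assumptions made for the example, and the isolation-style scorer is simplified to report the worst single factor.

```python
# Illustrative ordinal scale; the numeric values are an assumption.
SEVERITY = {"low": 1, "medium": 2, "high": 3, "critical": 4}
NAMES = {value: name for name, value in SEVERITY.items()}

# Hypothetical rule table: factor pairs that enable each other on one agent.
TOXIC_PAIRS = {
    frozenset({"org_wide_access", "maker_mode_admin_credentials"}): "critical",
}


def isolated_severity(factors: dict[str, str]) -> str:
    """Score each factor on its own and report the worst single rating."""
    worst = max((SEVERITY[s] for s in factors.values()), default=SEVERITY["low"])
    return NAMES[worst]


def compound_severity(factors: dict[str, str]) -> str:
    """Escalate when a known toxic pair is present on the same agent."""
    score = SEVERITY[isolated_severity(factors)]
    present = set(factors)
    for pair, escalated in TOXIC_PAIRS.items():
        if pair <= present:  # every factor in the pair is present
            score = max(score, SEVERITY[escalated])
    return NAMES[score]


agent = {
    "org_wide_access": "medium",
    "maker_mode_admin_credentials": "medium",
}
print(isolated_severity(agent), compound_severity(agent))
```

The point of the sketch is the second function: it looks at which factors co-occur before it assigns a rating, which is the step isolation-style scoring skips.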
The question standard scoring tools never ask is: what is the worst combination of factors present on this agent right now? That is the question an AI agent toxic combination framework is built to answer. You can read more about the foundational patterns in Obsidian's prior analysis of toxic risk combinations. This article goes deeper on the scoring methodology and operationalization.
Public accessibility is the single most common risk amplifier in enterprise AI agent deployments. An agent configured as org-wide accessible or publicly reachable via URL is not inherently critical. It becomes critical when combined with any one of three co-occurring factors: maker mode credentials, sensitive data access, or an orphaned owner.
Here is why the public-access risk pattern is so dangerous at scale.
When an agent is publicly accessible, the invoker population is unbounded. You cannot predict who will interact with it, what they will ask, or what data they will attempt to extract. Standard IAM assumes a known invoker population. Public agents remove that assumption entirely.
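The escalation rule stated above (public access becomes critical when any one of three co-occurring factors is present) could be expressed as a simple set check. The factor labels and the `public_access_severity` helper are hypothetical names chosen for this sketch, and the "medium" baseline for a public agent with no amplifier follows the article's own example rather than any external standard.

```python
# Hypothetical factor labels; a real inventory would define its own.
PUBLIC_FLAGS = {"org_wide_access", "public_url"}
AMPLIFIERS = {"maker_mode_credentials", "sensitive_data_access", "orphaned_owner"}


def public_access_severity(factors: set[str]) -> tuple[str, set[str]]:
    """Return a rating plus the amplifying factors that caused any escalation."""
    if not factors & PUBLIC_FLAGS:
        # The rule only applies to agents with an unbounded invoker population.
        return ("not_public", set())
    hits = factors & AMPLIFIERS
    return ("critical" if hits else "medium", hits)


print(public_access_severity({"public_url"}))
print(public_access_severity({"org_wide_access", "orphaned_owner"}))
```

Returning the matched amplifiers alongside the rating keeps the finding explainable: a reviewer can see which co-occurring factor pushed the agent to critical.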
Layer maker mode credentials onto public accessibility and you have the confused deputy attack pattern. The agent executes using its creator's permissions, not the invoker's. A user with no Salesforce access invokes the agent. The agent queries Salesforce using the creator's admin credentials. The user receives data they were never authorized to see. The agent did exactly what it was configured to do. Nothing in your IAM flagged it. This is the agentic confused deputy attack: the agent is a trusted deputy that can be directed by anyone to act with authority it should not be delegating.
Consider a composite scenario. A Microsoft Copilot Studio agent is built by a Salesforce administrator to summarize pipeline reports. The creator configures it in maker mode, using their own admin credentials as the connector authentication. The agent is published org-wide for convenience. Six months later, the creator leaves the company. Their account is disabled. The agent keeps running. A contractor with no Salesforce license asks the agent for a full pipeline summary. The agent returns it, authenticated as the former admin. Three toxic combination factors are present: public access, maker mode admin credentials, and an orphaned creator. The individual severity of each factor is medium. The combined severity is critical.
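The composite scenario can be modeled as a small audit check that compares who invoked the agent with whose credentials the agent acted under. This is a sketch under stated assumptions: `Principal`, `Agent`, `audit_invocation`, and the permission string are invented for illustration and do not correspond to any Copilot Studio or Salesforce API.

```python
from dataclasses import dataclass, field


@dataclass
class Principal:
    name: str
    permissions: set[str] = field(default_factory=set)
    active: bool = True  # False once the account is disabled


@dataclass
class Agent:
    name: str
    creator: Principal
    maker_mode: bool  # True: the connector authenticates as the creator
    org_wide: bool


def audit_invocation(agent: Agent, invoker: Principal, required: str) -> list[str]:
    """List the toxic factors exercised by one invocation."""
    acting_as = agent.creator if agent.maker_mode else invoker
    if required not in acting_as.permissions:
        return ["denied"]
    findings = []
    if agent.maker_mode and required not in invoker.permissions:
        # The agent used authority the invoker was never granted.
        findings.append("confused_deputy")
    if not agent.creator.active:
        findings.append("orphaned_creator")
    if agent.org_wide:
        findings.append("org_wide_access")
    return findings


former_admin = Principal("former_admin", {"salesforce:read_pipeline"}, active=False)
contractor = Principal("contractor")
agent = Agent("pipeline-summary", former_admin, maker_mode=True, org_wide=True)
print(audit_invocation(agent, contractor, "salesforce:read_pipeline"))
```

With the same agent switched out of maker mode, the contractor's request would be denied, which is the behavior the organization's IAM model assumes.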
Securing Microsoft Copilot requires detecting this specific pattern, not just flagging public agents in isolation.
Single-factor risk scoring is not wrong. It is incomplete. In agentic AI environments, the critical risks are compound events: two or three medium-severity factors appearing simultaneously on the same agent, each one enabling the next to cause real harm. Standard tooling rates each factor in isolation and produces a list of medium findings that no one prioritizes.
The AI agent toxic combination framework changes the question from "how risky is this factor" to "what is the worst combination of factors present on this agent right now." That question produces a fundamentally different triage list, one where the most urgent items are not the loudest individual alerts but the quietest compound configurations.
Runtime truth is what makes combination scoring possible. Theoretical configuration tells you what factors exist on paper. Effective authority tells you what those factors enable in practice. The gap between those two views is where the most dangerous combinations live.
Inventory your agents. Map their factors. Score their combinations. Start with the public-facing population. That is where the critical combinations are most concentrated, and that is where deterministic guardrails for probabilistic agents will have the most immediate impact.
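As a rough illustration of that sequence (inventory, map factors, score combinations, start with public-facing agents), the triage order might be computed as follows. The `AgentRecord` type, the factor labels, and the scoring thresholds in `score` are placeholder assumptions for the sketch, not a recommended rule set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRecord:
    name: str
    factors: frozenset[str]


PUBLIC = {"org_wide_access", "public_url"}
RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def score(agent: AgentRecord) -> str:
    """Placeholder combination rule: co-occurring factors escalate the rating."""
    count = len(agent.factors)
    if count >= 2 and agent.factors & PUBLIC:
        return "critical"
    if count >= 2:
        return "high"
    return "medium" if count == 1 else "low"


def triage(inventory: list[AgentRecord]) -> list[AgentRecord]:
    """Order by combined severity, then public-facing agents first, then name."""
    return sorted(
        inventory,
        key=lambda a: (RANK[score(a)], not (a.factors & PUBLIC), a.name),
    )


inventory = [
    AgentRecord("hr-bot", frozenset({"sensitive_data_access"})),
    AgentRecord("pipeline-summary", frozenset(
        {"org_wide_access", "maker_mode_credentials", "orphaned_owner"})),
    AgentRecord("internal-sync", frozenset(
        {"maker_mode_credentials", "orphaned_owner"})),
    AgentRecord("faq-bot", frozenset({"public_url"})),
]
print([a.name for a in triage(inventory)])
```

Sorting on the combined rating rather than on individual alerts is what moves a quiet three-factor agent ahead of a noisy single-factor one.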
An AI agent toxic combination is a security condition where two or more individually medium-severity risk factors appear simultaneously on the same agent, compounding to high or critical severity. No single factor triggers a critical alert on its own. The combination of factors creates a viable, often undetected attack path.
Standard scoring frameworks like CVSS evaluate each finding in isolation. They are designed for discrete vulnerabilities with self-contained impact. Agentic AI risk is compound: each factor enables the next one. A scoring model that does not evaluate factors in combination will consistently underrate the most dangerous agent configurations.
The most common critical combination is public or org-wide accessibility combined with maker mode admin credentials. This pattern enables the confused deputy attack: any user can invoke the agent and receive data at the creator's permission level, sidestepping the IAM controls that would otherwise apply to that user. It is a common configuration in Microsoft Copilot Studio and ChatGPT Enterprise deployments.
A confused deputy attack occurs when an agent with elevated permissions is invoked by a user who does not hold those permissions. The agent acts as a trusted deputy, executing the invoker's request using credentials the invoker was never granted. The agent behaves exactly as configured. No access control was technically violated. But the user extracted data they had no right to see.
Start with inventory. You cannot score combinations you cannot see. Use a platform that provides a single pane of glass across your AI agent builders, including Copilot Studio, Agentforce, Bedrock, Vertex, and n8n. Once you have an inventory, map risk factors per agent. Then apply combination scoring rules. The public-facing agent population is the highest-priority starting point because it contains the highest concentration of critical combinations.