
What Are Agentic Guardrails? Deterministic Controls for Probabilistic Systems

Agentic guardrails enforce deterministic runtime controls on probabilistic AI agents. Learn the security-first definition, threat model, and the four-pillar foundation.

Obsidian Editorial Team, Security Research · Obsidian Security · May 7, 2026
Key Takeaways
  • Model guardrails (system prompts, RLHF, output filters) govern what a language model says. Agentic guardrails govern what an agent does with tools, credentials, and SaaS data.
  • Probabilistic agents require deterministic guardrails. A natural-language instruction to an LLM is a suggestion. A runtime policy check on an agent action is enforcement.
  • Configuration is not reality. Theoretical configuration shows what an agent is set up to do. Runtime truth shows what it actually did.
  • Effective authority, not posture, is the signal that matters. An agent's real access inside a SaaS app resolves at runtime through maker mode credentials, OAuth grants, and entitlement inheritance.
  • Toxic combinations, such as a shadow agent with org-wide access using maker mode credentials to reach sensitive data, only surface when multiple risk factors are correlated on a single agent.
  • 90% of AI agents hold excessive privileges, agents move 16x more data than human users, and they are granted 10x more access than their workflows actually need.
  • OWASP LLM06 (Excessive Agency), OWASP LCNC-SEC, NIST AI RMF, and MITRE ATLAS all name runtime enforcement on autonomous AI systems as a distinct control requirement.
  • You cannot govern what you cannot see. Inventory is the prerequisite for every other control.

What are Agentic Guardrails? The Security-First Definition

Agentic guardrails are deterministic controls applied to probabilistic systems at runtime. They enforce fixed, predictable rules on AI agents as those agents execute actions inside enterprise applications, independent of what the model "intends" to do.

Four concepts anchor the definition, and each of them corresponds to a specific gap that model-layer guardrails leave open.

Effective Authority, Not Theoretical Configuration

Configuration shows what an agent is set up to do on paper. Effective authority shows what an agent can actually do inside a SaaS application after all entitlements, delegations, and credential inheritance resolve. An agent with a Salesforce connector is the configuration view. That same agent able to query every record in the org because the connector was built in maker mode is the effective authority view. Only runtime correlation produces the second picture, and only the second picture supports enforcement.

Maker Mode

Maker mode is the specific escalation vector that makes identity correlation non-negotiable. In low-code AI platforms like Microsoft Copilot Studio and Salesforce Agentforce, an agent runs with the creator's credentials, not the invoker's. A business analyst with no Salesforce license can invoke an agent built by a Salesforce admin and extract CRM data the analyst has no right to access. Standard IAM never sees the privilege escalation happen, because to IAM it looks like a normal API call from the admin account. Agentic guardrails must correlate the runner's identity with the maker's permissions in real time to flag this misuse.
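The correlation described above can be sketched in a few lines. This is an illustrative sketch only: the function name, event shape, and permission strings are hypothetical, not a real platform API. The point is that the check compares what the invoker is entitled to do against what the agent's inherited maker credentials actually do.

```python
def detect_maker_mode_escalation(invocation, directory):
    """Flag invocations where a lower-privileged user inherits the
    agent maker's credentials (the maker mode escalation vector).

    `directory` maps each identity to the set of permissions it holds.
    """
    agent = invocation["agent"]
    if agent["credential_mode"] != "maker":
        return None  # agent runs as the invoker; IAM sees the true caller

    invoker_perms = directory[invocation["invoker"]]  # what the invoker may do
    maker_perms = directory[agent["maker"]]           # what the agent actually does

    # Any permission the maker holds but the invoker lacks is an escalation.
    escalated = maker_perms - invoker_perms
    if escalated:
        return {
            "invoker": invocation["invoker"],
            "agent": agent["id"],
            "escalated_permissions": sorted(escalated),
        }
    return None
```

In this toy model, an analyst invoking an admin-built agent produces a finding listing exactly the permissions the analyst gained, which is the signal standard IAM never emits.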

Toxic Combinations

Individual risk factors on a single agent often rate medium severity. Combined on one agent, they compound into a critical-priority exposure. A shadow agent (unmanaged, unknown to security) that is also configured org-wide with unrestricted access to sensitive data is the classic example. Agentic guardrails cannot be a flat list of every theoretical risk. They must prioritize agents where multiple named risk factors co-occur, because that is where blast radius actually lives.
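Prioritizing co-occurrence rather than flat-listing every theoretical risk can be sketched as below. The risk factor names come from this article; the co-occurrence threshold and data shapes are illustrative assumptions, not a scoring standard.

```python
# Medium-severity factors that compound when they co-occur on one agent.
MEDIUM_FACTORS = {
    "shadow_agent", "org_wide_access", "maker_mode",
    "sensitive_data_access", "orphaned_owner", "unsanctioned_mcp",
}

def prioritize_toxic_combinations(agents, min_factors=3):
    """Return agents where multiple medium-severity factors co-occur,
    sorted so the largest compound exposures surface first."""
    toxic = []
    for agent in agents:
        present = MEDIUM_FACTORS & set(agent["risk_factors"])
        if len(present) >= min_factors:
            toxic.append({"agent": agent["id"], "factors": sorted(present)})
    return sorted(toxic, key=lambda a: len(a["factors"]), reverse=True)
```

An agent with a single medium factor never makes the list; the shadow agent carrying four factors at once lands at the top, which is where blast radius actually lives.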

Runtime Truth

Every intercepted action, every policy decision, and every identity correlation must generate runtime evidence. Security teams need to answer the four questions customers like CNA, Farmers, and Coinbase repeatedly ask: Was it used? Who triggered it? What data was accessed? Did it succeed? NIST AI RMF GOVERN 1.2 codifies this requirement by calling for trustworthy AI characteristics to be integrated into operational policies, procedures, and practices, not just documented at design time.
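A minimal shape for that runtime evidence might look like the record below. Every field name here is an assumption for illustration; the structure simply maps one intercepted action onto the four questions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AgentActionEvidence:
    """One runtime evidence record per intercepted agent action."""
    agent_id: str
    invoker: str          # Who triggered it?
    action: str           # Was it used? (the intercepted action itself)
    resources: list       # What data was accessed?
    succeeded: bool       # Did it succeed?
    policy_decision: str  # allow / block, from the guardrail
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def answers(self):
        """The four questions security teams are repeatedly asked."""
        return {
            "Was it used?": self.action,
            "Who triggered it?": self.invoker,
            "What data was accessed?": self.resources,
            "Did it succeed?": self.succeeded,
        }
```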

Most Security Teams Are Chasing the Wrong Definition

CNA security leaders described their situation as "ghost chasing." They had theoretical configuration signals ("this MCP could be exploited," "this connector exists," "this model is enabled") but no evidence of what any agent had actually done. Their existing tools showed posture. They needed runtime truth.

That gap is where the wrong definition of "guardrails" takes hold.

The Prompt-Layer Fallacy

When security teams inherited "guardrails" from the model-safety era, the term meant system prompts that instruct a model to refuse harmful requests, RLHF tuning that shapes behavior during training, or output filters that scan responses for policy violations. These are model-layer controls. They govern what a language model says.

Agents act. They call tools, invoke APIs, read and write SaaS data, chain actions across applications, and execute workflows using inherited credentials. A system prompt that says "do not access financial records" has zero enforcement power over an agent holding an OAuth token with full Salesforce read access. The model might comply. It might not. That is the nature of probabilistic systems.

Why Probabilistic Controls Cannot Govern Probabilistic Agents

A system prompt is a natural-language suggestion interpreted by a probabilistic model. It can be overridden by prompt injection, ignored across multi-turn conversations, or simply misinterpreted. OWASP's LLM06: Excessive Agency identifies exactly this class of risk: agents taking unintended actions because of excessive permissions, excessive functionality, or excessive autonomy. The recommended mitigation is not a better prompt. It is least privilege, scoped tool access, and human approval on high-impact actions, all applied at the agent layer, not the model layer.
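The contrast can be made concrete with a deterministic check. This is a minimal sketch; the rule set and action shape are hypothetical. What matters is that the verdict is a fixed function of the action and the invoker, evaluated before execution: the same input yields the same verdict every time, regardless of what the model generated.

```python
# Hypothetical policy: each rule scopes one tool to roles and a bulk limit.
POLICY = [
    {"tool": "salesforce.query", "max_records": 100, "roles": {"sales", "admin"}},
]

def check_action(action, invoker_role):
    """Allow or block a tool call before it executes. This is a fixed
    rule on the agent action, not a suggestion interpreted by a model."""
    for rule in POLICY:
        if rule["tool"] == action["tool"]:
            if invoker_role not in rule["roles"]:
                return ("block", "role not permitted for this tool")
            if action.get("record_limit", 0) > rule["max_records"]:
                return ("block", "bulk extraction exceeds policy limit")
            return ("allow", "within policy")
    return ("block", "tool not on the allowlist")  # default deny
```

A prompt-injected agent can change what it attempts; it cannot change what this function returns for a given attempt.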

The distinction matters operationally. "The model refused to answer a harmful question" is a content outcome. "The agent extracted 50,000 customer records through a maker mode connection nobody knew existed" is a security incident.

Why the Threat Model Is Different for Agents

Security leaders who treat agent risk as a variant of model risk will miss the highest-severity attack vectors. Three realities separate the two.

Agents Are Machine Insiders

AI agents hold tokens and credentials the same way human insiders hold them, but they operate at machine speed and without coverage from any insider-risk program. Legacy identity governance was built for humans moving through sanctioned workflows. It was not built for non-human identities executing thousands of actions per minute using inherited admin credentials.

This is the machine insider risk category. When a human accesses Salesforce, they authenticate, navigate a UI, and pull one record at a time. When an agent accesses Salesforce, it authenticates with an OAuth token, queries the API, and can extract thousands of records in seconds. The credentials are identical. The speed, scale, and blast radius are not. Oscar Health named this their number-one AI security concern for exactly this reason.

Toxic Combinations Only Surface at Runtime

Individual risk factors on a single agent (orphaned ownership, org-wide sharing, maker mode credentials, connection to an unsanctioned MCP server) often rate medium severity. Combined on one agent, they become a critical-priority toxic combination.

Consider a shadow agent: its creator's account has been disabled (orphaned), it is configured as org-wide accessible, it holds maker mode credentials with admin-level SaaS access, and it connects to an MCP server security never sanctioned. No configuration review surfaces this. The orphaned status lives in the identity system. The sharing scope lives in the platform config. The MCP connection is invisible without shadow agent discovery. The maker mode inheritance requires correlating the agent's configuration with the creator's actual entitlements inside each connected SaaS app. Only a unified Identity Graph that maps all of these dimensions together can prioritize the compound risk.

Framework Alignment

OWASP LLM06: Excessive Agency directly addresses agents taking unintended actions from excessive permissions, functionality, or autonomy. OWASP LCNC-SEC covers low-code/no-code security risks, including the account impersonation category that maker mode exploits. NIST AI RMF GOVERN requires policies that define acceptable AI system behavior and mechanisms that enforce those policies in operation. MITRE ATLAS documents adversarial techniques against AI systems, including agent manipulation and tool abuse.

All four name the same requirement: enforcement on autonomous AI systems at the point of action, not at the point of design.

The Named Risk Factors Agentic Guardrails Address

Agentic guardrails are not an abstract concept. They correspond to a specific, documented set of risk factors that appear across platforms like Microsoft Copilot Studio, Salesforce Agentforce, Amazon Bedrock, Google Vertex, Azure AI Foundry, and n8n. Each one is a place where theoretical configuration tells security teams one story and runtime truth tells them another.

| Risk Factor | What an Agentic Guardrail Enforces | Why Theoretical Configuration Fails |
| --- | --- | --- |
| Maker Mode | Correlates the invoker's identity with the agent's inherited maker credentials, flagging privilege escalation in real time. | Configuration shows the agent has a connector. It does not show that the connector runs on the creator's admin credentials, or that a lower-privileged user just invoked it. |
| Orphaned Agents | Detects agents whose creator or owner account has been disabled but whose credentials remain active. | Lifecycle events in IAM do not cascade to agents. Posture snapshots miss the agent created between scans or orphaned after one. |
| Org-Wide Accessible | Flags agents configured as org-wide or publicly accessible when paired with sensitive data access. | Sharing scope and data access live in different systems. Configuration review alone does not connect them. |
| Confused Deputy | Intercepts agent actions where a lower-privileged user is manipulating an agent to use its elevated permissions on their behalf. | Logs show a normal API call from the agent's credentials. The underlying misuse is only visible when invoker and credential are correlated. |
| Hardcoded Secrets | Identifies agents with embedded credentials in configuration or code, which break the shared-responsibility model and enable lateral movement. | Secret scanning tools look at source code repositories. Agent configurations live in platform consoles those tools do not reach. |
| Unsanctioned Connections | Surfaces agent connections to MCP servers, third-party tools, or domains security never approved. | MCP server inventory shows sanctioned connections. Unsanctioned ones are invisible without shadow agent discovery and runtime correlation. |
| Shadow Agents | Discovers agents deployed without IT or security oversight, often by business users inside low-code platforms. | You cannot govern what you cannot see. Agents created outside sanctioned channels do not appear in any configuration review. |

The Four Pillars That Make Agentic Guardrails Possible

Agentic guardrails are not a single product decision. They depend on a four-pillar strategic foundation. Skip a pillar and the pillars above it collapse.

Pillar 1: Inventory

Before any enforcement conversation, security teams need a continuous, authoritative system of record for every agent: its creator, the SaaS apps it touches, its OAuth scopes, its connected MCP servers (sanctioned and unsanctioned), and its sharing scope. Every customer discovery call begins here because inventory is foundational to control.

The scale problem is real. One enterprise discovered 2,500 agents already created before security was looped in. Another found 377 Copilot agents through an assessment they had not known existed. MCP server counts at some enterprises are doubling quarterly. Without a single pane of glass across Copilot Studio, Salesforce Agentforce, Amazon Bedrock, Google Vertex, Azure AI Foundry, n8n, ChatGPT Enterprise, and the rest of the stack, every subsequent control is unreliable by construction.
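One way to picture the system of record Pillar 1 describes is a single structured entry per agent. Every field name below is an assumption for illustration, not a platform schema; the fields mirror the attributes the article lists (creator, SaaS apps, OAuth scopes, MCP servers, sharing scope).

```python
from dataclasses import dataclass, field

@dataclass
class AgentInventoryRecord:
    """One entry in a continuous, authoritative agent inventory."""
    agent_id: str
    platform: str                  # e.g. "Copilot Studio", "Agentforce"
    creator: str
    saas_apps: list = field(default_factory=list)
    oauth_scopes: list = field(default_factory=list)
    mcp_servers: list = field(default_factory=list)  # sanctioned or not
    sharing_scope: str = "private"                   # or "org_wide"

def unsanctioned_connections(record, sanctioned):
    """Inventory feeds later controls: here, surfacing MCP servers
    that security never approved."""
    return [s for s in record.mcp_servers if s not in sanctioned]
```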

Pillar 2: Identify (Blast Radius)

Knowing an agent exists is not the same as knowing what damage it could cause. Blast radius maps the real scope of an agent's authority: not what its configuration says it can access, but what it can actually reach across the organization's SaaS environment via delegated entitlements, OAuth grants, and inherited credentials.

This is where the Identity Graph earns its keep. The graph correlates human identities, agents, applications, MCP servers, and LLMs into a single view, then surfaces toxic combinations for prioritized remediation. Twilio framed this exactly as a blast radius problem. State Street asked for imputed permission modeling based on actual behavior rather than policy. Both were describing the same gap between theoretical configuration and effective authority, and both required the same answer.
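Blast radius computation reduces to graph traversal. The sketch below walks a toy identity graph; the edge semantics (agent to inherited credential, credential to reachable resource) are illustrative assumptions about how such a graph might be modeled.

```python
from collections import deque

def blast_radius(graph, agent):
    """Walk outward from an agent across delegation and grant edges to
    find everything it can effectively reach, not just what its own
    configuration lists. `graph` maps each node to its neighbors."""
    seen, queue = set(), deque([agent])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen
```

In a graph where the agent inherits admin credentials that in turn reach every CRM record, the traversal surfaces the full reachable set even though the agent's own configuration only names the credential, which is exactly the gap between configuration and effective authority.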

Pillar 3: Detect (Deterministic Guardrails at Runtime)

With inventory and blast radius in place, deterministic guardrails become possible. The guardrails execute fixed rules predictably, regardless of what the probabilistic agent attempts. Farmers described this requirement plainly: they needed runtime enforcement, not just logging. Coinbase required a runtime block plus alert. Meta required rule-of-two runtime detection for agent-to-agent interactions. None of these needs can be met by posture signals alone.

The enforcement mechanism matters. Agentic guardrails hook directly into AI platforms via webhooks and native APIs. They do not sit inline as a network proxy or MCP gateway, and they do not require a SaaS connector for every application the agent reaches. The enforcement layer belongs to the security team, not to IT or the SaaS platform admins.
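The decision step behind such a hook-in can be sketched as a pure function a webhook handler would call. The payload shape, blocked-tool list, and verdict format below are hypothetical; real platforms each define their own event schemas.

```python
import json

def handle_agent_event(raw_event, blocked_tools=frozenset({"mcp.unsanctioned"})):
    """Parse a webhook payload from an AI platform and return the
    verdict the guardrail would send back: block plus alert, or allow."""
    event = json.loads(raw_event)
    if event["tool"] in blocked_tools:
        return {"decision": "block", "alert": True,
                "reason": f"{event['tool']} is not sanctioned"}
    return {"decision": "allow", "alert": False, "reason": "within policy"}
```

Wiring this into a platform's native webhook means the security team owns the verdict without deploying an inline proxy or a per-application connector, which is the operational point of the paragraph above.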

Pillar 4: Prove

Every intercepted action produces evidence. Every policy decision produces a record. Every identity correlation produces an audit trail. Security teams need to answer the boardroom and audit-committee questions that CISOs are being asked directly now: Who invoked the agent? What data did it touch? Was the action authorized? What would the blast radius have been if it had not been intercepted?

Farmers required ticketing integration and SLA alignment. Trace3 asked for templatized board-level reporting. S&P Global asked for alignment to existing security frameworks. All of these are Pillar 4 requirements, and all of them depend on the runtime evidence that the prior three pillars produce.

From Definition to Decision

The definition of agentic guardrails determines where security teams invest. If "guardrails" means system prompts and output filters, the budget goes to the model layer and the agent layer stays ungoverned. If "guardrails" means deterministic, runtime enforcement across identity, permissions, tool calls, and data boundaries, the budget follows the risk.

The four-pillar foundation is sequential. Inventory first, then blast radius, then detection and enforcement, then audit-ready proof. Skipping to enforcement without inventory is building walls without knowing where the doors are.

Security leaders evaluating agentic guardrails should pressure-test any vendor against four questions:

  1. Do you show effective authority or theoretical configuration? Configuration-only tools cannot see maker mode escalation, orphaned credential inheritance, or runtime toxic combinations.
  2. Do you require a SaaS connector for every application? If yes, security teams depend on IT for every deployment, and coverage gaps persist wherever connectors are missing.
  3. Can you map the full identity chain from invoker to agent to credential to application entitlement? Without that chain, maker mode escalation and confused deputy attacks are invisible.
  4. Do you enforce at runtime or report after the fact? Detection without prevention is expensive logging.

The agents are already running. The question is whether security teams are working from runtime truth or still ghost chasing theoretical configuration. Start with the AI agent risk assessment to see what is actually in the environment, then walk the sequence through the CISO Playbook for Securing AI Agents.

Frequently Asked Questions

What is the difference between agentic guardrails and model guardrails?

Model guardrails (system prompts, RLHF, output filters) govern what a language model says. Agentic guardrails govern what an agent does: which tools it calls, what data it accesses, whose credentials it uses, and whether each action is authorized. Model guardrails are probabilistic. Agentic guardrails are deterministic.

Why can't I just use system prompts to control agent behavior?

System prompts are natural-language instructions interpreted by a probabilistic model. They can be overridden by prompt injection, ignored across multi-turn conversations, or misinterpreted. They have no enforcement power over tool calls, API access, or credential usage.

What is maker mode and why does it matter for agentic guardrails?

Maker mode is a configuration in platforms like Microsoft Copilot Studio and Salesforce Agentforce where an agent executes with the creator's credentials, not the invoker's. Any user who invokes the agent inherits the creator's privilege level, bypassing IAM entirely. Agentic guardrails must detect and enforce against this specific escalation vector by correlating the invoker identity with the agent's inherited credentials in real time.

What are toxic combinations in the context of AI agent security?

Toxic combinations occur when multiple risk factors compound on a single agent, such as a shadow agent with org-wide access, maker mode credentials, and a connection to an unsanctioned MCP server. Each factor alone may be medium risk. Together, they create a critical-priority exposure that only surfaces when the Identity Graph correlates all dimensions at once.

Do agentic guardrails require SaaS connectors?

No. Agentic guardrails hook directly into AI platforms via webhooks and native APIs, allowing security teams to deploy and operate them without requiring IT or SaaS admin involvement for every application.

What is the relationship between agentic guardrails and Zero Trust?

AI agents are non-human identities. Zero Trust principles (verify explicitly, least privilege, assume breach) extend directly to them, but only if the control layer actually sees the agent's effective authority. Agentic guardrails operationalize Zero Trust for machine identities that traditional IAM programs do not cover.

How do orphaned AI agents create security risk?

When an agent creator's account is disabled (employee leaves, role changes), the agent continues running with the creator's inherited credentials. No lifecycle event triggers a review. The agent becomes an orphaned non-human identity with potentially admin-level access and no owner, and it often appears in toxic combinations once correlated with other risk factors.

What frameworks support the case for agentic guardrails?

OWASP LLM06 (Excessive Agency), OWASP LCNC-SEC, NIST AI RMF, ISO/IEC 42001, and MITRE ATLAS all address the need for enforcement on autonomous AI systems at the point of action, not only at the point of design.

What is shadow AI and how does it relate to agentic guardrails?

Shadow AI refers to AI tools and agents deployed without IT or security oversight. Shadow agents are more dangerous than shadow apps because agents take autonomous actions with inherited credentials. Agentic guardrails require shadow AI detection as a prerequisite: you cannot enforce rules on agents you do not know exist.