This is Part 3 of a series on uplifting Model Risk Management (MRM) for deployed and agentic AI. If you have not read the prelude on deployer risk, start there: The AI industry's massive blind spot.
This instalment focuses on agentic AI: systems that can take actions (via tools, write access, or workflow triggers).
Executive Summary
- Agentic AI changes the risk surface from “is the model accurate?” to “what can the system do, and what does it have permission to touch?”
- As autonomy increases, the blast radius is shaped more by tool access and workflow integration than by the model itself.
- Controls that matter early include least privilege, approval gates, observability, immutable logging, and kill-switches.
- Evidence should include a clear tool/permission map, an approval matrix for high-impact actions, a logging and retention plan, a rollback plan, and the minimum you need for incident reconstruction.
A scenario you will recognise
Consider a plausible internal deployment.
A team introduces an AI agent to help operations: triage incoming requests, draft responses, file tickets, and keep systems up to date. It starts as assistive. Then, gradually, it gains tool access: it can create and update tickets, change a record in a system of record, and trigger a workflow.
Nothing catastrophic happens on day one. The risk shows up through compounding: a small misunderstanding leads to an incorrect update, which triggers a follow-on workflow, which creates more work, which reduces trust, which leads teams to build informal workarounds. You get operational friction and a messy incident narrative, even if no single step is "wrong" enough to fail a unit test.
That is the core point: model risk sits inside operational risk. Once a system can act, the relevant question is no longer just model performance but the wider operational risk surface created by tool access and workflow integration.
In classic MRM, a lot of the energy goes into questions like “is it accurate?” and “is the documentation complete?” With agents, those questions still matter, but they are not the whole story. The risk surface expands to what the system can touch, how quickly it can act, and what evidence you retain when something goes wrong.
Why agents are different
A lot of enterprise AI is still "single-shot": one prediction or one generated response.
Agentic AI is different. An agent can:
- break a goal into steps
- call tools (APIs, search, code execution)
- write intermediate state
- iterate until it decides it is done
As autonomy increases, the risk boundary shifts. The blast radius is defined less by the model and more by what it can touch.
What changes as autonomy increases
Three shifts happen quickly:
- Compounding: small errors can amplify across multiple steps.
- Privilege: write access (even in a “non-critical” system) can create real-world effects.
- Incident clarity: without good logs, you end up with a story you cannot reconstruct.
The autonomy ladder
A practical framing:
- Assistive: the model drafts, humans decide and execute.
- Supervised action: the agent can act, but requires approval for defined steps.
- Bounded autonomy: the agent acts without approval inside tight boundaries, with strong monitoring and fast rollback.
Controls that matter early
Classic MRM ideas still apply, but agentic systems typically need additional operational controls:
- Least privilege (especially write access)
- Explicit approval gates for high-impact actions
- Immutable logging of prompts, tool calls, and outputs
- Circuit breakers and kill-switches (timeouts, rate limits, emergency stop)
- Outcome monitoring (incidents, rework rates, customer impact)
Evidence artefacts to ask for
If you are reviewing an agentic deployment, practical evidence includes:
- a map of tools the agent can call and what permissions each tool grants
- a list of actions that require human approval
- a logging and retention plan (what is captured, where it is stored, who can access it)
- a rollback plan (what happens when behaviour degrades)
If you only do one thing
List the tools and permissions, and agree the approval gates. “The model is accurate” is not a safety argument when the agent can act.
If you want something concrete to start from, the appendix below shows a simple artefact (with an example filled in).
Appendix: A simple agentic AI control artefact (example)
This is a lightweight template you can keep current as part of an evidence pack.
Agentic AI control artefact (example)
Use case: Service desk triage + ticket drafting
System: “Ops Assist” agent
Owner (business): Head of Operations
Owner (delivery/engineering): Platform Engineering Lead
Environment: Internal users only
Scope and boundaries
- Goal: reduce handling time for common requests by drafting responses and creating tickets
- Out of scope / refusal zones: the agent must not send external/customer emails; must not approve refunds; must not modify production configs
- Control boundary: tool permissions + approval gates in the workflow
Tool and permission map
1) Confluence search (read)
- Data: internal runbooks and knowledge base articles
- Notes: retrieved documents must be logged (document IDs/URLs)
2) Jira / ServiceNow (write: create ticket; comment)
- Allowed actions: create a new ticket, add comments, attach draft text
- Disallowed actions: close tickets; change priority above “Medium”; assign to “VIP” queues
3) Identity / access directory (read)
- Allowed actions: look up staff team and on-call roster
- Disallowed actions: change group membership or permissions
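A tool and permission map like the one above is most useful when it is enforced as data, not just documented. A minimal deny-by-default sketch (tool and action names here are illustrative stand-ins for the systems listed above):

```python
# Hypothetical registry mirroring the map above: each tool declares the
# only actions the agent may take. Anything undeclared is denied.
TOOL_PERMISSIONS = {
    "confluence_search": {"search"},                                  # read-only
    "ticket_system":     {"create_ticket", "add_comment", "attach_draft"},
    "directory":         {"lookup_team", "lookup_oncall"},            # read-only
}


def check_permission(tool: str, action: str) -> bool:
    """Deny by default: unknown tools and undeclared actions are refused."""
    return action in TOOL_PERMISSIONS.get(tool, set())
```

The review question then becomes concrete: does this registry match the documented map, and who can change it?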
Approval gates
- Gate A (mandatory): before creating a ticket in Jira/ServiceNow → user must click “Approve”
- Gate B (mandatory): before any write to a system of record → senior operator approval
- Gate C (mandatory): before escalating priority to “High” → on-call lead approval
Observability and logging
- Capture: prompt, system prompt version, retrieved document IDs, tool calls (inputs + outputs), model identifier/version, timestamps, user ID
- Retention: 90 days for detailed traces; 12 months for aggregate metrics and incidents
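"Immutable" logging can be approximated in application code by hash-chaining trace records, so that after-the-fact edits are detectable. A minimal sketch (field names follow the capture list above and are otherwise illustrative; a real deployment would use append-only storage as well):

```python
import hashlib
import json


def trace_record(prev_hash: str, **fields) -> dict:
    """Build one append-only trace entry.

    Each record's hash covers the previous record's hash, so tampering
    with any earlier entry breaks the chain.
    """
    record = {"prev_hash": prev_hash, **fields}
    payload = json.dumps(record, sort_keys=True).encode()
    record["hash"] = hashlib.sha256(payload).hexdigest()
    return record


def verify(record: dict) -> bool:
    """Recompute a record's hash to check it has not been altered."""
    body = {k: v for k, v in record.items() if k != "hash"}
    payload = json.dumps(body, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest() == record["hash"]
```

The same record shape doubles as the "minimum evidence" for incident reconstruction: inputs, retrieval, tool calls, and outputs in one ordered chain.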
Monitoring signals
- Volume of agent actions (drafts, ticket creates)
- Rework rate (draft edited heavily or rejected)
- Incidents and near-misses (incorrect updates, wrong routing, inappropriate content)
- Time-to-rollback (time from detection to disabling the agent)
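Of these signals, rework rate is the easiest to compute from draft outcomes. A sketch, assuming each draft carries a rejection flag and an edit-distance ratio (the field names and the 0.5 "heavily edited" threshold are assumptions to tune, not standards):

```python
def rework_rate(drafts: list[dict], heavy_edit_threshold: float = 0.5) -> float:
    """Share of agent drafts that were rejected or heavily edited.

    `rejected` and `edit_distance_ratio` are illustrative field names;
    the threshold for "heavily edited" is an assumption to calibrate.
    """
    if not drafts:
        return 0.0
    reworked = sum(
        1 for d in drafts
        if d.get("rejected") or d.get("edit_distance_ratio", 0.0) > heavy_edit_threshold
    )
    return reworked / len(drafts)
```

A sustained rise in this number is exactly the kind of drift that should trigger the review cadence below.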
Kill-switch / rollback
- Feature flag to disable all write-capable tool calls immediately
- Rate limits on tool calls per user per hour
- Safe mode: “draft-only” (no writes; human copy/paste only)
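The feature flag and safe mode above can be sketched as a single dispatcher: one flag disables every write-capable tool call at once and degrades to draft-only output. Names here are illustrative, not a specific feature-flag product:

```python
# Write-capable actions, mirroring the tool and permission map above.
WRITE_ACTIONS = {"create_ticket", "add_comment", "update_record"}


class AgentDispatcher:
    """Routes agent actions; one flag flips the whole system to draft-only."""

    def __init__(self):
        self.writes_enabled = True  # the feature flag

    def dispatch(self, action: str, payload: dict) -> dict:
        if action in WRITE_ACTIONS and not self.writes_enabled:
            # Safe mode: hand the draft back for a human to apply manually.
            return {"status": "draft_only", "action": action, "draft": payload}
        # In a real system this would call the underlying tool.
        return {"status": "executed", "action": action}

    def kill_writes(self) -> None:
        self.writes_enabled = False
```

Routing every write through one dispatcher is what makes the kill-switch credible: there is a single place to flip, and time-to-rollback becomes measurable.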
Incident reconstruction
- Minimum evidence required: the full trace for the interaction (inputs → retrieval → tool calls → outputs), plus the approval events that allowed any writes
Review cadence and triggers
- Cadence: monthly review for first 3 months, then quarterly
- Triggers: vendor/model update; tool permission change; workflow change; incident; sustained drift in quality or rework rate
Series: Prelude · Part 1 · Part 2 · Part 3 · Part 4 · Glossary