If you have not read the prelude, start there: The AI industry's massive blind spot.
Executive Summary
In short: most organisations are not building frontier models. They are deploying third-party AI into workflows, and the day-to-day risk is dominated by integration, vendor dependency, drift, and (increasingly) autonomy.
If you are a CRO (or you report to one), this is where the real governance gap tends to appear. Not because AI is uniquely mysterious, but because many deployments skip disciplines that already exist for managing models.
This post is the cover article for a short series addressing a pragmatic question:
If you already have a well-functioning Model Risk Management (MRM) framework, what do you need to uplift to scale AI safely (including vendor models and agents) without slowing delivery?
What this series is (and is not)
This series is not a technical deep dive on how models are trained. It is about deployed AI in the real world: integration, vendor dependency, drift, autonomy, and evidence.
It covers both:
- organisations building AI models in-house (training, fine-tuning, or building end-to-end model pipelines), and
- organisations integrating third-party models into workflows.
It does not focus on the use of off-the-shelf AI product features (e.g. Microsoft Copilot). There are real risks to manage there, but they tend to be governed through enterprise controls (data classification, access management, acceptable use, and monitoring) rather than model validation. As a side note, the ubiquitous "AI can make mistakes" and similar warnings are a caveat, not a control!
Where AI comes from matters (delivery archetypes)
In practice, the evidence you need depends on how the capability is delivered:
- In-house developed AI
You own the lifecycle. You need stronger evidence on data provenance, training and evaluation pipelines, reproducibility, and release management.
- Third-party model integration
You often cannot see the weights or training data. Governance shifts toward black-box testing, third-party oversight, change notice expectations, and exit plans.
- Agentic systems (tool-using autonomy)
The control boundary becomes permissions, approval gates, observability, and kill-switches.
This series is organised around these differences.
The core argument
If you are accountable for operational risk, and more specifically for model risk, the question is not "Are we using AI?" It is:
- Where is AI embedded in our processes?
- What is the blast radius when it fails?
- What evidence do we have that it is controlled?
In many risk taxonomies, model risk is treated as a subset of operational risk. This series takes that view: AI pushes you to manage model risk with stronger operational controls and evidence, especially as autonomy and integration complexity increase.
Why MRM is the right starting point
Even imperfect MRM programmes already have the right instincts:
- inventory and ownership
- tiering by materiality
- independent review and challenge
- documented validation
- change control and periodic review
- ongoing monitoring
Those patterns map well to AI. Where organisations struggle is not the idea of governance; it is the operational reality of modern AI.
What typically needs to be uplifted for AI
AI is still "a model", but several aspects usually need strengthening:
- In-house evidence and engineering discipline. If you build models in-house, you typically need to uplift evidence around data provenance, evaluation, reproducibility, and model/pipeline change control (not just a one-off validation report).
- Vendor and third-party reality. Many AI systems are vendor-provided, so the evidence looks different: black-box testing, contractual obligations, change notices, and exit plans.
- Drift and degradation as the default. AI can degrade quietly, so monitoring needs to be more continuous and more outcome-focused. This includes the reality that the performance of some externally hosted LLMs can vary over time even under controlled conditions (e.g. daily and weekly periodicity), which reinforces the need for ongoing outcome monitoring rather than one-off validation. Part 4 goes deeper on what evidence should look like for externally hosted models, including a short research-note appendix on time-variance.
- Integration is where most failures happen. The riskiest part is often not the model weights; it is how the model is embedded into an end-to-end workflow.
- Ethics needs to be explicit. Traditional MRM can be silent on ethics. AI governance should include an explicit ethical assessment: is it legal, is it safe, and is it wise? (See Part 2.)
- Agents and autonomy change the control surface. As tool-using agents become more common, the control boundary becomes permissions, approval gates, observability, and kill-switches (see Part 3); a minimal sketch of that boundary follows this list.
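To make the agent point concrete, here is a minimal, illustrative sketch (in Python) of what that control boundary can look like: an explicit tool allow-list, a human approval step for higher-impact actions, logging for observability, and a file-based kill-switch. Every name in it (ToolCall, ALLOWED_TOOLS, KILL_SWITCH_FILE, require_approval) is hypothetical, not a reference to any particular agent framework.

```python
# Illustrative sketch only: hypothetical names, not a real agent framework.
import logging
import os
from dataclasses import dataclass
from typing import Any, Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-gate")

KILL_SWITCH_FILE = "agent_disabled.flag"      # ops can create this file to halt all tool use

ALLOWED_TOOLS = {"search_kb", "draft_email"}  # explicit allow-list (permissions)
NEEDS_APPROVAL = {"draft_email"}              # higher-impact tools routed to a human

@dataclass
class ToolCall:
    name: str
    args: dict

def require_approval(call: ToolCall) -> bool:
    """Placeholder for a real approval workflow (ticket, chat prompt, four-eyes check)."""
    answer = input(f"Approve {call.name}({call.args})? [y/N] ")
    return answer.strip().lower() == "y"

def execute(call: ToolCall, tools: dict[str, Callable[..., Any]]) -> Any:
    if os.path.exists(KILL_SWITCH_FILE):
        log.warning("Kill-switch active; refusing %s", call.name)
        raise RuntimeError("agent disabled by kill-switch")
    if call.name not in ALLOWED_TOOLS:
        log.warning("Blocked un-permissioned tool: %s", call.name)
        raise PermissionError(call.name)
    if call.name in NEEDS_APPROVAL and not require_approval(call):
        log.info("Approval declined for %s", call.name)
        raise PermissionError("not approved")
    log.info("Executing %s with args %s", call.name, call.args)  # observability: every call is logged
    return tools[call.name](**call.args)
```

The point is not the code itself; it is that these controls live in the integration layer, which is exactly where an MRM-style review needs evidence.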
A practical next step
If you want to build capability without slowing the business, pick one meaningful but lower-risk use case and run it end-to-end as a governance walk-through.
A simple way to do that is to take one use case and answer four questions (each one is the focus of a post in this series):
- Is it ethical, not just compliant? Use the “legal, safe, wise” lens (see Part 2).
- Where is the real control boundary? If there is tool access or autonomy, treat it as operational risk (see Part 3).
- What evidence will you rely on? Especially for externally hosted models, design monitoring that treats drift and time-variance as normal (see Part 4, and the sketch after this list).
- Can you tell the story clearly? Turn the above into a small, reusable evidence pack (inventory → tiering → controls → monitoring → artefacts).
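On the evidence question, here is a minimal sketch of what outcome-focused monitoring for an externally hosted model might look like. The baseline, alert threshold, and window size are assumptions you would set during validation; the point is that live outcomes are compared against an agreed baseline on a schedule, so daily or weekly variation shows up as a trend rather than a surprise.

```python
# Illustrative sketch only: thresholds and scoring are assumptions set at validation time.
from collections import deque
from statistics import mean

BASELINE_ACCURACY = 0.92   # assumed baseline agreed during validation
ALERT_THRESHOLD = 0.05     # assumed tolerated drop before escalation
WINDOW = 200               # assumed number of recent outcomes to track

recent_outcomes = deque(maxlen=WINDOW)

def record_outcome(score: float) -> None:
    """score: 1.0 if the output was accepted/correct, 0.0 otherwise
    (e.g. from sampled human review or downstream reconciliation)."""
    recent_outcomes.append(score)

def check_drift() -> dict:
    """Run on a schedule (e.g. daily) and file the result as a monitoring artefact."""
    if len(recent_outcomes) < WINDOW:
        return {"status": "warming_up", "observations": len(recent_outcomes)}
    current = mean(recent_outcomes)
    degraded = (BASELINE_ACCURACY - current) > ALERT_THRESHOLD
    return {
        "status": "alert" if degraded else "ok",
        "baseline": BASELINE_ACCURACY,
        "current": round(current, 3),
    }
```

Checks like this, kept together with their outputs over time, are exactly the kind of artefact that belongs in the evidence pack above.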
Series: Prelude · Part 1 · Part 2 · Part 3 · Part 4 · Glossary