
If you have not read the prelude on deployer risk, start there: The AI industry's massive blind spot.

This glossary is intentionally governance-focused: how the terms show up in risk, assurance, and audit conversations.

Core AI terms

  • AI (Artificial Intelligence): A broad term for systems that perform tasks typically associated with human intelligence. In governance, the key question is where the system’s outputs influence decisions or actions.
  • Inference: Running a model to produce an output for a given input (a “prediction” or “generation”).
  • Machine learning (ML): A subset of AI where models learn patterns from data rather than being explicitly programmed.
  • Model: A statistical or algorithmic system that maps inputs to outputs (predictions, classifications, scores, or generated text).

Generative AI and LLM terms

  • Foundation model: A large, general-purpose model that is pre-trained on broad, usually internet-scale datasets so it can be adapted to many downstream tasks. Adaptation may include prompt design, retrieval (RAG), fine-tuning, instruction tuning, and other alignment/safety training.
  • Governance focus: you are often relying on upstream choices you did not make (training data provenance and licensing, capability scope, safety mitigations, and update cadence), so assurance typically emphasises vendor oversight, in-context testing for your use case, monitoring, and exit planning.
  • Generative AI (gen AI): Models that generate new content (text, images, code). Governance focus: output quality, misuse, and how generated content is consumed downstream.
  • Hallucination: A common industry term for an output that is fluent but false, fabricated, or unsupported by reliable sources or the provided context.
    • Limitations of the term: “hallucination” is anthropomorphic. The system is not “imagining” in a human sense; it is producing a statistically likely completion that can be wrong.
    • Operational nuance: in many real deployments, outputs may not be strictly deterministic across runs (e.g. due to sampling settings, system prompts, retrieval context, or vendor model updates). Governance focus: where hallucinations can cause harm in workflows (especially if outputs are auto-consumed).
  • Large Language Model (LLM): A foundation model trained to predict and generate text (and often other modalities). Common enterprise uses include summarisation, drafting, extraction, and question answering.
  • Prompt: The input instructions/context provided to an LLM.
  • System prompt: A high-priority prompt layer intended to define stable behaviour and constraints (e.g. tone, refusal rules, tool-use policy).
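The layering of a system prompt over per-request input can be sketched as a message list, a convention many chat-style LLM APIs share. Everything below (the prompt text, the function name, the message structure) is illustrative, not a specific vendor SDK:

```python
# Sketch of prompt layering: a high-priority system prompt defines stable
# behaviour and constraints; the user prompt carries the per-request task.
# The message-list shape is a common convention across chat-style LLM APIs.

SYSTEM_PROMPT = (
    "You are an internal policy assistant. Answer only from the provided "
    "context. If the answer is not in the context, say you do not know. "
    "Never provide legal advice."
)

def build_messages(user_prompt: str, context: str = "") -> list[dict]:
    """Assemble the layered prompt sent to the model on each request."""
    user_content = (
        f"Context:\n{context}\n\nQuestion: {user_prompt}" if context else user_prompt
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},  # stable constraints
        {"role": "user", "content": user_content},     # per-request input
    ]

messages = build_messages(
    "What is our data retention period?",
    context="Policy DR-7: retain records for 7 years.",
)
```

From a governance angle, the value of this separation is that the system-prompt layer can be versioned and reviewed as a control, independently of user input.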

Architecture and delivery patterns

  • Agent / agentic system: A system that can plan and take multi-step actions toward a goal, often using tools. Governance focus shifts from “is it accurate?” to “what can it do, and what evidence do we retain?”
  • RAG (Retrieval-Augmented Generation): A pattern where an LLM retrieves documents from a knowledge source and uses them as context to generate an answer. Governance focus: data provenance, access control, and traceability.
  • Third-party / externally hosted model: A model served by a vendor where you do not control training, weights, or sometimes even the exact served version. Governance focus: black-box testing, monitoring, contracts, and exit planning.
  • Tool use / function calling: When an LLM or agent calls external tools (APIs, databases, code execution). Governance focus: permissions, approvals, logging, and blast radius.
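The RAG pattern above can be sketched end to end with a deliberately naive retriever and a stubbed generation step. The knowledge base, scoring, and field names are all illustrative; the point is that the retrieved document IDs are retained as provenance:

```python
# Minimal RAG sketch: retrieve the most relevant documents by keyword
# overlap, assemble them into the prompt context, and record which sources
# grounded the answer. Retriever and generator are simplistic stand-ins.

KNOWLEDGE_BASE = {
    "doc-001": "Refunds are processed within 14 days of an approved request.",
    "doc-002": "Complaints must be acknowledged within 2 business days.",
}

def retrieve(query: str, top_k: int = 1) -> list[tuple[str, str]]:
    """Rank documents by naive keyword overlap with the query."""
    words = set(query.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE.items(),
        key=lambda kv: len(words & set(kv[1].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def answer(query: str) -> dict:
    docs = retrieve(query)
    context = "\n".join(text for _, text in docs)
    # A real system would call an LLM here; we return the assembled request
    # plus provenance (which documents grounded the answer).
    return {
        "prompt": f"Context:\n{context}\n\nQuestion: {query}",
        "source_ids": [doc_id for doc_id, _ in docs],  # retained for audit
    }

result = answer("How many days to process refunds?")
```

Retaining `source_ids` alongside the output is what makes traceability possible later: you can show which documents the answer was grounded in.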

Testing, monitoring, and evidence

  • Black-box testing: Testing behaviour through inputs/outputs when internals (training data, weights, serving stack) are unavailable.
  • Canary evaluation: A small, fixed suite of tests/prompts run on a schedule to detect changes in behaviour.
    • Governance focus: early warning for drift/regressions, vendor model updates, and integration changes. For externally hosted models, running canaries across different times of day and days of week can also help surface time-variance.
  • Drift: A change over time that degrades model performance (e.g. input data changes, concept changes, or workflow changes).
  • Evidence (assurance evidence): Artefacts you can present under challenge to demonstrate control (e.g. test results, monitoring reports, incident logs, change approvals, vendor attestations).
  • Evaluation set (test set): A fixed set of inputs (and, where possible, expected outputs) used to assess performance. Governance focus: representativeness, versioning, and whether the evaluation reflects the real workflow.
  • Model card: A structured document describing a model’s intended use, limitations, high-level training-data information (where available), evaluation results, and safety considerations. Governance focus: whether it is specific enough to support assurance for your use case.
  • Monitoring: Ongoing measurement of model behaviour and outcomes in production (quality, safety, incidents, and impact).
  • Red teaming: Structured adversarial testing to find failure modes (safety, security, misuse, policy bypass) before incidents find them first.
  • Time-variance (time variability): Observable performance differences across time even when you try to hold conditions fixed (particularly relevant for externally hosted models).
  • Validation: A pre-deployment (and periodic) assessment that the model is fit for purpose within defined boundaries.
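A canary evaluation can be as simple as a fixed prompt suite with checkable expectations, run on a schedule and compared against a baseline pass rate. The suite, the model stub, and the thresholds below are all illustrative:

```python
# Canary evaluation sketch: a small fixed suite of prompts with checkable
# expectations, run on a schedule. A drop against the baseline pass rate is
# an early-warning signal (e.g. a silent vendor model update). The model is
# stubbed here; in production you would call the deployed endpoint.

CANARY_SUITE = [
    {"prompt": "Summarise: 'The meeting is moved to Tuesday.'", "must_contain": "Tuesday"},
    {"prompt": "Extract the amount: 'Invoice total: 250 EUR'", "must_contain": "250"},
    {"prompt": "What is our refund policy for digital goods?", "must_contain": "refund"},
]

def stub_model(prompt: str) -> str:
    """Stand-in for the deployed model endpoint."""
    return prompt  # echoes input, so the substring checks pass

def run_canaries(model, suite, baseline_pass_rate: float = 1.0) -> dict:
    results = [case["must_contain"] in model(case["prompt"]) for case in suite]
    pass_rate = sum(results) / len(results)
    return {
        "pass_rate": pass_rate,
        "regression": pass_rate < baseline_pass_rate,  # alert if below baseline
        "failures": [case["prompt"] for case, ok in zip(suite, results) if not ok],
    }

report = run_canaries(stub_model, CANARY_SUITE)
```

For externally hosted models, the same suite run at different times of day (per the canary entry above) doubles as a cheap time-variance probe.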

Governance and risk management terms

  • GRC (Governance, Risk, and Compliance): The processes and systems used to manage obligations, risks, and controls. In AI programmes, GRC is where you make “we have governance” legible as evidence.
  • Materiality: The practical significance of a use case (impact and likelihood). In practice, materiality drives the depth of validation, monitoring, and independent review.
  • Model Risk Management (MRM): The governance and control discipline for managing model use through the lifecycle (inventory, tiering, validation, change control, and monitoring).
  • Non-Financial Risk (NFR): A common umbrella term (especially in financial services) for risk types that are not primarily credit, market, or liquidity risk. In many operating models, operational risk is managed within an NFR framework alongside conduct, compliance, technology/cyber, and related risk domains.
  • Operational risk: Risk of loss resulting from inadequate or failed internal processes, people and systems, or from external events. In many taxonomies, model risk sits inside operational risk.
  • Tiering: Classifying use cases into risk tiers (often aligned to materiality) to set minimum control standards.
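The tiering idea can be sketched as a simple mapping from materiality (impact times likelihood) to a risk tier that sets minimum control standards. The scales, thresholds, and control sets below are illustrative; real frameworks are organisation-specific:

```python
# Tiering sketch: map a use case's materiality (impact x likelihood) to a
# risk tier, which then sets minimum control standards. Thresholds and
# control sets are illustrative, not a reference taxonomy.

CONTROLS_BY_TIER = {
    "high": ["independent validation", "continuous monitoring",
             "approval gates", "annual review"],
    "medium": ["validation", "periodic monitoring", "annual review"],
    "low": ["self-assessment", "inventory entry"],
}

def assign_tier(impact: int, likelihood: int) -> str:
    """impact and likelihood on a 1-5 scale; higher means more material."""
    score = impact * likelihood
    if score >= 15 or impact == 5:  # severe impact is high tier regardless
        return "high"
    if score >= 6:
        return "medium"
    return "low"

tier = assign_tier(impact=4, likelihood=4)  # score 16
controls = CONTROLS_BY_TIER[tier]
```

Note the override for severe impact: a low-likelihood but catastrophic use case should not slip into a lighter tier on an averaged score alone.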

Practical governance concepts (common in deployed AI)

  • Admissibility: The allowed inputs, requests, data sources, and actions for a given AI use case. Governance focus: what is in scope vs out of bounds.
  • Authority layer (delegated authority): A way to describe AI systems whose outputs materially influence (or directly trigger) decisions and actions. Governance focus: who delegated that authority, within what boundaries, and with what oversight.
  • Compliance theatre: Controls that look reassuring on paper (policies, checklists, boilerplate statements) but are not embedded in delivery and not supported by evidence from the live system.
  • Control boundary: The point in a workflow where you can reliably enforce controls (permissions, approvals, constraints, monitoring). For deployed AI, the control boundary is often the integration layer rather than the model weights.
  • Decision custody: Explicit ownership of decision surfaces: what the system is allowed to decide, who is accountable, what must be escalated, what must be refused, and what evidence must be retained.
  • Deployer risk (deployer reality): The risks created when an organisation integrates AI into real workflows (process design, data flows, decision rights, monitoring, and incident response). This is distinct from builder risk (training frontier models).
  • Evidence pack: A small, reusable set of governance artefacts kept current for a use case (inventory entry, tiering rationale, controls, validation summary, monitoring results, incidents, and change history).
  • Refusal zone: A category of requests/decisions the system must not perform (or must route to a human), even if it technically could.
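Admissibility and refusal zones can be sketched as a routing check applied before a request ever reaches the model. The topic lists and keyword matching below are illustrative; production systems typically combine classifiers, policy engines, and allowlists:

```python
# Sketch of an admissibility check with a refusal zone: requests are routed
# to "allow", "escalate" (human review), or "refuse" before reaching the
# model. Topic lists here are illustrative placeholders.

REFUSE_TOPICS = {"legal advice", "medical advice"}       # must not perform
ESCALATE_TOPICS = {"complaint", "account closure"}       # route to a human
ALLOWED_TOPICS = {"billing", "order status", "returns"}  # in scope

def route_request(topic: str) -> str:
    topic = topic.lower().strip()
    if topic in REFUSE_TOPICS:
        return "refuse"
    if topic in ESCALATE_TOPICS:
        return "escalate"
    if topic in ALLOWED_TOPICS:
        return "allow"
    return "escalate"  # default-deny posture: unknown topics go to a human

decisions = {
    t: route_request(t)
    for t in ["billing", "legal advice", "complaint", "crypto tips"]
}
```

The default branch matters: routing unknown topics to a human rather than to the model is what makes the admissibility boundary a control rather than a hope.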

Evidence and traceability concepts

  • Commit-time evidence (commit-time proof): Evidence generated at build/release time that required controls were applied (e.g. approvals captured, policy gates passed, tests/evaluations run, versioning recorded).
  • Replayable provenance: The ability to reconstruct a decision after the fact using retained inputs/context (e.g. prompts, retrieved documents, tool calls, configurations, versions, and logs). Governance focus: incident reconstruction, auditability, and defensibility.
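Replayable provenance can be sketched as a single serialisable record that captures everything needed to reconstruct a decision later. The field names and values below are illustrative, not a standard schema:

```python
# Replayable provenance sketch: capture the inputs, retrieved context,
# versions, and configuration behind one decision as a serialisable record.
# Field names are illustrative, not a standard schema.

import json
import hashlib
from dataclasses import dataclass, asdict

@dataclass
class DecisionRecord:
    request_id: str
    prompt: str
    retrieved_doc_ids: list[str]  # which documents grounded the answer
    model_version: str            # exact served model identifier
    config: dict                  # sampling settings, system prompt version
    output: str

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    def fingerprint(self) -> str:
        """Stable hash of the record, usable as a tamper-evidence anchor."""
        return hashlib.sha256(self.to_json().encode()).hexdigest()

record = DecisionRecord(
    request_id="req-0001",
    prompt="Summarise policy DR-7.",
    retrieved_doc_ids=["doc-007"],
    model_version="vendor-model-2024-06-01",
    config={"temperature": 0.0, "system_prompt_version": "v3"},
    output="Policy DR-7 requires 7-year retention.",
)
```

Recording the exact served model version is the detail most often missed with externally hosted models, and the one you will want most during incident reconstruction.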

Operational controls (especially relevant for agents)

  • AI control tower (AI governance control plane): A centralised layer that helps organisations operationalise AI governance across use cases (intake, policy enforcement, approvals, monitoring, incident management, reporting, and evidence collection). This may be implemented as a platform capability, not necessarily a single product.
  • Approval gates: Mandatory checkpoints where a human or policy control must approve an action before it occurs (especially for high-impact actions such as writes to systems of record, customer communications, or financial changes). Governance focus: ensuring the control boundary is enforceable in the workflow, not just documented.
  • Blast radius: The scope of potential impact if a system fails (customers, financial outcomes, compliance breaches, operational disruption).
  • Immutable logging: Logs that cannot be altered after the fact (or that are tamper-evident). Governance focus: incident reconstruction and accountability.
  • Incident reconstruction: The ability to rebuild what happened during an incident (what inputs were received, what the system did, what it was allowed to do, and what it produced), with evidence that holds up under challenge. This is closely related to replayable provenance and (where needed) immutable logging.
  • Kill-switch / circuit breaker: Mechanisms to stop or constrain a system quickly when behaviour degrades (timeouts, rate limits, manual override).
  • Least privilege: Restricting system permissions to the minimum required.
  • Observability: The ability to understand system behaviour in production using logs, metrics, traces, and structured events. For AI systems this typically includes prompt and context capture (e.g. retrieved documents), tool calls, model/version identifiers, and outcome/incident signals.
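The tamper-evident flavour of immutable logging can be sketched with a hash chain: each entry stores a hash of the previous entry, so altering any past entry breaks verification. This is a minimal sketch; real deployments typically also ship logs to write-once storage:

```python
# Tamper-evident logging sketch: each entry carries a hash of the previous
# entry, so altering any past entry breaks the chain on verification.

import json
import hashlib

def _entry_hash(entry: dict) -> str:
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

def append(log: list[dict], event: dict) -> None:
    prev = _entry_hash(log[-1]) if log else "genesis"
    log.append({"event": event, "prev_hash": prev})

def verify(log: list[dict]) -> bool:
    """Recompute the chain; any altered entry invalidates later links."""
    for i in range(1, len(log)):
        if log[i]["prev_hash"] != _entry_hash(log[i - 1]):
            return False
    return True

log: list[dict] = []
append(log, {"action": "tool_call", "tool": "crm.update", "approved_by": "ops-lead"})
append(log, {"action": "output_sent", "channel": "email"})
assert verify(log)

log[0]["event"]["approved_by"] = "attacker"  # tamper with history
assert not verify(log)                       # chain no longer verifies
```

A chain like this makes tampering detectable, not impossible; pairing it with append-only storage is what gets you closer to "immutable" in practice.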

Series: Prelude · Part 1 · Part 2 · Part 3 · Part 4 · Glossary