
This post is a prelude to a short series on uplifting Model Risk Management (MRM) for deployed and agentic AI. If you want the practical governance playbook, start with Part 1.

Executive summary

Most organisations are not building frontier models. They are deploying AI into real workflows, often through third-party vendors. That is where governance tends to break.

Key points:

  • The blind spot is everyday deployment (the deployer reality): how AI is integrated into workflows, and what controls sit around it, matters more day-to-day than headlines about cutting-edge AI and distant, hypothetical risks.
  • AI acts like an authority layer: it can reshape decision loops and amplify weak criteria at scale.
  • Most failures are integration failures: gaps in ownership, decision custody, monitoring, and rollback show up in production.
  • FOMO accelerates risk: decisions move faster than authority structures, leaving engineering to “own” choices it cannot safely own.
  • The practical response is capability, not compliance theatre: treat AI as a core capability you operate and govern, make decision custody explicit, and monitor outcomes.

Why this matters in 2026

The AI landscape in 2026 is dominated by speculation about frontier models, existential risks, and sweeping regulations. Discussions focus on alignment challenges, superintelligent systems, and the need for global oversight. Yet for the vast majority of organisations (those deploying AI in real-world workflows), these conversations feel distant. The real challenges lie in the practical realities of integration, where probabilistic decision engines are embedded into deterministic systems without the necessary infrastructure, governance, or organisational maturity.

First, a current (late February 2026) example of the problem:

Example: Deployer risk in the wild (Woolworths / Olive, Australia)

Woolworths reportedly had to rein in a customer-facing chatbot after it behaved oddly (including claiming to have an “angry mother”) and gave incorrect information.

Woolworths forced to rein in chatbot that claimed to have angry mother

The Olive page states:

“Olive is powered by AI and may make mistakes. Always check the product label.”

That sort of caveat might be legally prudent, but it is not a control. If you deploy AI into customer workflows, you still need clear scope and decision custody, bounded behaviour, monitoring, and a way to roll back quickly when the system behaves unexpectedly.

Let’s get into it.

A recent LinkedIn post highlighted this disconnect, pointing out that the industry's attention is skewed toward high-risk frontier models while everyday deployment risks in non-frontier use cases remain severely under-addressed.

The pattern is clear: FOMO-driven adoption (fear of missing out) prioritises speed over safety, leading to decisions that outpace authority structures. Systems appear innovative on paper but unravel in production, amplifying internal weaknesses at scale. AI is not merely a tool category; it is a new operational authority layer that reshapes decision loops. When decision criteria are weak, AI amplifies them, potentially turning small gaps into systemic failures.

The Core Blind Spot: Deployer Risks Over Frontier Fears

The governance conversation remains heavily tilted toward builders of high-risk models and regulators, leaving deployers (the overwhelming majority of organisations) underserved. Public discourse often dwells on speculative "what if" scenarios, while the immediate challenge is the practical "how-to" of responsible integration. Most failures are integration failures, not because AI is too advanced, but because governance is not designed into the execution path.

Organisations commonly treat AI as:

  • A compliance checkbox (leading to performative policies that look good but fail in practice).
  • A plug-and-play software module (ignoring its probabilistic and stateful nature).
  • A vendor-delivered feature (outsourcing judgement without sufficient due diligence).
  • A strategic buzzword (resulting in rapid tool stacking without strategy).

In reality, AI tends to amplify how an organisation makes decisions, and it can become an operational authority layer that reshapes decision criteria and scales internal weaknesses. A key characteristic is its stateful nature: unlike traditional software that often processes tasks independently (stateless), AI can retain context from prior interactions or data, allowing behaviour to evolve over time.

Think of it like a conversation partner that remembers what happened earlier: what you ask (or how it responds) in one step can influence what happens next. That “memory” creates a new governance problem: behaviour can drift over time without an obvious deploy event, so you need explicit boundaries and ongoing monitoring to keep decisions stable.
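To make that concrete, here is a minimal sketch contrasting a stateless component with a stateful one. Everything in it is illustrative, not any vendor's API; it only shows the shape of the governance problem:

```python
# Minimal sketch of why statefulness changes governance. All names are
# illustrative assumptions, not any specific vendor's API.

def stateless_classifier(text: str) -> str:
    """Same input, same output, every time -- auditable in isolation."""
    return "refund" if "refund" in text.lower() else "general"

class StatefulAssistant:
    """Carries conversation history, so behaviour depends on everything
    said so far, not just the current input."""
    def __init__(self) -> None:
        self.history: list[str] = []

    def reply(self, text: str) -> str:
        self.history.append(text)
        # Behaviour now depends on accumulated context: the same question
        # can yield different answers depending on earlier turns.
        if any("discount" in turn.lower() for turn in self.history):
            return "Applying the discount we discussed earlier."
        return "How can I help?"

bot = StatefulAssistant()
bot.reply("Can I get a discount?")
print(bot.reply("What about my order?"))  # answer shaped by the earlier turn
```

Auditing the stateless function means checking one input-output pair; auditing the stateful one means reasoning about every path through the accumulated history, which is why ongoing monitoring replaces point-in-time validation.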

This misframing creates interconnected gaps: literacy (technical and organisational understanding), dependency (over-reliance on vendors), and management mismatch (applying deterministic IT practices to probabilistic systems). Together, these gaps form a clear causal chain from upstream omissions to runtime risks, accelerated by FOMO and unowned decisions moving faster than authority structures.

The Causal Chain: From Upstream Gaps to Runtime Failures

Failures typically follow a predictable sequence:

  1. Upstream Decision Gaps: Before integration begins, organisations often fail to define what must be explainable, measurable, or reversible. This creates downstream problems where engineering inherits responsibility for decisions it was never equipped to own.

  2. Decision-Custody Gaps: At procurement and design stages, explicit custody over decisions is missing: what the system is allowed to decide, who owns each decision surface, where refusal is mandatory, and what must be replayable for audit purposes (a minimal sketch of such a custody record follows this list).

  3. Execution Without Bound Authority: In production, seemingly minor technical choices become de facto authorisation decisions, extending attack surfaces without defined admissibility, explicit authority, or commit-time proof.

  4. Architectural Mismatch: Applying deterministic IT frameworks to probabilistic systems creates fundamental fractures: mismatches that are legally irresponsible and invite liability exposure.

  5. Missing Middle Capability: The gap between readily available tools and mature operational muscle: no monitoring strategy, no deep vendor evaluation, no fallback processes, and no real ownership of behaviour, all exacerbated by FOMO prioritising speed over guardrails.

  6. Downstream Consequences: Remediation cycles, liability exposure, eroded sovereignty, and performance erosion from low-quality outputs or cognitive overload.
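To make decision custody (step 2) concrete, here is a minimal sketch of what an explicit custody record with commit-time evidence might look like. The schema and field names are assumptions for illustration, not a standard:

```python
# A minimal sketch of an explicit decision-custody record. The schema is
# an illustrative assumption, not a standard; adapt fields to your context.
from dataclasses import dataclass, asdict
import hashlib
import json

@dataclass
class DecisionSurface:
    name: str                      # e.g. "product substitution suggestion"
    owner: str                     # accountable human role, not a team alias
    allowed_to_decide: list[str]   # what the system may decide alone
    refusal_zones: list[str]       # where it must decline or escalate
    escalation_path: str           # who picks up refused decisions
    replayable: bool = True        # inputs/outputs retained for audit

def commit_evidence(surface: DecisionSurface, inputs: dict, output: str) -> str:
    """Hash the full decision context at commit time so the decision can
    be replayed and audited later (replayable provenance)."""
    record = {"surface": asdict(surface), "inputs": inputs, "output": output}
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

surface = DecisionSurface(
    name="order-substitution",
    owner="Head of Online Fulfilment",
    allowed_to_decide=["suggest substitute items"],
    refusal_zones=["allergen-related queries", "pricing disputes"],
    escalation_path="human agent queue",
)
print(commit_evidence(surface, {"query": "swap milk brand"}, "suggested oat milk"))
```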

FOMO acts as an accelerant, treating controls as afterthoughts and leaving systems vulnerable until incidents force remediation.

The Prescription: Build Disciplined Capability as Infrastructure

The overwhelming prescription is to shift from compliance theatre to engineering discipline. Treat AI as a core capability you operate and govern, not a set of one-off features you bolt on. Key elements include:

  • Explicit institutional reasoning captured upfront, making it portable and resilient to drift.
  • Decision infrastructure and custody: defining admissibility, scope, ownership, refusal zones, commit-time evidence, and replayable provenance.
  • Runtime decision governance: monitoring decision integrity (authority allocation, escalation paths, rollback semantics) beyond model metrics.
  • Technical vendor scrutiny: favouring engineering reviews over marketing claims or terms-and-conditions checks.
  • Organisational capability shift: placing AI cross-functionally with middle-management literacy and unified strategies.
  • Aligned triad: education (literacy and buy-in) + operational governance (non-theatrical control) + value delivery.
  • Sovereignty design: retaining internal override and audit capability to prevent outsourcing of judgement.

Tools such as AI control towers provide centralised enforcement of workflows, monitoring, and compliance before customer impact.
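As a sketch of the kind of runtime gate such a platform enforces, the snippet below checks each proposed action against bounded authority before it commits, with unlisted actions denied by default. The policy shape is an illustrative assumption, not a specific product's configuration:

```python
# Sketch of a runtime decision gate of the kind a control tower enforces:
# every proposed action is checked against bounded authority before it
# commits. The policy structure is an illustrative assumption.

POLICY = {
    "chatbot": {
        "may": {"answer_product_question", "suggest_substitute"},
        "must_escalate": {"medical_advice", "refund_over_limit"},
    }
}

def gate(actor: str, action: str) -> str:
    rules = POLICY.get(actor, {})
    if action in rules.get("must_escalate", set()):
        return "ESCALATE"   # route to the named human owner
    if action in rules.get("may", set()):
        return "ALLOW"
    return "REFUSE"         # default-deny: unlisted actions never commit

assert gate("chatbot", "suggest_substitute") == "ALLOW"
assert gate("chatbot", "medical_advice") == "ESCALATE"
assert gate("chatbot", "delete_account") == "REFUSE"   # default-deny
```

Default-deny is the design choice that matters: authority is granted explicitly per decision surface, never inferred from what the model happens to be capable of.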

Competitive advantage lies in disciplined execution: organisations that close the missing middle and build this infrastructure will iterate faster, remediate less, and earn greater trust, while laggards remain in remediation cycles and liability firefighting.

How Organisations Are Approaching This in 2026

From broader trends:

  • High-risk sectors (finance, healthcare) lead with decision-custody checklists and legal sign-offs on authorisation scopes.
  • Platform adoption is rising: centralised governance planes enforce runtime integrity, commit-time evidence, and vendor SLAs.
  • Stress testing: "regulator-free" simulations reveal gaps in trust and resilience.
  • Capability metrics: tracking percentage of decisions with explicit custody, time-to-rollback, and authority-drift detection.
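As a sketch of how the last two of those metrics might be computed, assuming a decision log with custody and rollback fields (the field names are illustrative):

```python
# Sketch: computing capability metrics from a decision log. The log
# fields are assumptions for illustration, not a standard schema.
from statistics import median

decision_log = [
    {"surface": "substitution", "has_custody": True,  "rollback_minutes": 12},
    {"surface": "chat-answer",  "has_custody": True,  "rollback_minutes": 45},
    {"surface": "pricing-hint", "has_custody": False, "rollback_minutes": None},
]

with_custody = sum(d["has_custody"] for d in decision_log)
pct_custody = 100 * with_custody / len(decision_log)

rollbacks = [d["rollback_minutes"] for d in decision_log
             if d["rollback_minutes"] is not None]

print(f"decisions with explicit custody: {pct_custody:.0f}%")
print(f"median time-to-rollback: {median(rollbacks)} min")
```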

The final takeaway: AI is not a productivity hack. It is an authority layer that demands disciplined capability. This shift is not about avoiding disasters; it is about creating durable, compounding advantage in the deployer era.

How is your organisation bridging compliance to capability? Are you defining decision custody upfront, or still treating AI as "just software"?


The appendices below provide optional detail if you want to go deeper:

  • Appendix 1 (key themes): a more detailed elaboration of the main themes and their practical implications.
  • Appendix 2 (MRM vs AI risk): what carries over from classic model risk management, and what changes for modern AI.
  • Appendix 3 (transition checklist): a practical checklist for shifting from point-in-time model validation to ongoing AI risk management.
  • Appendix 4 (emissions): how AI emissions move from an ethical concern into operational and economic reality.

Appendix: Detailed Elaboration of Key Themes

The main article outlines the core themes emerging from the discussion. This appendix provides deeper explanations and practical implications for each, serving as a reference for readers who wish to explore the ideas in more detail.

Upstream Decision Gaps
  Detailed elaboration: Organisations fail to make foundational choices pre-integration: what must be explainable, measurable, reversible, admissible, or non-delegable. This cascades into technical problems downstream.
  Practical implications: Require formal authorisation reviews with legal/engineering sign-off before procurement. Conduct pre-integration workshops to define non-negotiables.

Decision Infrastructure and Custody Gaps
  Detailed elaboration: Missing explicit custody over decisions: scope, ownership, refusal zones, replayable provenance, admissibility, explicit authority, commit-time proof. This leads to unowned decisions outpacing authority structures.
  Practical implications: Adopt checklists for every decision surface. Implement chain-of-custody frameworks for enforceable chains. Use platforms for commit-time evidence generation.

Operational Risks in Deployment and Integration
  Detailed elaboration: Hazards from embedding probabilistic AI in deterministic workflows: drift, uncertainty, degradation, FOMO bypassing guardrails, and risks concentrated at integration seams rather than in the innovation itself.
  Practical implications: Build centralised intake processes, tiered risk evaluations, and semantic rollback. Run "regulator-free" stress tests to expose vulnerabilities.

Governance, Accountability, and Architectural/Structural Control
  Detailed elaboration: Need for proactive architectures: control layers for regimes, escalation, and adaptation; drift detection, uncertainty signalling, incentive alignment, and override authority; explicit reasoning capture; enforceable custody; runtime decision-integrity monitoring.
  Practical implications: Embed calibration layers and control surfaces. Shift to cross-functional teams with middle-management literacy.

Economic and Resource Constraints as Root Cause
  Detailed elaboration: Short-term cost pressure and FOMO lead to under-investment in capability, amplifying dependency and exposure. Outsourcing does not absolve due diligence.
  Practical implications: Budget for upskilling and technical vendor reviews. Measure the ROI of governance as reduced remediation costs.

Strategic/Organisational Maturity and Experimentation Risks
  Detailed elaboration: Early experimentation lacks strategy, leading to sprawl, slop, and strain. Maturity requires unified approaches and capability shifts.
  Practical implications: Develop AI strategies with middle-management input. Avoid IT-siloed placement; focus on cross-functional maturity.

Emerging Sovereignty Gap and Decision Dependency
  Detailed elaboration: Lacking internal muscle outsources judgement and authority to vendors and models.
  Practical implications: Design for sovereignty with explicit boundaries and override rights. Use technical gates to prevent platform lock-in.

Holistic Post-Prototype Discipline and Sustainability
  Detailed elaboration: The real work starts after prototypes: ownership, monitoring, governance, accountability, people-process-risk balance, upskilling, and long-term viability.
  Practical implications: Align the triad (education + governance + value). Measure capability metrics like decision-traceability time. Leverage tools for centralised enforcement.

Appendix: Similarities and Differences Between Managing Model Risk and AI (Model) Risk

The deployer risks discussed above build on traditional model risk management principles, but AI introduces new dynamics. This appendix compares the two approaches to help organisations adapt existing practices effectively.

Model risk management (MRM) is a traditional practice in fields like finance, focusing on risks from statistical models (e.g., credit scoring, forecasting). AI model risk management extends this but adapts to AI's unique characteristics, such as machine learning, neural networks, and generative systems. Below is a comparison based on established frameworks (e.g., ISO 42001, NIST AI RMF).

Similarities

  • Risk identification and tiering: both assess model complexity, materiality, and use-case risks using tiered approaches (low/medium/high risk) to prioritise validation and monitoring.
  • Governance principles: shared foundations include oversight, independent validation, documentation, and audit trails, with emphasis on data quality, bias mitigation, overfitting avoidance, and interpretability challenges.
  • Monitoring and control: ongoing performance tracking (e.g., accuracy, drift) and remediation are common, with contingency plans like fallback processes and rollback mechanisms.
  • Data handling: risks from poor data quality, bias, or incompleteness are central, with requirements for provenance and ethical sourcing.
  • Regulatory overlap: frameworks like Basel Committee guidelines for traditional models inform AI standards (e.g., EU AI Act), with shared focus on accountability and transparency.

Differences

  • Data and complexity: traditional models use historical, structured data and linear relationships; AI handles real-time, unstructured/diverse data, non-linear patterns, and autonomous learning, introducing dynamic risks like hallucinations, scale failures, or emergent behaviours.
  • Risk nature: traditional is static/model-centric (e.g., overfitting in statistical models); AI is dynamic/use-case-centric (e.g., drift in black-box models, ethical/privacy issues from third-party pretraining).
  • Validation and interpretability: traditional allows clearer logic inspection; AI's "black box" nature requires recalibrated methods (e.g., SHAP/LIME for explainability; a short SHAP sketch follows this list) and ongoing retraining.
  • Speed and scalability: traditional is manual/reactive/slow; AI enables proactive, automated, fast analysis but demands real-time monitoring to handle complexity.
  • Scope and oversight: traditional focuses on in-house models; AI often involves vendor-pretrained systems, requiring new boundaries for ownership, geopolitical risks (sovereignty), and integration into broader systems.
  • Regulatory evolution: traditional MRM is mature (e.g., SR 11-7 guidance); AI risk management adapts it but adds AI-specific elements like ethical AI, bias in training data, and human-in-the-loop requirements.
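For readers unfamiliar with these tools, here is a minimal sketch of post-hoc explainability with SHAP, assuming the shap and scikit-learn packages are installed and using a purely synthetic, illustrative dataset:

```python
# A minimal sketch of post-hoc explainability with SHAP, as mentioned
# above. Assumes the shap and scikit-learn packages; the dataset is
# synthetic and purely illustrative.
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Model-agnostic explainer: attributes each prediction to input features,
# giving reviewers a handle on an otherwise opaque model.
explainer = shap.Explainer(model.predict, X)
shap_values = explainer(X[:20])
print(shap_values.values.shape)  # (20 rows, 8 feature attributions)
```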

In summary, traditional MRM provides a strong base for AI risk management, but AI demands more dynamic, proactive, and use-case-focused approaches to address its scale and opacity. Organisations transitioning should recalibrate frameworks like ISO 42001 for AI's unique risks while retaining core MRM principles.


Appendix: Checklist for Transitioning from Model Validation to Managing AI Risks

For risk professionals already performing model validation, the shift to AI risk management requires targeted adjustments. This checklist provides a practical starting point aligned with the capability-building recommendations in the main discussion.

If you are accustomed to traditional model validation (e.g., in finance or risk management), shifting to AI risk review requires expanding your scope to address AI's dynamic, probabilistic nature. Below is a prioritised checklist to start with, focusing on high-impact areas. Begin with high-risk use cases and iterate.

  1. Assess AI-Specific Risks
    Extend beyond overfitting/underfitting to include hallucinations, emergent behaviours, bias amplification, and adversarial attacks.
    Priority action: Review training data for ethical sourcing and diversity.

  2. Adapt Validation to Dynamic Models
    Traditional validation is static; AI needs continuous retraining checks.
    Priority action: Implement runtime drift detection and bias regression testing in production (see the drift-detection sketch after this checklist).

  3. Incorporate Ethical and Sovereignty Reviews
    Add layers for fairness, privacy, and data sovereignty (e.g., no vendor use of customer data for training).
    Priority action: Define explicit decision boundaries and refusal zones upfront.

  4. Build Runtime Monitoring and Custody
    Monitor decision integrity (authority, escalation, rollback) beyond metrics.
    Priority action: Require commit-time proof and replayable provenance for audits.

  5. Enhance Vendor and Procurement Scrutiny
    Traditional models are often in-house; AI involves third-party pretrained systems.
    Priority action: Conduct technical engineering reviews of APIs for stateless processing, zero retention, and explainability provisions.

  6. Integrate Human-in-the-Loop and Override Mechanisms
    Ensure human authority overrides AI where needed.
    Priority action: Map high-stakes decisions to mandatory review gates.

  7. Leverage Standards and Tools
    Use ISO 42001 or NIST AI RMF operationally (not as checkboxes).
    Priority action: Adopt centralised platforms for governance enforcement and real-time audits.

  8. Foster Cross-Functional Ownership
    Move from siloed validation to organisational capability.
    Priority action: Train middle management on AI literacy and integrate legal/compliance early.

  9. Stress Test for Resilience
    Simulate "regulator-free" scenarios.
    Priority action: Ensure systems remain trustworthy without external oversight.

  10. Document and Iterate
    Maintain living risk registers.
    Priority action: Track capability metrics like time-to-rollback and % of decisions with explicit custody.
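As referenced in item 2, here is a minimal drift-detection sketch: compare a production feature distribution against a reference window with a two-sample Kolmogorov-Smirnov test. It assumes numpy and scipy, uses synthetic data, and the alert threshold is an illustrative choice, not a standard:

```python
# Minimal drift-detection sketch (see checklist item 2): compare a
# production feature distribution against a reference window with a
# two-sample Kolmogorov-Smirnov test. Data is synthetic; the alert
# threshold is an illustrative assumption.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)   # validation-time data
production = rng.normal(loc=0.3, scale=1.0, size=5_000)  # shifted live data

stat, p_value = ks_2samp(reference, production)
if p_value < 0.01:  # illustrative alert threshold
    print(f"drift detected (KS={stat:.3f}, p={p_value:.2g}) -- trigger review")
```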

This checklist builds on MRM foundations while addressing AI's unique demands. Start with a pilot on one workflow, then scale.


Appendix: Environmental Emissions from AI Inference – From Ethical Concern to Operational & Economic Reality

While the core blind spot is the lack of decision custody and runtime governance, the environmental footprint of AI inference represents an emerging parallel risk that compounds the resource constraints and sustainability themes. This appendix examines how emissions transition from ethical concern to operational and economic reality.

Emissions from AI inference (the process of running trained models to generate outputs, such as queries to chatbots or image generators) and related activities represent a growing concern in the broader picture of AI deployment risks. They fit into the discussion not only as ethical considerations but also as emerging operational, regulatory, and economic factors that organisations must address for sustainable integration.

In the context of deployer risks, emissions from AI operations (like inference, training, and data centre cooling) highlight the environmental sustainability challenges of scaling AI into everyday workflows. These are not abstract "frontier" issues but practical ones that arise post-prototype, during ongoing use. Emissions are integral to the deployer blind spot, manifesting as a sustainability risk that intersects with operational, economic, and governance themes. They are not peripheral; they compound the cascade from upstream gaps to downstream failures.

Emissions from AI inference have traditionally been framed as ethical considerations, part of broader responsible AI principles like environmental sustainability, fairness, and societal impact. This includes moral imperatives, as AI's carbon footprint contributes to climate change, with inference per query emitting 0.03-1.14 grams of CO2e on average (comparable to a fraction of a lightbulb's use). At scale, this raises equity issues (e.g., data centres in water-stressed regions exacerbate local environmental burdens). Many organisations view emissions as part of ESG (environmental, social, governance) commitments, driven by stakeholder pressure rather than mandates.

However, this is rapidly evolving beyond ethics into tangible costs and regulatory requirements. Energy consumption already translates to direct operational expenses (e.g., AI could account for over half of data centre power by 2028), with costs rising as electricity demands surge. For external models, users pay not just API fees but implicit environmental premiums through higher cloud pricing tied to energy-intensive inference. Using external models (e.g., via cloud APIs like those from major providers) adds Scope 3 emissions (indirect from suppliers), which organisations must track and mitigate. Optimisation (e.g., efficient hardware or mixed-quality models) can yield 40-60% cost reductions, turning emissions management into a financial lever.
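Using the per-query range cited above, a back-of-envelope sketch of what inference emissions look like at scale; the query volume is an assumed, illustrative figure:

```python
# Back-of-envelope annual CO2e from inference, using the per-query range
# cited above (0.03-1.14 g CO2e). The daily query volume is an assumed,
# illustrative figure, not a benchmark.
QUERIES_PER_DAY = 1_000_000          # assumption for illustration
LOW_G, HIGH_G = 0.03, 1.14           # grams CO2e per query (range above)

for grams in (LOW_G, HIGH_G):
    tonnes_per_year = QUERIES_PER_DAY * grams * 365 / 1e6  # grams -> tonnes
    print(f"{grams} g/query -> ~{tonnes_per_year:,.0f} t CO2e/year")
```

At a million queries a day, the cited range spans roughly 11 to 416 tonnes of CO2e per year, which is the difference between a rounding error and a line item once carbon pricing applies.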

Regulatory and compliance costs are also emerging: governments are introducing emissions reporting mandates, shifting ethics to enforceable obligations. The EU's Corporate Sustainability Reporting Directive (CSRD) requires Scope 3 emissions disclosure (including AI-related) for thousands of companies. California's SB 253 mandates Scope 3 reporting starting 2027 for large firms. Globally, carbon pricing mechanisms (e.g., taxes or cap-and-trade) are expanding to tech infrastructure. Projections suggest AI-specific regulations by 2026–2027, such as mandatory emissions tracking under the EU AI Act expansions or US federal guidelines. Non-compliance could mean fines, while proactive reporting adds administrative costs.

Looking forward, emissions are likely to incur explicit penalties. Carbon taxes on data centres (already discussed in policy circles) could add 10-20% to inference costs in high-emission regions. Providers may pass on "carbon premiums" for sustainable options, similar to green energy surcharges. For external models, contracts might include emissions-linked SLAs, with penalties for exceeding thresholds, turning ethical concerns into contractual liabilities.

As AI inference scales (projected to drive 80%+ of AI electricity use), emissions will increasingly factor into total cost of ownership. Organisations that treat AI as infrastructure, embedding efficiency, monitoring, and low-emission designs, will gain an edge, reducing both ethical footprints and financial burdens. Those ignoring it risk regulatory fines, higher energy bills, and reputational hits. Standards like ISO 42001 and NIST AI RMF are evolving to include emissions, suggesting mandatory integration soon.


Series: Prelude · Part 1 · Part 2 · Part 3 · Part 4 · Glossary