Exec Summary

NVIDIA dominates AI through CUDA’s proprietary lock-in, often leaving developers tied to its GPUs. Modular, with its Mojo language (a high-performance Python superset) and MAX framework, lets the same code run at peak speed across NVIDIA and AMD GPUs, Apple silicon, and CPUs. For mid-sized businesses, this provides a pathway to escaping vendor lock-in, slashing costs (reportedly up to 60% lower inference costs and 40-70% lower latency), scaling affordably, and future-proofing AI investments, making elite high-performance computing accessible without massive budgets.

Business Insider: Chris Lattner Interview - December 2025

NVIDIA dominates AI hardware thanks largely to CUDA, its proprietary software layer that constrains developers to NVIDIA GPUs. As Chris Lattner (Modular CEO and creator of Swift/LLVM) recently told Business Insider, no chip maker has a real incentive to build truly portable software, which cements this dominance. Enter Modular: a unified, hardware-agnostic AI stack with Mojo (a blazing-fast Python superset) at its core. Write code once, and it runs at peak performance across NVIDIA and AMD GPUs, Apple silicon, and even CPUs. No rewrites, no compromises.
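
To make "write once, run anywhere" concrete, here is a minimal sketch of portable Mojo (my own illustration, not code from the interview). SIMD is a first-class type in Mojo's standard library, and the compiler lowers it to whatever vector instructions the compilation target provides, so the identical source can be built for an x86 server, an Arm laptop, or any other supported chip:

```mojo
# Illustrative sketch: the same Mojo source compiles unchanged for any
# supported target; vector instructions are selected per-hardware.
fn scale_and_shift(v: SIMD[DType.float32, 4]) -> SIMD[DType.float32, 4]:
    # One expression; the compiler vectorises it for the target chip.
    return v * 2.0 + 1.0

fn main():
    var v = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
    print(scale_and_shift(v))  # [3.0, 5.0, 7.0, 9.0]
```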

Let's drill down further into the implications of Chris' interview...

Why this democratises high-performance computing for medium-sized businesses:

  • Slash costs dramatically: Customers like Inworld AI (see their case study) reduced inference costs by ~60% and latency by 40-70% through superior efficiency and the freedom to choose cheaper or more available hardware.
  • Escape vendor lock-in: Avoid being tied to NVIDIA's pricing and supply constraints. Modular already achieves top-tier speeds on the latest AMD MI355X and NVIDIA Blackwell GPUs, often beating the native software stacks, for example reporting ~50% better performance on AMD's MI355X than AMD's own software.
  • Scale affordably: Deploy production-grade AI (with support for a wide range of models, including popular LLMs plus PyTorch and ONNX models) without enormous NVIDIA clusters. Smaller containers, faster startup times, and multi-cloud portability cut operational overhead and speed up experimentation.
  • Future-proof your investments: As new accelerators launch (specialised hardware such as next-generation GPUs or TPUs built to accelerate AI computations), your Mojo code adapts with minimal changes (see the sketch after this list), turning AI compute from an expensive bottleneck into a flexible commodity.
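
As a hedged illustration of that adaptability (again my own sketch, not Modular's code): Mojo can query the compilation target at compile time, so the same source automatically picks up the native vector width of whichever chip it is built for:

```mojo
from sys import simdwidthof

fn main():
    # `width` is resolved at compile time for the target hardware, so
    # the same source uses 4 lanes on one chip and 16 on another,
    # with no code changes.
    alias width = simdwidthof[DType.float32]()
    var ones = SIMD[DType.float32, width](1.0)
    print("native float32 lanes on this target:", width)
    print(ones + 2.0)
```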

For mid-sized teams with ambitious AI plans but constrained budgets, Modular is dramatically lowering the entry barrier to elite performance. Imagine running affordable private inference on your own infrastructure to keep sensitive customer data secure and compliant, perfect for sectors like healthcare or finance where data sovereignty is key. Or deploying custom retrieval-augmented generation (RAG) systems to turbocharge internal knowledge bases, enabling faster decision-making without skyrocketing cloud costs. Build intelligent agents that automate customer support, streamline operations, or even personalise marketing, all on hardware you already own or can readily afford, like AMD GPUs or even CPUs. No more waiting for NVIDIA availability or absorbing vendor price hikes.

Will hardware portability finally loosen CUDA's grip? Mid-sized businesses have plenty to gain. 🚀