AMD's ROCm software ecosystem has made dramatic progress in 2025-2026, narrowing the CUDA performance gap to 10-30% for compute-intensive workloads while achieving near-parity for inference. ROCm 7.0 (September 2025) delivered up to 3.5x inference and 3x training improvements over ROCm 6. Seven of the top ten model-development companies now run production workloads on AMD Instinct GPUs.
The most significant validation: OpenAI and Meta each signed 6GW multi-year deals to deploy AMD Instinct MI450-based GPUs starting H2 2026, representing a combined 12GW of committed AMD GPU compute. ROCm became a first-class platform in the vLLM ecosystem (December 2025), with CI test pass rates rising from 37% to 93% in two months. AMD invested $8.1B in R&D in 2025 (+25% YoY) and folded in engineering talent from Nod.ai and Untether AI to strengthen the software stack. However, CUDA retains meaningful advantages in custom kernel maturity, Flash Attention equivalents, TensorRT-class inference optimization, and the breadth of its 7.5M+ developer ecosystem. The gap is narrowing from 'ROCm doesn't work' to 'ROCm works but CUDA works better': a bear case for NVIDIA's platform premium, but not yet an existential threat.
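To make the vLLM point concrete: user-facing serving code is identical on CUDA and ROCm builds of vLLM, because the hardware backend is chosen at install/build time rather than in application code. A minimal sketch under that assumption; the model ID and sampling settings below are illustrative, not from the source:

```python
# Minimal vLLM inference sketch. The same Python API runs on CUDA or ROCm
# builds of vLLM; backend selection happens at install time, not here.
# Model name and sampling parameters are illustrative placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # any HF model ID
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain ROCm in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```

The CI pass-rate jump cited above matters precisely because it is this shared code path being tested: the same kernels and schedulers now exercise AMD hardware in vLLM's own gates, not a downstream fork.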
Platform moat narrows at edges but holds at core
CUDA remains the dominant AI development platform, with a developer base in the millions. Alternatives such as JAX and Triton are growing but have not yet reached production parity for most enterprise workloads.
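Triton's portability pitch is that one kernel source compiles for both NVIDIA (via PTX) and AMD (via ROCm/LLVM) backends with no vendor-specific code. A minimal vector-add sketch under that assumption; the block size and helper name are arbitrary choices, not from the source:

```python
# Minimal Triton vector-add kernel. The same source targets NVIDIA or AMD
# GPUs; the only hardware dependence is which PyTorch build is installed.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n                        # guard the ragged final block
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)         # one program per 1024 elements
    add_kernel[grid](x, y, out, n, BLOCK=1024)
    return out

# Unchanged on either vendor: on ROCm builds of PyTorch, device="cuda"
# transparently maps to HIP.
# x = torch.randn(4096, device="cuda"); y = torch.randn_like(x)
# assert torch.allclose(add(x, y), x + y)
```

The catch, and the reason "production parity" lags, is that portability of source is not portability of performance: competitive kernels still need per-architecture tuning of block sizes, memory layouts, and scheduling.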
Will the OpenAI and Meta 6GW deals actually materialize at full scale, or are they framework agreements with optionality?