
AMD ROCm Progress: Narrowing the CUDA Gap


AMD's ROCm software ecosystem has made dramatic progress in 2025-2026, narrowing the CUDA performance gap to 10-30% for compute-intensive workloads while achieving near-parity for inference. ROCm 7.0 (September 2025) delivered up to 3.5x inference and 3x training improvements over ROCm 6. Seven of the top ten model-development companies now run production workloads on AMD Instinct GPUs.

- 37% → 93% — vLLM CI test pass rate (ROCm Blogs - vLLM Omni). ROCm became a first-class platform in the vLLM ecosystem, with a dedicated ROCm CI pipeline; test pass rates rose from 37% to 93% in two months.
- 10-30% — CUDA performance lead (ThunderCompute / AIMultiple comparative). CUDA typically outperforms ROCm by 10-30% in compute-intensive workloads; Flash Attention equivalents remain a CUDA advantage.
- $8.1B — AMD R&D spend (MacroTrends / AMD corporate announcement). AMD R&D spending reached $8.1B in 2025, up 25.3% year-over-year, with acquisitions strengthening the software stack.

The most significant validation: OpenAI and Meta each signed 6GW multi-year deals to deploy AMD Instinct MI450-based GPUs starting H2 2026, representing a combined 12GW of committed AMD GPU compute. ROCm became a first-class platform in the vLLM ecosystem (December 2025), with CI test pass rates rising from 37% to 93% in two months. AMD invested $8.1B in R&D in 2025 (+25% YoY) and acquired Nod.ai and Untether AI engineering talent to strengthen the software stack. However, CUDA retains meaningful advantages in custom kernel maturity, Flash Attention equivalents, TensorRT-class inference optimization, and the breadth of its 7.5M+ developer ecosystem. The gap is narrowing from 'ROCm doesn't work' to 'ROCm works but CUDA works better' -- a bear case for NVIDIA's platform premium but not yet an existential threat.
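To make the 10-30% compute gap concrete, a back-of-the-envelope sketch: if an AMD part delivers 70-90% of CUDA-baseline throughput on a workload, the price discount needed for performance-per-dollar parity follows directly. This is illustrative arithmetic only; no actual GPU pricing is assumed.

```python
def parity_discount(rocm_relative_perf: float) -> float:
    """Price discount (as a fraction) an AMD GPU needs for
    performance-per-dollar parity, given its throughput relative
    to a CUDA baseline (a value in (0, 1])."""
    if not 0 < rocm_relative_perf <= 1:
        raise ValueError("relative performance must be in (0, 1]")
    # Parity condition: rocm_perf / rocm_price == cuda_perf / cuda_price.
    # With cuda_perf normalized to 1, rocm_price = rel_perf * cuda_price,
    # so the required discount is simply 1 - rel_perf.
    return 1 - rocm_relative_perf

# The cited 10-30% compute gap maps to a 10-30% required discount.
for gap in (0.10, 0.30):
    rel = 1 - gap
    print(f"{gap:.0%} perf gap -> {parity_discount(rel):.0%} discount for parity")
```

The linearity is the point: a performance gap translates one-for-one into the price concession AMD must offer before software-ecosystem factors enter the buying decision.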

Platform moat narrows at edges but holds at core

CUDA remains the dominant AI development framework with millions of developers. Alternative frameworks like JAX and Triton are growing but haven't yet achieved production parity for most enterprise workloads.
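One mechanical reason the gap has narrowed from 'doesn't work' to 'works but slower': ROCm builds of PyTorch expose the same `torch.cuda` API surface (HIP maps onto it), reporting `torch.cuda.is_available()` as true while setting `torch.version.hip`, so most unmodified PyTorch code runs on AMD GPUs. A minimal sketch of that classification logic; the availability flags are simulated here rather than queried from a real `torch` install:

```python
from typing import Optional

def select_backend(cuda_available: bool, hip_version: Optional[str]) -> str:
    """Classify the accelerator stack the way PyTorch reports it:
    ROCm builds answer cuda_available=True with a non-None HIP version,
    which is why unmodified `.cuda()` code runs on AMD hardware."""
    if not cuda_available:
        return "cpu"
    return "rocm" if hip_version else "cuda"

print(select_backend(True, "6.2"))  # a ROCm build of PyTorch
print(select_backend(True, None))   # a CUDA build
print(select_backend(False, None))  # no accelerator present
```

API compatibility is what keeps the moat question about performance and kernel maturity rather than about whether code runs at all.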

The key question

Will the OpenAI and Meta 6GW deals actually materialize at full scale, or are they framework agreements with optionality?

Open questions

- Can ROCm close the remaining 10-30% compute-intensive training gap by the MI450 launch in H2 2026?
- What percentage of hyperscaler workloads are inference (where ROCm is near parity) versus training (where CUDA still leads)?
- How will NVIDIA's open-sourcing of CUDA Tile IR and its Triton backend integration affect ROCm's competitive positioning?