AI compute is undergoing a structural shift from training-dominated to inference-dominated workloads. Deloitte estimates inference accounted for 50% of all AI compute in 2025 (up from 33% in 2023) and will reach 67% in 2026. The inference-optimized chip market is projected to grow from over $20B in 2025 to more than $50B in 2026.
This shift is both a massive growth opportunity and a structural threat to NVIDIA. On the bull side, inference demand scales exponentially with agentic AI deployment (Jensen Huang: 'compute equals revenues... without tokens there's no way to grow revenues'), and NVIDIA has licensed Groq's inference technology (deal terms undisclosed) to integrate LPU inference into its Vera Rubin platform, targeting 35x higher throughput per megawatt. On the bear side, inference workloads are more cost-sensitive and latency-tolerant than training, making them particularly vulnerable to custom ASICs: Google TPU v6e delivered 65% cost savings for Midjourney's inference migration, AWS Trainium claims 30-40% better price-performance, and analysts project NVIDIA's inference share could fall from roughly 80% to 20-30% by 2028 as ASICs capture 70-75% of production inference. NVIDIA does not disclose its training/inference revenue split, so the true exposure is difficult to quantify.
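The share-erosion scenario can be made concrete with back-of-envelope arithmetic using only the figures cited above. This is an illustrative sketch, not a forecast: it assumes NVIDIA's current ~80% inference share applies to the lower-bound 2025 market size, and that the projected 20-30% share applies to the lower-bound 2026 market size.

```python
# Scenario arithmetic from the figures cited in the text.
# Assumption (not an NVIDIA disclosure): the ~80% share applies to the
# $20B+ 2025 market, and the projected 20-30% share to the $50B+ 2026 market.

market_2025 = 20e9          # inference-optimized chip market, 2025 (lower bound)
market_2026 = 50e9          # projected market, 2026 (lower bound)
share_now = 0.80            # analysts' estimate of NVIDIA's current share
share_low, share_high = 0.20, 0.30  # projected 2028 share range

growth = market_2026 / market_2025            # 2.5x market growth
revenue_now = market_2025 * share_now         # implied ~$16B today
revenue_low = market_2026 * share_low         # $10B at a 20% share
revenue_high = market_2026 * share_high       # $15B at a 30% share

print(f"market growth: {growth:.1f}x")
print(f"implied inference revenue: ${revenue_now/1e9:.0f}B -> "
      f"${revenue_low/1e9:.0f}B-${revenue_high/1e9:.0f}B")
```

The point of the exercise: even with the market growing 2.5x, a share collapse to 20-30% would leave NVIDIA's implied inference revenue flat to lower, which is why the share trajectory matters more than the market size.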
Growth drivers are evidence-backed
Hyperscaler capex, sovereign AI, and the inference shift are all supported by concrete spending commitments and revenue data, not projections alone.
What is NVIDIA's actual training/inference revenue split? NVIDIA does not disclose this, making its true inference exposure unquantifiable from public data.