Inference Shift: From Training-Dominated to Inference-Dominated AI Compute

AI compute is undergoing a structural shift from training-dominated to inference-dominated workloads. Deloitte estimates inference accounted for 50% of all AI compute in 2025 (up from 33% in 2023) and will reach 67% in 2026. The inference-optimized chip market is expanding accordingly, from $20B+ in 2025 to a projected $50B+ in 2026.

Key figures

- 50% (Deloitte 2026 TMT Predictions, "More com..."): inference workloads accounted for roughly 50% of all AI compute in 2025 (up from...)
- 80% (Computerworld, CES 2026 coverage): Lenovo CEO Yuanqing Yang projected the long-term AI compute split will reach 80%...
- $20B (CNBC, Groq press release): NVIDIA-Groq non-exclusive inference licensing agreement (Dec 24, 2025; terms undisclosed); Ross and Madra joined NVIDIA; Groq continues independently...
- $2.1M (ainewshub.org / FourWeekMBA, secondary source): Midjourney migrated the majority of its inference fleet from NVIDIA A100/H100 to Google ...
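
As a quick consistency check, the sketch below derives the implied training/inference split and the implied chip-market growth directly from the figures cited above; the calculation itself is illustrative, not Deloitte's methodology.

```python
# Quick consistency check on the inference-shift figures cited above
# (Deloitte 2026 TMT Predictions). The arithmetic is illustrative only,
# not Deloitte's methodology.

inference_share = {2023: 0.33, 2025: 0.50, 2026: 0.67}  # inference share of all AI compute
chip_market_usd_b = {2025: 20, 2026: 50}                # inference-chip market, $B (lower bounds)

# Training share is simply the complement of the inference share.
for year in sorted(inference_share):
    share = inference_share[year]
    print(f"{year}: inference {share:.0%} / training {1 - share:.0%}")

# Implied year-over-year growth of the inference-optimized chip market.
growth = chip_market_usd_b[2026] / chip_market_usd_b[2025] - 1
print(f"Implied chip-market growth 2025 -> 2026: {growth:.0%}")  # 150%
```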

This shift is both a massive growth opportunity and a structural threat to NVIDIA. On the bull side, inference demand scales exponentially with agentic AI deployment (Jensen Huang: 'compute equals revenues... without tokens there's no way to grow revenues'), and NVIDIA has licensed Groq's inference technology (deal terms undisclosed) to integrate LPU inference into its Vera Rubin platform, targeting 35x higher throughput per megawatt. On the bear side, inference workloads are more cost-sensitive and latency-tolerant than training, making them particularly vulnerable to custom ASICs: Google TPU v6e delivered 65% cost savings for Midjourney's inference migration, AWS Trainium claims 30-40% better price-performance, and analysts project NVIDIA's inference share could fall from ~80% to 20-30% by 2028 as ASICs capture 70-75% of production inference. NVIDIA does not disclose its training/inference revenue split, making the true exposure difficult to quantify.
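
To make the throughput-per-megawatt claim concrete, here is a minimal sensitivity sketch of energy cost per million tokens. The baseline throughput and power price are hypothetical placeholders (the section cites only the 35x throughput/MW figure and the 65% savings number), so this shows the shape of the trade-off rather than actual vendor economics.

```python
# Sensitivity sketch: how a throughput-per-megawatt multiple translates into
# energy cost per token. All concrete numbers below are hypothetical
# placeholders; the section cites only NVIDIA's 35x throughput/MW claim.

def energy_cost_per_million_tokens(tokens_per_sec_per_mw: float,
                                   power_price_usd_per_mwh: float) -> float:
    """Energy cost (USD) to generate one million tokens at a 1 MW draw."""
    seconds = 1e6 / tokens_per_sec_per_mw   # time to emit 1M tokens at 1 MW
    hours = seconds / 3600
    return hours * power_price_usd_per_mwh  # MWh consumed (1 MW draw) * $/MWh

baseline_tps_per_mw = 50_000  # hypothetical GPU baseline throughput per MW
power_price = 80.0            # hypothetical electricity price, $/MWh

for label, multiple in [("baseline GPU", 1), ("claimed 35x platform", 35)]:
    cost = energy_cost_per_million_tokens(baseline_tps_per_mw * multiple, power_price)
    print(f"{label}: ${cost:.4f} per million tokens (energy only)")
```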

Growth drivers are evidence-backed

Hyperscaler capex, sovereign AI, and the inference shift are all supported by concrete spending commitments and revenue data, not projections alone.

The key question

What is NVIDIA's actual training/inference revenue split? NVIDIA does not disclose this — making the true inference exposure unquantifiable from public data.
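
Since the split is undisclosed, the exposure can only be bracketed. The sketch below sweeps a grid of assumed inference revenue shares against assumed ASIC capture rates; the data-center revenue figure and all grid values are placeholders chosen to span the ranges quoted in this section, not NVIDIA data.

```python
# Bracketing NVIDIA's inference exposure when the revenue split is undisclosed.
# The revenue figure and every grid value below are placeholders; the ASIC
# capture range echoes the 70-75% bear case cited above.

dc_revenue_usd_b = 100.0               # hypothetical annual data-center revenue, $B
inference_shares = [0.30, 0.50, 0.70]  # unknown inference share of revenue: bracket it
asic_capture = [0.25, 0.50, 0.75]      # share of inference revenue lost to custom ASICs

print("Revenue at risk ($B): rows = inference share, cols = ASIC capture")
print("      " + "".join(f"{c:>8.0%}" for c in asic_capture))
for s in inference_shares:
    cells = "".join(f"{dc_revenue_usd_b * s * c:>8.1f}" for c in asic_capture)
    print(f"{s:>6.0%}{cells}")
```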

Open questions

- Will the Groq 3 LPX + Vera Rubin combination actually close the TCO gap with dedicated inference ASICs, or is NVIDIA's 35x throughput/MW claim achievable only for narrow workloads?
- Can enterprise and sovereign AI customers (who lack scale for custom ASICs) sustain NVIDIA's inference revenue even as hyperscalers migrate to custom silicon?
- How quickly will Groq LP40 (co-designed with Feynman) reach production, and will it be competitive with 2028-era Google TPU v8 and AWS Trainium4?
- Does the 80/20 inference-to-training long-term split (Lenovo CEO projection) represent a consensus, or could reasoning/chain-of-thought models shift compute back toward training-like workloads?