NVIDIA's CUDA developer ecosystem is the deepest moat in AI compute. The developer base has grown from 1.6M (FY2020) to 4.7M (FY2024) to 5.9M (FY2025) per SEC 10-K filings, with ~6M cited at GTC 2026's CUDA 20th anniversary. The ecosystem encompasses 400+ CUDA-X libraries (NVIDIA claims 900+ domain-specific libraries/models), an installed base of hundreds of millions of CUDA-enabled GPUs, and 33M+ cumulative CUDA Toolkit downloads.
Jensen Huang describes this as a 'flywheel' -- developers create algorithms, algorithms open markets, markets expand the installed base, and the installed base attracts more developers. The CUDA-X library suite spans AI (cuDNN, TensorRT, NCCL), data science (RAPIDS/cuDF, cuML), HPC (cuBLAS, cuFFT), and emerging domains (cuQuantum, Sionna 6G, cuOpt logistics). RAPIDS alone has 2M+ downloads and 5,000+ GitHub projects. However, the developer growth rate is decelerating (~25% CAGR vs ~50% in early years), and the composition is shifting -- most new developers work through high-level PyTorch APIs rather than writing custom CUDA kernels, meaning they could migrate to ROCm or TPUs without ever touching CUDA directly. The critical question is whether the 'CUDA developer' metric overstates lock-in: if 80%+ of those developers never write CUDA C++ and only use PyTorch (which is increasingly hardware-agnostic), the moat may be narrower than the headline number suggests.
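The portability argument above can be sketched in code. A framework-level developer writes against an abstract op API, and the framework routes each call to whichever vendor backend is registered underneath -- a toy illustration in pure Python (all names here are invented for illustration; this is not PyTorch's actual dispatch machinery):

```python
# Toy sketch of framework-level backend dispatch (invented names, not
# PyTorch internals). The model author calls matmul() and never touches
# the vendor kernel underneath -- which is why migrating from an NVIDIA
# stack to ROCm can be a backend swap rather than a code rewrite.

_BACKENDS = {}  # device string -> backend implementation

def register_backend(device, impl):
    _BACKENDS[device] = impl

def matmul(a, b, device="cuda"):
    """Framework-level op: dispatches to whichever backend is registered."""
    return _BACKENDS[device](a, b)

def _reference_matmul(a, b):
    """Pure-Python stand-in for a vendor GEMM kernel (cuBLAS, rocBLAS, ...)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# Both 'vendor kernels' are stand-ins here; in a real framework these would
# bind to cuBLAS on the CUDA path and rocBLAS on the ROCm path.
register_backend("cuda", _reference_matmul)
register_backend("rocm", _reference_matmul)

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# Identical model code on both stacks; only the device string changes.
print(matmul(a, b, device="cuda") == matmul(a, b, device="rocm"))  # True
```

The lock-in question then reduces to how much value lives below this dispatch boundary (kernels, NCCL-class collectives, tooling) versus above it, where the code is already portable.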
Platform moat narrows at edges but holds at core
CUDA remains the dominant AI development platform, with millions of developers. Alternative stacks like JAX and Triton are growing but have not yet achieved production parity for most enterprise workloads.
What percentage of the 5.9M 'CUDA developers' actively write custom CUDA C++ kernels vs only using PyTorch/TensorFlow APIs on NVIDIA hardware?