Nvidia Bets on $1T Inference Boom at GTC 2026

Nvidia projects $1 trillion in AI chip revenue through 2027, doubling its prior outlook. The Vera Rubin platform and Groq-powered inference architecture mark a decisive shift from training to real-time AI deployment.

The AI hardware arms race shifted into a higher gear this week. At the opening of GTC 2026, Nvidia CEO Jensen Huang projected that total revenue from the company's AI chips — spanning both training and inference workloads — could exceed $1 trillion through 2027. That figure roughly doubles the $500 billion opportunity Nvidia cited on its February 2026 earnings call.

The number is staggering, but the rationale behind it is more significant than the headline. Nvidia is no longer positioning itself primarily as a training-infrastructure company. The next phase, according to Huang, is inference — the compute-intensive process of running trained models in real time at global scale.

Why Inference Is the New Battleground

Training a large language model is expensive. Running one at scale, continuously, across millions of concurrent users, is far more so. As AI moves from research labs to production services — powering search engines, customer support, code generation, and autonomous systems — the inference workload is growing exponentially.

"AI is able to do productive work, and therefore the inflection point of inference has arrived," Huang said during his keynote. He cited a 1,000,000x increase in compute demand over just two years, driven primarily by inference rather than training.
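For a sense of scale, the 1,000,000x figure can be unpacked into an implied growth rate with simple compounding arithmetic (the two-year multiplier is from Huang's keynote; the breakdown below is just illustrative math):

```python
# What a 1,000,000x increase in compute demand over two years implies
# as a compounded growth rate. Only the 1,000,000x figure comes from
# the keynote; the per-year and per-month rates are derived here.

total_growth = 1_000_000
yearly = total_growth ** (1 / 2)     # two years of compounding
monthly = total_growth ** (1 / 24)   # 24 months of compounding

print(round(yearly))       # 1000  -> roughly 1,000x per year
print(round(monthly, 2))   # 1.78  -> roughly 1.78x per month
```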

This is the structural shift that explains the $1 trillion figure. Training is a capital expense. Inference is an operating expense — recurring, scaling with demand, and growing as AI becomes embedded in more services. The companies deploying AI at scale need chips that can run models cheaply and quickly, not just train them.
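A minimal sketch makes the capex-versus-opex distinction concrete. Every number below is a hypothetical placeholder, not Nvidia data; the point is only the shape of the curves, with training a fixed one-time cost and inference scaling linearly with adoption:

```python
# Illustrative sketch: why inference spend eventually dwarfs training spend.
# All dollar figures and usage rates are hypothetical placeholders.

TRAINING_COST = 100_000_000        # one-time capital expense ($)
COST_PER_1K_TOKENS = 0.002         # recurring inference cost ($)
TOKENS_PER_USER_PER_DAY = 10_000

def annual_inference_cost(daily_users: int) -> float:
    """Recurring inference spend for a given user base over one year."""
    daily = daily_users * TOKENS_PER_USER_PER_DAY / 1_000 * COST_PER_1K_TOKENS
    return daily * 365

# Inference cost grows with every new user; the training bill does not.
for users in (1_000_000, 10_000_000, 100_000_000):
    print(users, round(annual_inference_cost(users)))
```

At 100 million daily users, the recurring annual inference bill in this toy model exceeds the one-time training cost, which is the structural reason inference hardware becomes the larger market.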

The Vera Rubin Platform: Seven Chips, One System

Nvidia's answer is the Vera Rubin platform, shipping in the second half of 2026. It is not a single chip but a full-stack system comprising seven processors designed to work together: the Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet Switch, and the newly integrated Groq 3 LPU.

The performance claims are aggressive. Nvidia says Vera Rubin delivers a 10x reduction in inference token cost versus the current Blackwell generation, with 10x higher inference throughput per watt. For training, the company claims 4x fewer GPUs are needed to train mixture-of-experts models. The full NVL72 configuration pairs 72 Rubin GPUs with 36 Vera CPUs, delivering 260 TB/s of bandwidth.
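The headline multipliers are easy to sanity-check. In the sketch below, the 10x token-cost and 4x GPU-count factors are Nvidia's GTC claims; the Blackwell baselines are arbitrary placeholders chosen only to show what the factors do:

```python
# Quick arithmetic on Nvidia's stated Vera Rubin multipliers.
# The Blackwell baselines are placeholders; the 10x and 4x factors
# are the figures Nvidia quoted at GTC 2026.

blackwell_cost_per_m_tokens = 1.00   # $ per million tokens (placeholder)
blackwell_gpus_for_moe_run = 1024    # GPUs for an MoE training job (placeholder)

rubin_cost_per_m_tokens = blackwell_cost_per_m_tokens / 10  # 10x cheaper tokens
rubin_gpus_for_moe_run = blackwell_gpus_for_moe_run // 4    # 4x fewer GPUs

print(rubin_cost_per_m_tokens)   # 0.1
print(rubin_gpus_for_moe_run)    # 256
```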

Every major cloud provider — AWS, Google Cloud, Microsoft Azure, Oracle Cloud, CoreWeave — along with system manufacturers Cisco, Dell, HPE, Lenovo, and Supermicro, has committed to deploying Vera Rubin. AWS alone plans to deploy more than one million Nvidia GPUs, spanning both the Blackwell and Rubin generations.

The Groq Acquisition: Inference-Specific Silicon

The most architecturally interesting piece of Vera Rubin is its seventh chip — the Groq 3 LPU, a product of Nvidia's $20 billion acquisition of Groq in late 2025. The deal brought founder Jonathan Ross and over 200 engineers into Nvidia, making it the company's largest acquisition to date.

The Groq 3 LPU is fundamentally different from a GPU. It uses SRAM rather than HBM, with deterministic execution designed specifically for low-latency inference. A single LPX rack packs 256 LPU chips delivering 315 petaflops of FP8 AI inference compute with 128 GB of on-chip SRAM and 40 PB/s of bandwidth.
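Dividing the rack-level specs by the chip count gives a rough per-LPU profile, assuming the quoted figures are aggregates across all 256 chips (the article does not say so explicitly):

```python
# Back-of-envelope per-chip figures from the LPX rack specs, assuming
# the quoted rack numbers are aggregates over all 256 LPUs.

CHIPS = 256
RACK_PFLOPS_FP8 = 315   # FP8 inference compute per rack
RACK_SRAM_GB = 128      # on-chip SRAM per rack
RACK_BW_PB_S = 40       # bandwidth per rack

print(RACK_PFLOPS_FP8 / CHIPS)        # ~1.23 PFLOPS FP8 per LPU
print(RACK_SRAM_GB / CHIPS * 1024)    # 512.0 MB SRAM per LPU
print(RACK_BW_PB_S / CHIPS * 1000)    # 156.25 TB/s per LPU
```

Half a gigabyte of SRAM per chip is in line with the SRAM-heavy, HBM-free design philosophy described above.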

Nvidia's approach is heterogeneous: GPUs handle prefill and decode attention over the KV cache, while LPUs execute the latency-sensitive feed-forward and mixture-of-experts layers. The company claims this combined architecture delivers 35x higher inference throughput per megawatt compared to GPU-only configurations. The two systems are orchestrated by Nvidia's Dynamo software layer.

This is the "two-stage AI compute plan" that Huang outlined — prefill on Vera Rubin chips to transform input into tokens, decode on Groq LPUs to generate answers. It is a deliberate architectural split that treats inference not as a subset of training compute but as a distinct workload requiring purpose-built hardware.
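The two-stage split can be sketched in a few lines of toy code. The function names and the trivial arithmetic inside them are hypothetical stand-ins, not Nvidia Dynamo APIs; what the sketch preserves is the division of labor — GPUs own prefill and KV-cache attention, LPUs own the per-token layers:

```python
# Minimal sketch of the two-stage GPU/LPU split described above.
# Function names and internals are hypothetical placeholders, not
# Dynamo APIs; the structure (who runs which stage) is the point.

def gpu_prefill(prompt_tokens):
    """Rubin GPUs: process the full prompt and build the KV cache."""
    return {"kv_cache": list(prompt_tokens)}

def gpu_decode_attention(kv_cache, token):
    """Rubin GPUs: attention over the growing KV cache."""
    kv_cache.append(token)
    return sum(kv_cache) % 50_000          # toy "attention output"

def lpu_feed_forward(attn_out):
    """Groq LPUs: latency-sensitive feed-forward / MoE layers."""
    return (attn_out * 31 + 7) % 50_000    # toy next-token id

def generate(prompt_tokens, steps):
    state = gpu_prefill(prompt_tokens)     # stage 1: prefill on GPUs
    token = prompt_tokens[-1]
    out = []
    for _ in range(steps):                 # stage 2: decode loop
        attn = gpu_decode_attention(state["kv_cache"], token)
        token = lpu_feed_forward(attn)     # handed off to the LPU
        out.append(token)
    return out

print(generate([101, 202, 303], steps=4))
```

The orchestration itself — deciding which tensors cross between the two chip families and when — is the job the article assigns to the Dynamo software layer.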

The Feynman Roadmap: Looking Past 2027

Nvidia also previewed its post-Rubin architecture, codenamed Feynman, scheduled for 2028. It will use TSMC's 1.6nm A16 process node with 3D die-stacking, custom HBM beyond HBM4e, and silicon photonics — replacing electrical signals with optical ones for data transfer between chips.

Feynman will pair with the Rosa CPU, named for Nobel laureate Rosalyn Sussman Yalow, alongside next-generation networking (the Kyber system replacing Opera) that scales to NVL1152 configurations. The roadmap signals that Nvidia sees multi-generational investment in inference-optimized silicon, not a one-cycle bet.

What the AI Labs Are Saying

The endorsements from Nvidia's biggest customers underscore the inference thesis. OpenAI CEO Sam Altman said Vera Rubin would let the company "run more powerful models and agents at massive scale." Anthropic CEO Dario Amodei noted the platform "gives us the compute, networking and system design to keep delivering while advancing the safety and reliability our customers depend on."

Microsoft CEO Satya Nadella described plans to build "the most powerful AI superfactories" with maximum performance efficiency using Nvidia's latest hardware. These are not speculative commitments — they represent billions in planned infrastructure spending.

The Market Reaction: Enthusiasm, With Caveats

Nvidia shares climbed as high as $188.87 during the GTC keynote — a 4.8% intraday gain — but settled to close at $183.22, up about 1.6% from the pre-GTC close of $180.25. The pattern — initial excitement followed by a pullback to modest gains — reflects a market that has been here before with Nvidia's ambitious projections.

Wall Street's consensus remains bullish. Of 57 analysts covering Nvidia, 54 rate the stock a Buy or higher, with a mean price target around $267.54 — implying roughly 46% upside. Wedbush analysts wrote that the "AI Revolution is accelerating, not decelerating." But the stock's year-to-date decline of 1.8% suggests that some of the growth is already priced in, and investors want proof of sustained execution.
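The implied-upside arithmetic checks out against the figures above:

```python
# Verifying the ~46% upside implied by the analyst figures cited above.

mean_target = 267.54   # mean analyst price target ($)
close_price = 183.22   # post-GTC closing price ($)

upside = mean_target / close_price - 1
print(round(upside * 100, 1))   # 46.0
```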

What This Means for the AI Stack

The $1 trillion projection is ultimately a statement about where value is accruing in the AI ecosystem. Training was the first phase — expensive, concentrated among a handful of frontier labs, and hardware-constrained. Inference is the second phase — distributed, recurring, and scaling with every new AI-powered product that reaches users.

For companies building on AI, the Vera Rubin platform and its heterogeneous GPU-LPU architecture represent a step-change in the economics of deployment. A 10x reduction in cost per token means AI features that were margin-negative at Blackwell economics could become viable businesses at Rubin economics. That is the real implication of the $1 trillion number — not just what Nvidia will earn, but what it enables everyone else to build.
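A toy unit-economics check shows how the 10x factor can flip a feature's margin. The revenue and cost figures are hypothetical placeholders; only the 10x divisor comes from Nvidia's claim:

```python
# Toy unit economics: a feature that loses money at Blackwell-era token
# costs becomes profitable at Rubin-era costs. Revenue and cost figures
# are hypothetical placeholders; the /10 factor is Nvidia's claim.

revenue_per_query = 0.010             # $ earned per AI query (placeholder)
blackwell_cost_per_query = 0.025      # $ inference cost today (placeholder)
rubin_cost_per_query = blackwell_cost_per_query / 10   # 10x cheaper

print(revenue_per_query - blackwell_cost_per_query)  # negative: margin-negative
print(revenue_per_query - rubin_cost_per_query)      # positive: margin-positive
```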

The inference era is not coming. According to Nvidia's biggest bet yet, it is already here.


James Calder is the editor of The Search Signal, covering AI-powered search, generative engine optimization, and the future of brand discovery.
