On April 20, 2026, reports emerged that Alphabet Inc. subsidiary Google is in advanced negotiations with Marvell Technology to co-develop two new semiconductors designed to optimize artificial intelligence inference. This strategic expansion of Google’s custom silicon roadmap is intended to bolster the company’s competitive position against Nvidia, which currently controls an estimated 80 percent to 90 percent of the AI accelerator market. The new hardware includes a dedicated memory processing unit and a next-generation Tensor Processing Unit specifically tailored for the inference phase, which is the process of running trained AI models to serve user queries.

The proposed memory processing unit is designed to operate in tandem with Google’s existing Tensor Processing Unit architecture to alleviate data bottlenecks and improve memory bandwidth, a critical factor in the performance of large language models. Technical specifications for the inference-optimized Tensor Processing Unit indicate a focus on cost per token and energy efficiency rather than raw training power. This development follows the deployment of Google’s seventh-generation Tensor Processing Unit, codenamed Ironwood, which became available in late 2025. Ironwood succeeded the sixth-generation Trillium, or TPU v6, which delivered a 4.7-fold increase in peak compute performance per chip compared to the TPU v5e. Trillium featured 32 gigabytes of High Bandwidth Memory and achieved 925.9 teraflops of BF16 peak performance.
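
The quoted figures permit a quick consistency check: a 4.7-fold gain over the TPU v5e implies a baseline of roughly 197 teraflops, which matches the v5e's published peak. The Python sketch below works through that arithmetic and adds a roofline-style estimate of why memory bandwidth, rather than peak compute, typically bounds single-batch inference; the HBM bandwidth and model size used in the second half are illustrative assumptions, not figures from this report.

```python
# Back-of-the-envelope check of the Trillium figures above, plus a
# roofline-style estimate of the memory-bandwidth ceiling on decoding.
# The HBM bandwidth and model size below are illustrative assumptions only.

TRILLIUM_BF16_TFLOPS = 925.9   # peak BF16 per chip, as cited
SPEEDUP_OVER_V5E = 4.7         # per-chip gain over TPU v5e, as cited

# Implied v5e baseline: 925.9 / 4.7 is roughly 197 TFLOPS.
implied_v5e = TRILLIUM_BF16_TFLOPS / SPEEDUP_OVER_V5E
print(f"Implied TPU v5e peak: {implied_v5e:.0f} TFLOPS BF16")

# Single-batch decode streams the full weight set from HBM for every
# generated token, so token rate is bounded by bandwidth, not FLOPS.
ASSUMED_HBM_GB_PER_S = 1_600   # hypothetical HBM bandwidth, GB/s
PARAMS_BILLIONS = 30           # hypothetical 30B-parameter model
BYTES_PER_PARAM = 2            # BF16 weights

weight_bytes = PARAMS_BILLIONS * 1e9 * BYTES_PER_PARAM
ceiling = ASSUMED_HBM_GB_PER_S * 1e9 / weight_bytes
print(f"Bandwidth-bound decode ceiling: ~{ceiling:.0f} tokens/s per chip")
```

At those assumed numbers, a chip can stream a 30-billion-parameter weight set only about 27 times per second regardless of its compute headroom, which is precisely the bottleneck a companion memory processing unit would target.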

By engaging Marvell as a third design partner, Google is diversifying a supply chain that has historically relied on Broadcom and MediaTek. Broadcom recently secured a long-term agreement to supply Tensor Processing Units through 2031, while MediaTek has assisted with cost-optimized variants. The addition of Marvell-designed chips is expected to support the massive scaling requirements of Google’s Gemini models, the newly launched Gemma 4 open-weights series, and external partners such as Anthropic, which is scheduled to gain access to approximately 3.5 gigawatts of next-generation Tensor Processing Unit compute capacity starting in 2027 to power its autonomous agentic systems.

The shift toward specialized inference hardware comes as the custom application-specific integrated circuit market is projected to grow by 45 percent in 2026 and to reach an estimated 118 billion dollars by 2033. Google’s infrastructure currently serves billions of AI-augmented search queries and Gemini conversations every day. Internal data suggests that specialized inference silicon can cut per-query costs by 20 percent to 30 percent relative to general-purpose graphics processing units. That efficiency is reinforced by the deployment of Google’s Axion processors, Arm-based CPUs that offer up to 60 percent better energy efficiency than comparable x86-based instances for general-purpose cloud workloads.
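
The cited 20 to 30 percent saving is easiest to appreciate at that query volume. The short sketch below applies the range to a hypothetical baseline serving cost and daily query count, both placeholder assumptions rather than figures from this report, to show how per-query efficiency compounds into annual infrastructure savings.

```python
# Scale illustration for the 20-30 percent per-query saving cited above.
# Baseline cost and query volume are hypothetical placeholders.

BASELINE_COST_PER_QUERY = 0.0003   # assumed GPU serving cost per query, USD
QUERIES_PER_DAY = 5e9              # assumed "billions of daily queries"

for saving in (0.20, 0.30):        # the 20-30 percent range cited above
    daily = BASELINE_COST_PER_QUERY * QUERIES_PER_DAY * saving
    print(f"{saving:.0%} saving: ${daily:,.0f}/day, "
          f"${daily * 365 / 1e6:,.0f}M/yr")
```

Even at these conservative placeholder values, the range works out to roughly 110 to 165 million dollars a year, which helps explain the economics driving custom inference silicon.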

While Nvidia remains the standard for model training, the industry is transitioning toward an inference era in which compute volume is driven by day-to-day application use. Google’s latest hardware announcements at the Google Cloud Next conference in Las Vegas underscore this transition. Amin Vahdat, Google’s vice president of machine learning infrastructure, said the company is focusing on balanced systems that integrate CPUs, Tensor Processing Units, and custom interconnects to meet the surging demand for AI agents and real-time processing.