Google is reportedly in advanced negotiations with Marvell Technology to co-develop two new processors specifically optimized for AI inference tasks. This development, reported on April 20, 2026, marks a significant expansion of Google’s internal hardware capabilities as it seeks to challenge Nvidia’s dominance in the rapidly growing inference market. The move highlights a broader industry trend of hyperscalers developing bespoke silicon to manage the escalating costs of running large-scale generative AI models.
The collaboration involves two distinct pieces of silicon: a next-generation Tensor Processing Unit (TPU) built specifically for inference and a first-of-its-kind Memory Processing Unit (MPU). The MPU is designed to work alongside existing TPUs to offload memory-intensive tasks, addressing the data-movement bottlenecks that currently limit the efficiency of large language models. Google plans to produce nearly two million of these units, with designs expected to be finalized for production as early as 2027. Early technical details suggest the MPU will focus on high-bandwidth data handling, complementing the raw compute power of the TPU architecture.
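Why data movement, rather than raw compute, is the binding constraint is easy to see with a back-of-the-envelope roofline estimate: during autoregressive decoding, generating each token requires streaming essentially all of a model’s weights through memory, so per-chip throughput hits a bandwidth ceiling long before the arithmetic units saturate. The short sketch below illustrates the gap; the model size, bandwidth, and FLOPs figures are hypothetical round numbers chosen for illustration, not specifications of any Google or Marvell part.

```python
# Back-of-the-envelope estimate of the memory-bandwidth ceiling on
# LLM decode throughput. All figures are illustrative assumptions,
# not specifications of any announced Google or Marvell hardware.

PARAMS = 70e9           # hypothetical model size: 70B parameters
BYTES_PER_PARAM = 2     # bf16 weights
HBM_BANDWIDTH = 3.2e12  # hypothetical accelerator bandwidth: 3.2 TB/s
PEAK_FLOPS = 1.0e15     # hypothetical peak compute: 1 petaFLOP/s

weight_bytes = PARAMS * BYTES_PER_PARAM  # ~140 GB of weights

# Decoding one token at batch size 1 reads every weight once, so the
# memory system, not the compute units, sets the throughput ceiling:
bandwidth_bound_tps = HBM_BANDWIDTH / weight_bytes  # ~23 tokens/s

# The compute ceiling is far higher: a forward pass costs roughly
# 2 FLOPs per parameter per token.
compute_bound_tps = PEAK_FLOPS / (2 * PARAMS)  # ~7,100 tokens/s

print(f"bandwidth-bound: {bandwidth_bound_tps:,.0f} tokens/s")
print(f"compute-bound:   {compute_bound_tps:,.0f} tokens/s")
```

The two-orders-of-magnitude gap between those ceilings is what a memory-offload companion chip would aim to narrow, whether by raising effective bandwidth or by moving weight and KV-cache traffic off the main accelerator.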
This partnership represents a strategic shift away from Google’s long-standing reliance on Broadcom for TPU design. While Google recently extended its broader infrastructure contract with Broadcom through 2031, the Marvell deal allows Google to diversify its supply chain and potentially reduce the high per-unit fees associated with its current hardware. Marvell brings specialized experience to the project, having previously designed inference-focused silicon for other industry players. This diversification is seen as a critical step in Google’s effort to control its hardware stack from the cooling systems up to the software compilers.
The announcement comes just two days before the start of the Google Cloud Next 2026 conference in Las Vegas, where the "agentic cloud" is expected to be a central theme. Google is already seeing massive demand for its existing sixth-generation TPU, Trillium, which delivers a 4.7x increase in peak compute performance per chip over its predecessor, TPU v5e. Major customers, including Anthropic and Meta Platforms, have already secured multibillion-dollar agreements for TPU capacity to power their AI models and agents, signaling a market shift toward non-Nvidia hardware for production-grade inference.
Beyond the new inference chips, Google’s infrastructure ecosystem now includes the Axion Gen 2 processor, which entered general availability earlier this year. Built on the Arm Neoverse N3 architecture, Axion Gen 2 offers 50% better performance and 60% better energy efficiency than comparable x86 instances. By combining custom CPUs like Axion with specialized inference TPUs and MPUs, Google aims to provide a full-stack alternative to Nvidia’s Blackwell and Rubin platforms for the next generation of AI workloads.