Google announced its eighth-generation Tensor Processing Unit (TPU) architecture today, April 20, 2026, marking a significant expansion of its custom silicon strategy. The new TPUv8 lineup is split into two specialized chips: the TPUv8i, codenamed Zebrafish, built for AI inference, and the TPUv8t, codenamed Sunfish, optimized for high-performance model training. The split reflects an industry shift from training ever-larger models to serving them at scale, an era in which the metrics that matter most are cost per token, latency, and energy efficiency.
According to technical specifications released ahead of the Google Cloud Next conference in Las Vegas, the TPUv8i is designed to deliver a substantial reduction in per-query costs compared with the outgoing TPUv7 Ironwood series. A key element of the new architecture is a dedicated memory processing unit (MPU). Reports indicate that Google is in advanced negotiations with Marvell Technology to develop these MPUs, which sit alongside the TPUs and handle memory-intensive tasks separately from pure computation. Google reportedly plans to produce nearly two million of the units to support its global data center footprint.
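Google has not published details of how the MPU divides work with the TPU, but the motivation is visible in a standard inference hot spot: generating a single token requires streaming an entire key/value cache out of memory to produce one small output vector. The JAX sketch below, with shapes and names invented purely for illustration, annotates where that memory-bound traffic occurs in a decode step:

```python
import jax
import jax.numpy as jnp

SEQ, HEADS, DIM = 4096, 8, 128  # hypothetical cache length and head sizes

def decode_step(q, k_cache, v_cache):
    # Both einsums stream the full KV cache from memory to serve a single
    # query vector, so arithmetic intensity is low: this is the memory-bound
    # traffic an MPU-style companion chip would be tasked with managing.
    scores = jnp.einsum("hd,shd->hs", q, k_cache) / jnp.sqrt(DIM)
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.einsum("hs,shd->hd", weights, v_cache)

key = jax.random.PRNGKey(0)
q = jax.random.normal(key, (HEADS, DIM))
k_cache = jax.random.normal(key, (SEQ, HEADS, DIM))
v_cache = jax.random.normal(key, (SEQ, HEADS, DIM))
print(jax.jit(decode_step)(q, k_cache, v_cache).shape)  # (8, 128)
```

At small batch sizes the arithmetic per byte read here is tiny, which is why routing this traffic to a dedicated memory unit can lower cost per token without adding any matrix compute.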
This move diversifies Google's supply chain, which has historically relied heavily on Broadcom for high-performance TPU designs. While Broadcom recently signed a new contract with Google through 2031 to design training-focused components, the inclusion of MediaTek for the TPUv8i design and the potential partnership with Marvell for MPUs signal a broader multi-vendor strategy. The TPUv8 series is tightly integrated with Google’s Axion Arm-based CPUs, which use the Neoverse N3 architecture. This vertical integration improves data movement between the CPU and the AI accelerator, addressing a common bottleneck in large-scale inference workloads.
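The companies have not specified the interconnect, but the bottleneck in question shows up at the software level as explicit host-to-device copies. A minimal JAX sketch, with generic device selection standing in for an Axion-plus-TPU host and made-up array sizes, shows the transfer the integration aims to make cheaper:

```python
import jax
import jax.numpy as jnp
import numpy as np

# Output of hypothetical CPU-side preprocessing, living in host memory.
host_batch = np.random.rand(1024, 512).astype(np.float32)

device = jax.devices()[0]  # the TPU on a TPU VM; any local backend elsewhere
device_batch = jax.device_put(host_batch, device)  # explicit host-to-device copy

@jax.jit
def score(x):
    # Stand-in inference step; once the data is on-device, it should stay
    # there rather than bouncing intermediates back through the host.
    return jnp.tanh(x @ x.T).mean()

print(score(device_batch))
```

Every such copy crosses the CPU-accelerator boundary, so tightening that link pays off most in serving pipelines that move fresh request data on-device for every query.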
Amin Vahdat, Google’s head of AI infrastructure, noted that the specialization of chips for training versus inference is a response to the surging demand for real-time AI agents and multi-step reasoning tasks. The announcement places Google in direct competition with Nvidia, which currently maintains a dominant share of the AI semiconductor market. While Nvidia recently integrated licensed Language Processing Unit technology into its own Blackwell Ultra and Rubin platforms to improve inference speeds, Google’s TPUv8i aims to offer a more cost-effective alternative for hyperscale deployments.
Google confirmed that the TPUv8 architecture will be available through Google Cloud Platform starting in the second half of 2026. The company also highlighted that the new chips carry the latest generation of its SparseCore technology, which accelerates the embedding-heavy workloads common in recommendation engines and large language models. By separating the hardware requirements for training and serving, Google intends to optimize its infrastructure for the growing volume of AI-driven consumer and enterprise queries.
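SparseCore targets lookups rather than dense matrix math. A short JAX sketch, with table and batch sizes invented for illustration, shows the scattered, memory-bound gather pattern that embedding workloads produce:

```python
import jax
import jax.numpy as jnp

VOCAB, EMBED = 100_000, 256                       # invented table dimensions
key = jax.random.PRNGKey(0)
table = jax.random.normal(key, (VOCAB, EMBED))    # dense embedding table
ids = jax.random.randint(key, (32, 64), 0, VOCAB) # batch of feature/token IDs

@jax.jit
def embed(table, ids):
    # Gathers 32*64 scattered rows out of 100,000: irregular memory access
    # with almost no math, the pattern SparseCore is built to offload.
    return jnp.take(table, ids, axis=0)           # -> (32, 64, 256)

print(embed(table, ids).shape)
```

Because each query touches a handful of rows scattered across a table far larger than on-chip memory, a dedicated sparse unit handles this far better than forcing the gather through the dense matrix pipeline.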