Google Cloud officially introduced its seventh-generation custom AI accelerator, the Tensor Processing Unit (TPU) v7, during the opening keynote of the Google Cloud Next 2026 conference on April 20. The announcement represents a significant expansion of Google’s internal silicon capabilities as the company seeks to offer high-performance alternatives to merchant silicon for artificial intelligence workloads. The TPU v7 is designed to handle both the training of massive large language models and the increasingly demanding inference phase, in which trained models generate responses for end-users.
Technical specifications released by Google Cloud engineering teams indicate that the TPU v7 provides a 3.8-fold increase in peak floating-point operations per second (FLOPS) over the TPU v6. The new chips are deployed in pods of 8,192 chips interconnected via Google’s proprietary optical circuit switching (OCS) technology. This networking fabric provides a total aggregate bandwidth of 12.8 petabits per second per pod, which Google states is critical for scaling the next generation of multimodal AI models that exceed 10 trillion parameters.
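The pod-level figures above imply a per-chip share of fabric bandwidth that can be estimated with simple division. This is a back-of-the-envelope sketch only: it assumes the quoted aggregate is spread evenly across the pod, and the announcement does not describe the actual OCS topology.

```python
# Rough per-chip share of the quoted pod bandwidth.
# Assumption (not stated by Google): bandwidth is evenly divided
# across all chips in the pod.
POD_CHIPS = 8_192
POD_BANDWIDTH_PBPS = 12.8  # aggregate, petabits per second

# Convert petabits -> terabits, then divide by chip count.
per_chip_tbps = POD_BANDWIDTH_PBPS * 1_000 / POD_CHIPS
print(f"~{per_chip_tbps:.4f} Tb/s per chip")  # ~1.5625 Tb/s
```

On this assumption each chip would see on the order of 1.5 Tb/s of fabric bandwidth; real per-chip links could differ depending on how the OCS fabric is provisioned.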
A primary focus of the TPU v7 architecture is energy efficiency and inference throughput. According to Thomas Kurian, CEO of Google Cloud, the v7 chips deliver a 50% improvement in performance-per-watt over the previous generation. This efficiency is achieved through a 3-nanometer manufacturing process and an enhanced sparse core architecture that accelerates the processing of sparse data sets common in recommendation systems and generative AI. Google reported that in internal testing, the TPU v7 executed inference tasks for the Gemini 2.0 Ultra model at twice the speed of the TPU v6 while consuming 30% less power.
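The Gemini inference result quoted above implies a larger energy saving for that specific workload than the headline generation-wide 50% performance-per-watt figure, because energy per request scales with both power draw and runtime. A small sketch using only the ratios in the paragraph:

```python
# Energy per inference = power x time.
# Quoted ratios for the Gemini 2.0 Ultra inference test:
#   v7 runs at 2x the speed of v6  -> 0.5x the time per request
#   v7 draws 30% less power        -> 0.7x the power
time_ratio = 0.5    # v7 time / v6 time
power_ratio = 0.7   # v7 power / v6 power

energy_ratio = power_ratio * time_ratio   # fraction of v6 energy per request
perf_per_watt_gain = 1 / energy_ratio     # implied gain on this workload

print(f"v7 uses {energy_ratio:.0%} of v6 energy per inference")
print(f"=> ~{perf_per_watt_gain:.2f}x perf-per-watt on this task")
```

The implied ~2.9x gain applies only to this particular inference benchmark; the 50% figure cited by Kurian is the broader generation-over-generation claim.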
In addition to the hardware, Google introduced the AI Hypercomputer architecture, an integrated system that combines TPU v7 accelerators with the company’s new Arm-based Axion CPUs and high-speed Jupiter networking. This system is designed to optimize the entire hardware-software stack, from the physical silicon up to the Vertex AI platform. Google confirmed that several early-access partners, including Anthropic and Midjourney, have already begun migrating portions of their production workloads to TPU v7 clusters to take advantage of the increased memory bandwidth.
The rollout of TPU v7 is scheduled to begin in select North American data centers in June 2026, with global availability across Google Cloud’s 40 regions expected by the end of the year. Google also announced a new consumption model called Inference-on-Demand, which allows customers to rent TPU v7 capacity in smaller increments, specifically targeting startups and enterprise developers who require high-performance silicon for real-time AI applications without the overhead of full-cluster reservations.