Alphabet Inc.’s Google Cloud division introduced the TPU 8i, its first custom silicon designed specifically for artificial intelligence inference, at the Google Cloud Next 2026 conference in Las Vegas. The unveiling marks a strategic shift for the company: it has split its eighth-generation Tensor Processing Unit (TPU) family into two distinct architectures, the TPU 8i for inference and the TPU 8t for model training.
The TPU 8i is engineered to address the high-concurrency and low-latency requirements of the agentic era, where AI agents perform complex, multi-step reasoning tasks. According to technical specifications released by Google, the TPU 8i delivers 10.1 petaFLOPS of FP4 compute performance. To overcome the memory wall often encountered in large-scale model serving, the chip features 384 MB of on-chip SRAM—a threefold increase over the previous Ironwood generation—and 288 GB of high-bandwidth memory (HBM). This configuration allows the hardware to host massive key-value (KV) caches entirely on-silicon, significantly reducing the time cores spend waiting for data.
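To put those memory figures in perspective, a back-of-the-envelope KV-cache estimate shows why on-package capacity dominates serving economics. The sketch below assumes a hypothetical transformer configuration; none of the parameters are Google's published figures.

```python
# Rough KV-cache sizing for long-context serving. Every model parameter
# here is an illustrative assumption, not a TPU 8i or Gemini specification.
layers    = 80        # decoder layers (assumed)
kv_heads  = 8         # grouped-query attention KV heads (assumed)
head_dim  = 128       # dimension per head (assumed)
bytes_per = 1         # FP8 cache entries (assumed)
ctx_len   = 128_000   # context tokens per request (assumed)

# Factor of 2 covers the separate key and value tensors.
per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per
per_request_gb  = per_token_bytes * ctx_len / 1e9

print(f"{per_token_bytes / 1024:.0f} KiB per token, "
      f"{per_request_gb:.1f} GB per long-context request")
# ~160 KiB per token and ~21 GB per request: a little over a dozen
# concurrent requests would fill 288 GB of HBM, and only the hottest
# entries fit in 384 MB of SRAM -- hence the focus on on-silicon capacity.
```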
Google Cloud Chief Executive Officer Thomas Kurian stated during the keynote that the TPU 8i provides an 80% improvement in performance-per-dollar for large language model inference compared to its predecessor. Kurian emphasized that as enterprises move from experimental AI to production-level AI agents, the cost per transaction must decrease to enable scaling. The hardware also introduces a new Boardfly network topology, which connects 1,152 chips in a single pod to minimize communication steps and reduce network latency by 50%.
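In concrete terms, an 80% gain in performance per dollar translates into a roughly 44% lower serving bill for the same workload. The arithmetic below uses an assumed baseline price purely for illustration:

```python
# Illustrative only: the baseline price is an assumption, not a Google rate.
baseline_cost = 1.00          # $ per million output tokens (assumed)
perf_per_dollar_gain = 1.80   # the stated 80% improvement

new_cost = baseline_cost / perf_per_dollar_gain
savings  = 1 - new_cost / baseline_cost
print(f"${new_cost:.2f} per million tokens ({savings:.0%} cheaper)")
# $0.56 per million tokens (44% cheaper)
```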
A key addition to the TPU 8i is the Collectives Acceleration Engine (CAE), a specialized component designed to offload resource-heavy coordination tasks from the compute cores. Mark Lohmeyer, Vice President of Compute and AI Infrastructure at Google Cloud, said the CAE delivers up to a fivefold reduction in on-chip latency under high-concurrency loads. The TPU 8i also ships in servers with twice as many physical CPU hosts as the prior generation, built on Alphabet’s custom Arm-based Axion CPUs to improve overall system efficiency.
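The coordination work the CAE targets is the collective communication (all-reduce, all-gather, and similar operations) that synchronizes chips during distributed serving. The sketch below shows a standard `psum` all-reduce in JAX, the kind of collective such an engine would offload; it uses only JAX's public API on emulated CPU devices, not any CAE-specific interface.

```python
import os
# Emulate 8 devices on a CPU host for illustration; on real TPUs the
# collective would run across the physical chips in a pod.
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=8"

import functools
import jax
import jax.numpy as jnp

# psum is an all-reduce: every device contributes its local shard and
# receives the global sum. Coordinating this across chips is exactly the
# kind of work a dedicated collectives engine takes off the compute cores.
@functools.partial(jax.pmap, axis_name="chips")
def allreduce(local_value):
    return jax.lax.psum(local_value, axis_name="chips")

shards = jnp.arange(float(jax.device_count()))   # one value per device
print(allreduce(shards))                         # [28. 28. ... 28.]
```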
On the sustainability front, Alphabet reported that the TPU 8i achieves a 117% increase in performance-per-watt over the prior generation. Amin Vahdat, Senior Vice President and Chief Technologist of AI and Infrastructure, described the eighth-generation family as a critical component of the Google Cloud AI Hypercomputer, an integrated stack of hardware, software, and networking.
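For a fixed workload, a 117% performance-per-watt gain implies the new chips need a little under half the energy of the prior generation, as the quick arithmetic below shows (assuming equal utilization and ignoring cooling and other facility overheads):

```python
perf_per_watt_gain = 2.17   # a 117% increase over the prior generation
energy_ratio = 1 / perf_per_watt_gain
print(f"{energy_ratio:.2f}x the energy for the same work "
      f"(a {1 - energy_ratio:.0%} reduction)")
# 0.46x the energy for the same work (a 54% reduction)
```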
The TPU 8i is scheduled for general availability on the Google Cloud platform later in 2026. Alphabet confirmed that the new hardware is already being utilized to power its own Gemini models and will be available to enterprise customers through Vertex AI. The company also highlighted that its infrastructure will support upcoming third-party deployments, including next-generation foundation models for partners such as Apple and Anthropic.