GPU vs TPU
Insights on Google TPUs from a SemiAnalysis Podcast (9 Months Ago)
- 01 Scale & Architecture
- 02 Internal Mission
- 03 Co-design Issues
- 04 Software Ecosystem
- 05 Organization
- → Summary
This episode primarily discusses Google's TPU cluster architecture, how it diverges from Nvidia GPUs, and why Google, despite possessing potent hardware, has not dominated external chip sales (merchant silicon) the way Nvidia has. Below are the detailed points made by Dylan Patel.
Google Has the Largest Scale, but a Unique Architecture
Scale and Layout: Google actually possesses the world's largest computing clusters, but their deployment strategy differs fundamentally from the "single, monolithic giant cluster" approach pursued by Nvidia or Elon Musk (e.g., xAI).
Multi-Data Center Interconnection: Instead of concentrating all chips into one massive physical site, Google’s supercomputers are distributed across multiple data centers located in close proximity (for example, four data centers in Iowa and Nebraska, and similar facilities in Ohio).
High-Speed Connectivity: These data centers are tightly coupled via extremely high-bandwidth fiber optic networks, forming what are known as "Super Regions." Although physically dispersed (roughly 30 miles apart), they operate logically as a single cohesive unit.
Infrastructure Design: Google’s data center design is highly unique and advanced, equipped with specialized water-cooling towers and backup diesel generators. The infrastructure is custom-built specifically to support the efficient operation of TPUs.
The Core Mission of TPU: Serving Internal Business
Lowering Search Costs: The TPU originated as a way to drive Google Search's serving costs as low as possible and to build models purpose-built for that workload.
Priority on Internal Workloads: The vast majority of TPU capacity is dedicated to Google’s own core businesses, including Search, YouTube, Ads, and now the Gemini models. These businesses generate hundreds of billions in revenue annually—they are Google’s "Golden Goose."
Lack of Sales Incentive: Compared with Nvidia's business of earning tens of billions by selling chips, Google is far more focused on protecting and optimizing its core, high-margin service businesses. Consequently, it is extremely difficult to convince senior leadership to alter strategy just to sell chips.
The Double-Edged Sword of Hardware-Software Co-design
Targeted Optimization: Google's architecture is optimized to the extreme for its own requirements. For instance, Google's open-source Gemma 7B model actually has about 8 billion parameters, largely because its vocabulary size is exceptionally large.
Architectural Divergence: This design choice stems from the fact that the TPU's Matrix Multiply Unit (MXU) is massive; Google increased the vocabulary size in part just to saturate the hardware's compute capacity.
Compatibility Issues: This specific optimization makes Gemma run with incredible efficiency on TPUs, but it performs worse than LLaMA on Nvidia GPUs. Conversely, LLaMA does not run as efficiently on TPUs as Gemma does.
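To make the vocabulary point concrete, here is a rough back-of-the-envelope sketch of how a large vocabulary inflates a model's parameter count. The shapes are Gemma 7B's and Llama 2 7B's published values, quoted from memory, so treat them as approximate rather than authoritative:

```python
# Back-of-the-envelope: how vocabulary size inflates a transformer's
# parameter count via the token-embedding matrix.
# Shapes below are Gemma 7B's and Llama 2 7B's reported values (from memory).

GEMMA_VOCAB = 256_128   # Gemma's unusually large vocabulary
GEMMA_WIDTH = 3_072     # Gemma 7B hidden dimension

LLAMA_VOCAB = 32_000    # Llama 2's vocabulary
LLAMA_WIDTH = 4_096     # Llama 2 7B hidden dimension

def embedding_params(vocab_size: int, hidden_dim: int) -> int:
    """Parameters in the token-embedding matrix (one vector per token)."""
    return vocab_size * hidden_dim

gemma_embed = embedding_params(GEMMA_VOCAB, GEMMA_WIDTH)
llama_embed = embedding_params(LLAMA_VOCAB, LLAMA_WIDTH)

print(f"Gemma embedding params:   {gemma_embed / 1e9:.2f}B")  # ~0.79B
print(f"Llama 2 embedding params: {llama_embed / 1e9:.2f}B")  # ~0.13B
```

Roughly 0.79B of Gemma's parameters sit in the embedding table alone, about six times Llama 2 7B's share, which is a large part of why a "7B" model ends up near 8B total.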
The Closed Nature of the Software Ecosystem
Superb Internal Experience: Inside Google, the software stack used by researchers (such as JAX and XLA) is highly optimized. Researchers can train models efficiently through high-level abstraction layers without needing to understand the underlying hardware details; the experience is described as "magical."
External Adaptation Difficulties: When these researchers leave Google to found new startups, they often face a massive shock. Because this highly integrated internal infrastructure does not exist outside, they find that whether they use GPUs or try to use TPUs, they are forced to grapple with complex infrastructure and software challenges.
Software Not Productized: Google’s internal software stack was built primarily to serve DeepMind and the Search teams, unlike Nvidia’s CUDA and NCCL, which were polished into commercial products specifically to serve external customers.
Organizational Fragmentation
Departmental Silos: There is significant bureaucracy and fragmentation within Google. Google Cloud, the TPU hardware team, DeepMind, and the Search team are independent organizations.
Misaligned Goals: The TPU team sits within the infrastructure division; its primary customers are DeepMind and the Search team, not Google Cloud’s external clients. This results in a TPU service experience on Google Cloud that is less customer-centric than Nvidia’s offering.
Summary
"Google possesses incredibly powerful hardware capabilities and a cluster scale that arguably surpasses Nvidia’s. However, its technology stack is highly 'inward-facing'—everything is designed to optimize its own Search and Ad businesses. This deep customization makes it difficult for its hardware and software to be adopted as general-purpose products like Nvidia’s."