RamenPanda|Nov 25, 2025 19:08
Google TPU vs. GPU explained in plain language
1. The real reason for the birth of TPU (2013-2015)
-Google calculated that if every Android user used voice search for 3 minutes a day, it would need to double the computing capacity of its global data centers using the CPUs/GPUs available at the time.
-Conclusion: Google had to develop its own ASIC specialized for matrix operations (neural network inference); otherwise the success of its AI products would overwhelm its own infrastructure.
-The project went from initiation to mass production in just 15 months, and by 2015 TPUs were quietly running behind Google Search, Maps, Photos, and Translate.
2. The essential difference between TPU and GPU
-GPU: A general-purpose parallel processor that carries the historical burden of "graphics rendering" (cache, branch prediction, complex scheduling, etc.).
-TPU: Minimalist Domain Specific Architecture
-The core is a **Systolic Array**: weights are loaded only once, data flows in one direction like blood through the body, and intermediate results are almost never written back to memory, which largely sidesteps the von Neumann bottleneck.
-Latest generation TPU v7 (Ironwood, released in April 2025) single-chip specifications:
-4,614 TFLOPS (FP8)
-192GB HBM (same as Blackwell B200)
-Memory bandwidth 7370 GB/s
-Performance per watt doubled (2x) compared to the previous generation v6e (Trillium)
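To make the "weights are loaded once, data flows in one direction" idea concrete, here is a minimal sketch of a weight-stationary systolic array in plain Python. It is a toy cycle-by-cycle model, not Google's actual design: the function name `systolic_matmul`, the grid layout (one processing element per weight), and the input skewing scheme are all assumptions chosen for illustration.

```python
def systolic_matmul(X, W):
    """Toy weight-stationary systolic array computing Y = X @ W.

    X is M x K (activations), W is K x N (weights).
    Hypothetical layout: PE(k, n) permanently holds W[k][n].
    Activations move left to right, partial sums move top to bottom,
    and nothing is written back to memory until a result leaves the
    bottom row.
    """
    M, K, N = len(X), len(W), len(W[0])
    act = [[0] * N for _ in range(K)]   # activation register of each PE
    psum = [[0] * N for _ in range(K)]  # partial-sum register of each PE
    Y = [[0] * N for _ in range(M)]
    total_cycles = M + K + N - 2        # pipeline fill + drain

    for t in range(total_cycles):
        new_act = [[0] * N for _ in range(K)]
        new_psum = [[0] * N for _ in range(K)]
        for k in range(K):
            for n in range(N):
                if n == 0:
                    # Left edge: row k is fed X[m][k], skewed by k cycles.
                    m = t - k
                    a_in = X[m][k] if 0 <= m < M else 0
                else:
                    # Otherwise take what the left neighbour held last cycle.
                    a_in = act[k][n - 1]
                # Partial sum arrives from the PE above (0 at the top edge).
                p_in = psum[k - 1][n] if k > 0 else 0
                new_act[k][n] = a_in
                new_psum[k][n] = p_in + a_in * W[k][n]
        act, psum = new_act, new_psum

        # The bottom row emits a finished dot product for row m, column n.
        for n in range(N):
            m = t - (K - 1) - n
            if 0 <= m < M:
                Y[m][n] = psum[K - 1][n]
    return Y, total_cycles


if __name__ == "__main__":
    X = [[1, 2], [3, 4], [5, 6]]
    W = [[1, 0, 2], [0, 1, 3]]
    Y, cycles = systolic_matmul(X, W)
    print(Y, cycles)
```

In this model each weight is read exactly once (when the grid is set up), while every activation is reused across a whole row of processing elements. That reuse is the property the post credits for the TPU's efficiency.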
3. Real performance comparison
-In inference scenarios, TPU cost-effectiveness is generally 30%-100% higher than that of Nvidia GPUs (depending on the specific workload)
-Commonly cited claims:
-For the same budget, one TPU v5e Pod can do the work of 8 H100s
-TPU v6 uses 60-65% less power than a Hopper GPU
-Older TPU generations are discounted very aggressively; once a new generation launches, the previous one becomes almost free
-Even Jensen Huang admits that only Google TPU is a "special case" in ASIC.
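The percentage claims in this section are easier to compare after converting them into multiples. The sketch below assumes that "cost-effectiveness X% higher" means performance per dollar is (1 + X) times higher, and that "uses X% less power" refers to the same amount of work; both readings are my interpretation, and the helper names are hypothetical.

```python
def cost_saving_for_same_work(perf_per_dollar_gain):
    """Fraction of spend saved for the same work.

    perf_per_dollar_gain: 0.30 means performance per dollar is 30% higher
    (assumed reading of "cost-effectiveness").
    """
    return 1 - 1 / (1 + perf_per_dollar_gain)


def perf_per_watt_multiple(power_saving):
    """Performance-per-watt multiple if the same work uses
    `power_saving` (e.g. 0.60 = 60%) less power."""
    return 1 / (1 - power_saving)


if __name__ == "__main__":
    for gain in (0.30, 1.00):
        saving = cost_saving_for_same_work(gain)
        print(f"{gain:.0%} better perf/$ -> {saving:.0%} lower bill")
    for saving in (0.60, 0.65):
        multiple = perf_per_watt_multiple(saving)
        print(f"{saving:.0%} less power -> {multiple:.2f}x perf/W")
```

Under these assumptions, a 30%-100% cost-effectiveness gain corresponds to roughly a 23%-50% lower bill for the same work, and 60-65% less power corresponds to roughly 2.5x-2.9x performance per watt.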
4. The biggest obstacle to the popularization of TPU
-Ecosystem lock-in: programmers learned CUDA+PyTorch in college, while TPU mainly uses JAX/TensorFlow (PyTorch is now supported, but library coverage is still incomplete)
-TPUs are available only on Google Cloud (not on AWS or Azure), data migration costs (egress fees) are extremely high, and enterprises are reluctant to commit fully
-For now the main advantage is in inference; TPUs are also strong at training, but the CUDA ecosystem still has the edge there
5. Strategic significance for Google Cloud
-In the era of AI, the gross profit margin of cloud services has plummeted from 50-70% to 20-35%, as everyone is working for Nvidia (Nvidia's gross profit margin is 75%).
-Whoever can use self-developed ASICs to get rid of Nvidia will be able to return to 50%+gross profit.
-Progress of self-developed ASICs by the three major cloud vendors: **Google TPU >> AWS Trainium > Azure MAIA**
-Google has taken full control of the front end (RTL) of TPU design, while Broadcom handles only the back-end physical implementation, which has pushed Broadcom's gross profit margin on this work down to around 50%.
-Internal: Google Search, Gemini, Veo, etc. all run inference on TPUs. External: GCP customers are given Nvidia GPUs only when they specifically ask for them.
-TPU is the biggest ace for GCP to turn the tide and regain market share in the cloud industry in the AI era.
-Google is aggressively expanding production of TPU v7 (Ironwood)
-External customers will only begin to receive large-scale deliveries at the end of 2025
-The industry expects an explosive growth in Google TPU shipments by 2026 (which is completely consistent with the "significant increase in Google TPU v7p" mentioned in your previous America report)
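The margin argument in this section can be checked with simple arithmetic. The sketch below uses the 75% and 50% supplier margins quoted above; everything else (the function names, the assumption that an in-house chip costs the same to manufacture and performs the same, and the revenue and cost split of the example cloud service) is hypothetical and chosen only to illustrate the mechanism. It also ignores R&D spending.

```python
def price_from_margin(unit_cost, gross_margin):
    """Selling price implied by a supplier's gross margin:
    margin = (price - cost) / price  =>  price = cost / (1 - margin)."""
    return unit_cost / (1 - gross_margin)


def cloud_gross_margin(revenue, accelerator_spend, other_cogs):
    """Gross margin of a cloud service given its cost of goods sold."""
    return (revenue - accelerator_spend - other_cogs) / revenue


if __name__ == "__main__":
    # Same manufacturing cost (normalized to 1.0), different supplier margins.
    nvidia_price = price_from_margin(1.0, 0.75)   # 75% margin quoted above
    inhouse_price = price_from_margin(1.0, 0.50)  # ~50% Broadcom margin quoted above
    ratio = inhouse_price / nvidia_price
    print(f"in-house chip costs {ratio:.0%} of the GPU price")

    # Hypothetical cloud service: revenue 100, accelerators 50, other costs 20.
    before = cloud_gross_margin(100, 50, 20)
    after = cloud_gross_margin(100, 50 * ratio, 20)
    print(f"gross margin {before:.0%} -> {after:.0%}")
```

With these made-up inputs, halving the accelerator bill moves the service from a 30% to a 55% gross margin, which is the shape of the "return to 50%+" claim. The real figure depends on how large a share of costs the accelerators actually are.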
**Summary:**
Google TPU is currently the only self-developed AI chip that can truly compete with Nvidia, and its cost-effectiveness advantage is overwhelming in the inference era. Over the next 5-10 years, it will be Google Cloud's biggest moat and one of the most important drivers of the surge in demand for TSMC's CoWoS packaging.