RamenPanda
RamenPanda|Nov 25, 2025 19:08
Google TPU vs. GPU, explained in plain language

1. The real reason the TPU was born (2013-2015)
- Google calculated that if every Android user used voice search for 3 minutes a day, serving that load on existing CPUs/GPUs would require roughly double the computing power of its data centers worldwide.
- Conclusion: Google had to develop its own ASIC specifically for matrix operations (neural network inference); otherwise the success of AI would overwhelm its own infrastructure.
- It took just 15 months from project kickoff to mass production, and by 2015 TPUs were quietly running behind Google Search, Maps, Photos, and Translate.

2. The essential difference between TPU and GPU
- GPU: a general-purpose parallel processor that carries the historical baggage of graphics rendering (caches, branch prediction, complex scheduling, etc.).
- TPU: a minimalist domain-specific architecture.
  - The core is the **Systolic Array**: weights are loaded only once, data flows in one direction like blood through the body, and intermediate results are almost never written back to memory, which largely sidesteps the von Neumann bottleneck.
- Latest generation, TPU v7 (Ironwood, announced in April 2025), single-chip specifications:
  - 4,614 TFLOPS (FP8)
  - 192 GB HBM (same as Blackwell B200)
  - Memory bandwidth of 7,370 GB/s
  - Performance per watt doubled (a 100% improvement) compared with the previous-generation v6e (Trillium)

3. Real performance comparison
- In inference scenarios, TPU cost-effectiveness is generally 30-100% higher than that of Nvidia GPUs, depending on the specific workload.
- Typical claims:
  - For the same money, one TPU v5e pod can do the work of eight H100s.
  - TPU v6 uses 60-65% less power than Hopper GPUs.
  - Older TPU generations are discounted extremely aggressively: once a new generation ships, the previous one is almost free.
- Even Jensen Huang has acknowledged that Google's TPU is a "special case" among ASICs.
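
To make the systolic-array idea from section 2 concrete, here is a minimal pure-Python sketch of a weight-stationary array. It is a toy model under stated assumptions, not a description of Google's actual hardware: the function name `systolic_matmul`, the one-multiply-per-cell-per-cycle timing, and the tiny matrix sizes are all illustrative choices. Each cell holds one weight permanently, activations move one cell to the right per cycle, and partial sums move one cell down per cycle, so intermediate results never go back to memory.

```python
def systolic_matmul(x, w):
    """Compute x @ w by simulating a weight-stationary systolic array.

    x is an M x K list of lists (activations), w is a K x N list of lists
    (weights). Cell (k, n) of the simulated array permanently holds w[k][n].
    Hypothetical toy model: one multiply-accumulate per cell per cycle.
    """
    m_rows, k_dim, n_cols = len(x), len(w), len(w[0])

    # Registers between cells: the activation each cell forwards to its right
    # neighbour, and the partial sum each cell forwards to the cell below.
    act = [[0] * n_cols for _ in range(k_dim)]
    psum = [[0] * n_cols for _ in range(k_dim)]
    y = [[0] * n_cols for _ in range(m_rows)]

    # The last result, y[M-1][N-1], leaves the array on cycle M + K + N - 3.
    for t in range(m_rows + k_dim + n_cols - 2):
        new_act = [[0] * n_cols for _ in range(k_dim)]
        new_psum = [[0] * n_cols for _ in range(k_dim)]
        for k in range(k_dim):
            for n in range(n_cols):
                if n == 0:
                    # Left edge: row k of the array is fed x[m][k], skewed by
                    # k cycles so it meets the matching partial sum from above.
                    m = t - k
                    a_in = x[m][k] if 0 <= m < m_rows else 0
                else:
                    a_in = act[k][n - 1]          # arrives from the left
                p_in = psum[k - 1][n] if k > 0 else 0  # arrives from above
                new_act[k][n] = a_in
                new_psum[k][n] = p_in + a_in * w[k][n]  # weight never moves
        act, psum = new_act, new_psum

        # Finished dot products drop out of the bottom row of the array.
        for n in range(n_cols):
            m = t - (k_dim - 1) - n
            if 0 <= m < m_rows:
                y[m][n] = psum[k_dim - 1][n]
    return y


if __name__ == "__main__":
    x = [[1, 2], [3, 4], [5, 6]]
    w = [[1, 0, 2], [0, 1, 3]]
    print(systolic_matmul(x, w))  # [[1, 2, 8], [3, 4, 18], [5, 6, 28]]
```

Note that `w` is only ever read in place by the cell that owns it, while `x` enters once at the left edge and results leave once at the bottom; that is the "load weights once, flow data through" pattern the post describes.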

4. The biggest obstacles to TPU adoption
- Ecosystem lock-in: programmers learn CUDA + PyTorch in college, whereas TPUs are mainly programmed with JAX/TensorFlow (PyTorch is now supported, but library coverage is still incomplete).
- TPUs can only be used on Google Cloud (not on AWS or Azure), data migration costs (egress fees) are extremely high, and enterprises do not dare to commit fully.
- For now the main advantage is in inference; TPUs are also strong at training, but the CUDA ecosystem still has the edge there.

5. Strategic significance for Google Cloud
- In the AI era, the gross margin of cloud services has plummeted from 50-70% to 20-35%, because everyone is effectively working for Nvidia (whose gross margin is 75%).
- Whoever can use self-developed ASICs to break free of Nvidia will be able to return to gross margins above 50%.
- Progress of the three major cloud vendors on self-developed ASICs: **Google TPU >> AWS Trainium > Azure Maia**
- Google has taken full control of the front end (RTL) of TPU design, while Broadcom handles only the back-end physical implementation, which has reduced Broadcom's gross margin to around 50%.
- Internally, Google Search, Gemini, Veo, etc. all run inference on TPUs; external GCP customers are given Nvidia GPUs only when they ask for them.
- The TPU is GCP's biggest ace for turning the tide and regaining cloud market share in the AI era.
- Google is aggressively ramping production of TPU v7 (Ironwood).
- External customers will only begin to receive large-volume deliveries by the end of 2025.
- The industry expects explosive growth in Google TPU shipments in 2026 (fully consistent with the "significant increase in Google TPU v7p" mentioned in your previous America report).

**Summary:** Google TPU is currently the only self-developed AI chip that can truly compete with Nvidia, and in the inference era in particular it has an overwhelming cost-effectiveness advantage.
Over the next 5-10 years, it will be Google Cloud's biggest moat and one of the most important drivers of surging demand for TSMC's CoWoS packaging.