Eight years ago, ZTE suffered a cardiac arrest.
On April 16, 2018, a ban from the U.S. Department of Commerce's Bureau of Industry and Security brought ZTE Corporation, the world's fourth-largest telecom equipment manufacturer with 80,000 employees and annual revenue exceeding 100 billion yuan, to a halt overnight. The ban was simple: for the next seven years, no American company was allowed to sell parts, goods, software, or technology to ZTE.
Without Qualcomm chips, base stations ceased production. Without Google's Android license, phones had no usable operating system. Twenty-three days later, ZTE announced that it could not carry out its main business activities.
ZTE ultimately survived, but at a cost of 1.4 billion dollars.
One billion dollars in fines, paid in a lump sum; another 400 million dollars deposited in escrow at a U.S. bank. In addition, the entire executive team was replaced, and ZTE accepted a compliance oversight team from the U.S. In 2018, ZTE reported a net loss of 7 billion yuan, with revenue plunging 21.4% year-on-year.
Then-chairman Yin Yimin wrote in an internal letter, "We are in a complex industry that is highly dependent on the global supply chain." At the time, this sentence felt like reflection and helplessness.
Eight years later, on February 26, 2026, Chinese AI unicorn DeepSeek announced that its upcoming V4 multimodal large model would prioritize deep collaboration with domestic chip manufacturers, achieving a full workflow from pre-training to fine-tuning without Nvidia for the first time.
Translated, this means: we will no longer use Nvidia.
Upon the announcement, the market's first reaction was skepticism. Nvidia holds more than 90% of the global AI training chip market share; is abandoning it commercially viable?
Yet behind DeepSeek's choice lies a question larger than business logic: what kind of computing independence does Chinese AI truly need?
What is really being choked?
Many believe that the chip ban targets hardware. But what truly suffocates Chinese AI companies is something called CUDA.
CUDA, short for Compute Unified Device Architecture, is a parallel computing platform and programming model launched by Nvidia in 2006. It allows developers to directly leverage the computing power of Nvidia GPUs to accelerate various complex computing tasks.
Before the AI era arrived, this was merely a tool for a few geeks. But when the wave of deep learning hit, CUDA became the cornerstone of the entire AI industry.
The training of AI large models essentially involves massive matrix computations. And that is precisely what GPUs excel at.
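To make this concrete, here is a minimal sketch in Python with NumPy (purely illustrative; real training runs these same operations on GPUs at vastly larger scale) showing how a single neural network layer reduces to exactly the kind of matrix multiplication GPUs are built for:

```python
import numpy as np

# A batch of 32 inputs, each a 512-dimensional vector
x = np.random.randn(32, 512)

# One fully connected layer: a 512x1024 weight matrix plus a bias vector
W = np.random.randn(512, 1024)
b = np.zeros(1024)

# The forward pass is a single matrix multiplication --
# the operation GPUs execute thousands of times in parallel
y = x @ W + b

print(y.shape)  # (32, 1024)
```

Stacking many such layers, and repeating the computation over billions of tokens, is what makes large-model training a matrix-multiplication problem at heart.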
Nvidia, having laid the groundwork years in advance, built a complete toolchain from the bottom hardware to upper applications for global AI developers with CUDA. Today, all mainstream AI frameworks worldwide, from Google's TensorFlow to Meta's PyTorch, are deeply bound to CUDA at their core.
An AI PhD student learns to program and do experiments in a CUDA environment from day one of enrollment. Every line of code they write reinforces Nvidia's moat.

By the end of 2025, the CUDA ecosystem had over 4.5 million developers and covered more than 3,000 GPU-accelerated applications, with over 40,000 companies worldwide using CUDA. These figures mean that more than 90% of AI developers globally are bound to Nvidia's ecosystem.
The terrifying aspect of CUDA is that it acts like a flywheel. The more developers use it, the more tools, libraries, and code it generates, making the ecosystem more prosperous; the more prosperous the ecosystem, the more developers it attracts. Once this flywheel starts turning, it is nearly impossible to shake.
The result is that Nvidia sells you the most expensive shovel while defining the only way to mine. Want to switch shovels? You can, but first you have to rewrite all the experience, tools, and code that the world's smartest minds accumulated over the past decade, all built on that one method.
Who will bear this cost?
Thus, when the first round of regulations took effect on October 7, 2022, restricting exports of Nvidia's A100 and H100 chips to China, Chinese AI companies collectively felt ZTE's suffocation for the first time. Nvidia responded with the "China special edition" A800 and H800, cutting inter-chip interconnect bandwidth to barely keep supply flowing.
But just a year later, on October 17, 2023, the second round of regulations tightened again; A800 and H800 were also banned, and 13 Chinese companies were added to the entity list. Nvidia had to launch an even more stripped-down H20. By December 2024, under the Biden administration's final round of regulations, even H20's exports were strictly limited.
Three rounds of regulations, layer upon layer of tightening.
However, this time, the direction of the story is completely different from that of ZTE.
An asymmetric breakthrough
Under the ban, everyone thought that the dream of large models in Chinese AI would come to an end.
They were wrong. Faced with the blockade, Chinese companies did not choose to confront head-on; instead, they began a breakout strategy. The first battlefield of this breakout was not chips, but algorithms.
From the end of 2024 to 2025, Chinese AI companies collectively turned to one technical direction: mixture-of-experts (MoE) models.
In simple terms, this means breaking a massive model into many smaller "experts" and activating only the most relevant ones for each task, rather than mobilizing the entire model.
DeepSeek's V3 is a typical representative of this idea. It has 671 billion parameters but activates only 37 billion per inference, just 5.5% of the total. For training, it used 2,048 Nvidia H800 GPUs over 58 days at a total cost of 5.576 million dollars. By comparison, outside estimates put GPT-4's training cost at about 78 million dollars, a difference of more than an order of magnitude.
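The sparse-activation idea behind mixture-of-experts can be sketched in a few lines (illustrative Python with NumPy; a toy gating scheme, not DeepSeek's actual router): a gate scores every expert, only the top-k run, and the rest of the model's parameters stay idle for that input.

```python
import numpy as np

def moe_forward(x, experts, gate_weights, k=2):
    """Route input x to the top-k experts by gate score; the rest stay idle."""
    scores = x @ gate_weights            # one score per expert
    top_k = np.argsort(scores)[-k:]      # indices of the k highest-scoring experts
    # Softmax over only the selected scores
    w = np.exp(scores[top_k] - scores[top_k].max())
    w /= w.sum()
    # Weighted sum of the chosen experts' outputs
    return sum(wi * experts[i](x) for wi, i in zip(w, top_k))

# 8 toy "experts", each just a linear map; only 2 run per input
rng = np.random.default_rng(0)
experts = [lambda x, W=rng.standard_normal((16, 16)): x @ W for _ in range(8)]
gate = rng.standard_normal((16, 8))
x = rng.standard_normal(16)

out = moe_forward(x, experts, gate, k=2)
print(out.shape)                       # (16,)
print(f"active experts: {2/8:.1%}")    # 25.0%
```

In this toy setup only 2 of 8 experts fire per input; V3 pushes the same principle much further, activating 37 of 671 billion parameters (5.5%).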
This extreme optimization on the algorithm front shows up directly in price. DeepSeek's API charges 0.028 to 0.28 dollars per million input tokens and 0.42 dollars per million output tokens. GPT-4 charges 5 dollars for input and 15 dollars for output; Claude Opus is pricier still, at 15 dollars for input and 75 dollars for output. Taken at these list prices, DeepSeek is roughly 50 to 180 times cheaper than Claude.
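Using the per-million-token list prices quoted above, a quick back-of-the-envelope comparison (illustrative Python; prices as stated in the text, DeepSeek's upper input tier) shows the scale of the gap:

```python
# Per-million-token API list prices quoted above (USD)
deepseek = {"input": 0.28, "output": 0.42}   # upper input tier
claude   = {"input": 15.0, "output": 75.0}   # Claude Opus

for kind in ("input", "output"):
    ratio = claude[kind] / deepseek[kind]
    print(f"{kind}: Claude is {ratio:.0f}x DeepSeek's price")
# input: Claude is 54x DeepSeek's price
# output: Claude is 179x DeepSeek's price
```

At DeepSeek's cheapest input tier (0.028 dollars), the input-side multiple widens further still.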
This price gap has resonated loudly in the global developer market. In February 2026, on OpenRouter, the world's largest AI model API aggregation platform, weekly call volume for Chinese AI models skyrocketed 127% within three weeks, surpassing U.S. models for the first time. A year earlier, Chinese models held less than 2% of the market on OpenRouter; a year later, after 421% growth, their share was approaching 60%.

Behind this set of data is a structural change that is easy to overlook. Starting in the second half of 2025, the mainstream AI application scenario shifted from chat to agents. In agent scenarios, each task consumes 10 to 100 times the tokens of a simple chat. When token consumption multiplies like that, price becomes the decisive factor. The extreme cost-performance of Chinese models hit this window exactly.
However, the problem is that the reduction in inference costs does not solve the fundamental problem of training. A large model will rapidly degrade if it cannot continuously train and iterate on the latest data. Training remains that unavoidable computing power black hole.
So, where does the "shovel" for training come from?
The formalization of the backup
Xinhua in central Jiangsu, a small city known for stainless steel and health food, previously had no connection to AI. But in 2025, a 148-meter-long production line for domestic computing power servers was established here, taking only 180 days from contract signing to production.
The core of this production line is two fully domestic chips: the Loongson 3C6000 processor and the Taichu Yuanqi T100 AI accelerator card. The Loongson 3C6000 was developed entirely in-house, from instruction set to microarchitecture. The Taichu Yuanqi grew out of the National Supercomputing Center in Wuxi and a Tsinghua University team, and uses a heterogeneous many-core architecture.
This production line can produce one server every five minutes at full capacity, with a total investment of 1.1 billion yuan and an expected annual output of 100,000 units.
More importantly, clusters based on these domestic chips have already begun undertaking real large model training tasks.
In January 2026, Zhipu AI, in collaboration with Huawei, launched GLM-Image, the first SOTA image generation model trained entirely on domestic chips. In February, China Telecom's hundred-billion-parameter "Star" large model completed full-process training on the domestic computing power pool in Lingang, Shanghai.

The significance of these cases is what they prove: domestic chips have crossed the threshold from "usable for inference" to "usable for training." This is a qualitative change. Inference only runs models that are already trained and places relatively modest demands on the chip; training must churn through massive amounts of data, computing gradients and updating parameters, which raises the requirements on computational power, interconnect bandwidth, and the software ecosystem by an order of magnitude.
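The gap between the two workloads can be put in rough numbers. The sketch below uses the common approximations of about 2 FLOPs per parameter per token for inference (forward pass only) and about 6 for training (forward plus backward pass plus parameter update); these factors are a standard rule of thumb, not figures from this article:

```python
def training_vs_inference_flops(params, tokens):
    """Rough FLOP counts using common approximations:
    inference ~ 2 * params per token (forward pass only),
    training  ~ 6 * params per token (forward + backward + update)."""
    inference = 2 * params * tokens
    training = 6 * params * tokens
    return inference, training

# A toy 1-billion-parameter model processing 1 million tokens
inf, trn = training_vs_inference_flops(1e9, 1e6)
print(f"inference: {inf:.1e} FLOPs, training: {trn:.1e} FLOPs ({trn/inf:.0f}x)")
```

And the raw FLOP multiple understates the gap: training must also hold optimizer state in memory and synchronize gradients across the whole cluster at every step, which is why interconnect bandwidth matters far more for training than for inference.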
The core force taking on these tasks is Huawei's Ascend chip series. By the end of 2025, the Ascend ecosystem had more than 4 million developers and over 3,000 partners; 43 mainstream large models in the industry had completed pre-training on Ascend, and more than 200 open-source models had been successfully adapted. At MWC on March 2, 2026, Huawei also launched SuperPoD, its new-generation computing power foundation, for international markets.
In FP16 computing power, the Ascend 910B has already matched Nvidia's A100. Gaps remain, but the chips have moved from unusable to usable, and are improving toward genuinely convenient. Building an ecosystem cannot wait for a perfect chip; it has to be deployed at scale during the "usable" stage, forcing chips and software to iterate against real business needs. ByteDance, Tencent, and Baidu roughly doubled their 2026 adoption targets for domestic computing power servers compared with the previous year. According to the Ministry of Industry and Information Technology, China's total intelligent computing power has reached 1,590 EFLOPS. 2026 is shaping up to be the first year of large-scale deployment of domestic computing power.
U.S. Power Crisis and China's Outbound Expansion
In early 2026, Virginia, which hosts a large share of the world's data center traffic, suspended approvals for new data center construction. Georgia followed, with its suspension extending into 2027. Illinois and Michigan also imposed restrictions.
According to the International Energy Agency, U.S. data centers consumed 183 terawatt-hours of electricity in 2024, about 4% of national consumption. By 2030, this figure is expected to more than double to 426 TWh, with the share likely exceeding 12%. Arm's CEO has even forecast that by 2030, AI data centers will consume 20% to 25% of U.S. electricity.
The U.S. power grid is already overloaded. The PJM grid covering 13 states in the eastern U.S. is facing a 6GW capacity shortfall. By 2033, the U.S. overall faces a 175GW power capacity gap, equivalent to the electricity consumption of 130 million households. The wholesale electricity costs in concentrated data center regions are up by 267% compared to five years ago.
The end of computing power is energy. And on energy, the gap between China and the U.S. is even larger than the chip gap, only reversed.
China's annual electricity generation is 10.4 trillion kWh, while the U.S. generates 4.2 trillion kWh, making China 2.5 times that of the U.S. More critically, residential electricity consumption in China accounts for only 15% of total electricity use, while in the U.S., this proportion is 36%. This implies that China has a much larger surplus of industrial electricity available for computing power development than the U.S.

Regarding electricity prices, the electricity cost in areas where U.S. AI companies are clustered ranges from 0.12 to 0.15 dollars per kilowatt-hour, while industrial electricity prices in western China are about 0.03 dollars, only one-fourth to one-fifth of the U.S. rate.
China's annual addition of new power generation is already seven times that of the U.S.
While the U.S. worries about power, China's AI is quietly expanding overseas. But this time, what is going overseas is not products or factories, but tokens.
Tokens, the smallest units of information processed by AI models, are becoming a new digital commodity. They are produced in China's computing power factories and transmitted globally via undersea cables.
DeepSeek's user distribution illustrates the point: 30.7% in China, 13.6% in India, 6.9% in Indonesia, 4.3% in the U.S., and 3.2% in France. It supports 37 languages and is popular in emerging markets like Brazil. Worldwide, 26,000 companies have opened accounts, and 3,200 institutions have deployed the enterprise edition.
In 2025, 58% of new AI startups incorporated DeepSeek into their technology stacks. In China, DeepSeek holds an 89% market share. In other countries under U.S. sanctions, its share ranges from 40% to 60%.
This scene is reminiscent of another war about industrial autonomy from forty years ago.
In Tokyo in 1986, under immense pressure from the U.S., the Japanese government signed the U.S.-Japan Semiconductor Agreement. Its core terms were three: Japan had to open its semiconductor market, with U.S. chips reaching over 20% share in Japan; Japan was prohibited from exporting semiconductors below cost; and a 100% punitive tariff was imposed on 300 million dollars' worth of Japanese chip exports. At the same time, the U.S. vetoed Fujitsu's acquisition of Fairchild Semiconductor.
That year marked the peak of Japan's semiconductor industry. By 1988, Japan controlled 51% of the global semiconductor market, while the U.S. held only 36.8%. Among the top ten semiconductor companies globally, Japan held six seats: NEC ranked second, Toshiba third, Hitachi fifth, Fujitsu seventh, Mitsubishi eighth, and Panasonic ninth. In 1985, Intel incurred a loss of 173 million dollars in the U.S.-Japan semiconductor competition, nearing bankruptcy.
However, everything changed after the agreement was signed.
The U.S. launched comprehensive suppression against Japanese semiconductor companies through methods like the 301 investigation, while supporting South Korea's Samsung and Hynix to penetrate the Japanese market at lower prices. Japan's DRAM share plummeted from 80% to 10%. By 2017, Japan's IC market share dwindled to just 7%. Once-unstoppable giants were either split, acquired, or faded from the scene amid relentless losses.

The tragedy of Japanese semiconductors lies in the industry's contentment with being the best producer inside a global division of labor dominated by a single external power, without ever attempting to build an independent ecosystem of its own. When the tide went out, it found it had nothing left but production.
Today's Chinese AI industry stands at a similar yet entirely different crossroads.
What is similar is that we too face immense external pressure. Three rounds of chip regulations, layered and intensifying, and the barriers of the CUDA ecosystem remain towering.
What is different is that this time, we have chosen a harder path. From extreme optimization at the algorithm level, to the transition of domestic chips from inference to training, to the accumulation of 4 million developers in the Ascend ecosystem, and finally to the overseas penetration of tokens into the global market. Every step on this path constructs an independent industry ecosystem that Japan never possessed back then.
Conclusion
On February 27, 2026, three performance reports from local AI chip companies were released on the same day.
Cambricon's revenue surged 453%, and it achieved full-year profitability for the first time. Moore Threads grew revenue 243% but posted a net loss of 1 billion yuan. Muxi grew revenue 121%, with a net loss of nearly 800 million yuan.
Half is fire, half is seawater.
The fire is the market's extreme thirst. The 95% void left by Jensen Huang's retreat is being filled inch by inch by the revenue figures of these domestic companies. Whatever the current state of performance or ecosystem, the market needs a second choice besides Nvidia. This is a once-in-a-generation structural opportunity opened by geopolitics.
The seawater is the enormous cost of building an ecosystem. Every cent of loss is real money paid to catch up with the CUDA ecosystem: R&D investment, software subsidies, and the labor cost of engineers dispatched to customer sites to solve compilation problems one by one. These losses are not mismanagement; they are the war tax of building an independent ecosystem.
These three financial reports record the true nature of this computing power war more honestly than any industry report. It is not a triumphant march but brutal, bloodletting trench warfare.
However, the form of the war has indeed changed. Eight years ago, we were discussing the question of "can we survive." Today, we are discussing "what is the cost to survive."
The cost itself is progress.
