链研社|AI First🔶💧|May 27, 2026 03:19
The 'efficiency revolution' in Chinese computing power matters more than expanding memory production lines

A counterintuitive fact: Chinese AI companies are achieving similar results with less memory. The research is open source, which could cut the inference costs of OpenAI, Anthropic, and Google's Gemini by an order of magnitude. That would raise their gross margins while also cutting memory demand by an order of magnitude.

Take DeepSeek's MLA (Multi-head Latent Attention) architecture, KV-cache optimization, and various model quantization techniques as examples. Each of these directly and substantially reduces memory usage and bandwidth requirements during inference, producing a cliff-like drop in the cost of generating each token. Zhipu's ultra-fast inference and the cache billing of Alibaba's Qwen (Qianwen) and Xiaomi have cut the relevant prices directly to one tenth.

What do these moves have in common? They are all about algorithmic compression efficiency and getting the most out of available computing power.

But the market is using an old map to find a new path

US-listed AI companies are still piling up capital expenditure, locking in large amounts of production capacity and computing power in advance. 700 billion in capital expenditure is enough to send the entire AI supply chain, upstream and downstream, into a frenzy. This logic is sound: demand for computing power and memory is indeed very high and growing fast. The problem is that it ignores another curve: the room for efficiency gains from computing-power optimization in China, which is also astonishingly large.

Everyone is betting that the 'water sellers' can keep making money, but no one has noticed that the gold miners have suddenly learned to recycle water. If Chinese AI companies cut memory usage by another 50%, will the story behind storage stocks, whose valuations rest on capital spending, still hold?
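Back-of-the-envelope arithmetic can make the memory savings attributed to MLA, KV-cache optimization and quantization concrete. The sketch below is a simplified accounting under stated assumptions.

* Standard multi-head attention caches one key and one value per head, per layer.
* MLA-style caching stores one compressed latent vector plus a small positional (RoPE) key per layer.
* The model shape is hypothetical and chosen for illustration only: 60 layers, 128 heads of dimension 128, a 512-dim latent and a 64-dim RoPE key. These are not measurements of any production model.
* The 128,000-token context is also a hypothetical figure.
* The 8-bit row assumes the cache can be stored at one byte per element.

```python
# Rough per-token KV-cache size estimate.
# All model dimensions below are illustrative assumptions, not the
# published configuration of any specific model.


def mha_kv_bytes_per_token(n_layers: int, n_heads: int, head_dim: int,
                           bytes_per_elem: int) -> int:
    """Standard multi-head attention: a full key and value vector per head, per layer."""
    return 2 * n_layers * n_heads * head_dim * bytes_per_elem


def mla_kv_bytes_per_token(n_layers: int, latent_dim: int, rope_dim: int,
                           bytes_per_elem: int) -> int:
    """Simplified MLA-style cache: one shared compressed latent plus a small RoPE key per layer."""
    return n_layers * (latent_dim + rope_dim) * bytes_per_elem


if __name__ == "__main__":
    LAYERS, HEADS, HEAD_DIM = 60, 128, 128  # hypothetical model shape
    LATENT_DIM, ROPE_DIM = 512, 64          # hypothetical compression sizes
    FP16, FP8 = 2, 1                        # bytes per cached element

    rows = [
        ("MHA, 16-bit", mha_kv_bytes_per_token(LAYERS, HEADS, HEAD_DIM, FP16)),
        ("MLA, 16-bit", mla_kv_bytes_per_token(LAYERS, LATENT_DIM, ROPE_DIM, FP16)),
        ("MLA, 8-bit", mla_kv_bytes_per_token(LAYERS, LATENT_DIM, ROPE_DIM, FP8)),
    ]

    ctx = 128_000  # hypothetical number of tokens held in cache for one long request
    for name, per_tok in rows:
        total_gib = per_tok * ctx / 2**30
        print(f"{name}: {per_tok / 1024:.1f} KiB/token, "
              f"{total_gib:.2f} GiB for a {ctx:,}-token context")
```

Under these assumptions, the compressed cache is roughly 57 times smaller per token than full multi-head caching. Storing it at 8 bits halves it again. Real savings depend on the actual architecture, the serving stack and how much accuracy loss is acceptable.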
At present, the huge profits of the AI hardware supply chain rest largely on absolute dependence on top-end HBM (high-bandwidth memory). If models need less memory, that could directly break the monopoly premium of the leading manufacturers. The underlying logic of storage and compute stocks, whose valuations rest on capital expenditure, would then loosen. Yet no one in the market seems to be seriously calculating how much memory China's algorithm-level efficiency revolution can save.

Objectively speaking, however, if inference cost and memory usage fall by 50%, that could trigger an explosion in real-world AI applications and in high-frequency API calls from AI agents running around the clock. Usage per call falls, but if total call volume grows tenfold, absolute demand for memory and computing power will still soar.

China's efficiency gains in computing power may matter more than expanding memory production lines, and may break the monopoly premium of today's leading manufacturers. This is a risk worth watching now. So are the open questions of how far computing-power efficiency can go and whether it can keep improving. What remains uncertain is how long this 'unpriced' window will last. Maybe three months, maybe a year.
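The rebound argument in the paragraphs above is simple multiplication: total demand equals usage per call times the number of calls. The minimal sketch below uses the hypothetical figures from the text, a 50% per-call cut and tenfold call growth. It also computes break-even call growth for two illustrative cut sizes. The function names are made up for this example.

```python
# Hypothetical model of how per-call efficiency and call volume combine.
# Baseline usage per call and baseline call volume are both normalized to 1.


def total_demand(per_call_usage: float, calls: float) -> float:
    """Aggregate resource demand = resource used per call x number of calls."""
    return per_call_usage * calls


def demand_multiplier(per_call_reduction: float, call_growth: float) -> float:
    """Total demand after the change, relative to the baseline."""
    baseline = total_demand(1.0, 1.0)
    after = total_demand(1.0 - per_call_reduction, call_growth)
    return after / baseline


if __name__ == "__main__":
    # Scenario from the text: 50% less memory per call, 10x as many calls.
    print(demand_multiplier(0.5, 10))  # 5.0

    # Break-even: the call growth that exactly offsets a given per-call saving.
    for cut in (0.5, 0.9):
        print(f"{cut:.0%} cut per call breaks even at {1 / (1 - cut):.0f}x call growth")
```

A 50% cut combined with tenfold call growth leaves total demand five times higher. Even an order-of-magnitude (90%) cut per call is fully offset once calls grow tenfold.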