τ Scaling: Huawei's new growth engine designed for the post-Moore era

From "shrinking size" to "compressing time".

For the past 60 years, the semiconductor industry has driven progress by shrinking transistors (Moore's Law), making them smaller, denser, and cheaper.

But now this path is no longer viable:

Yields for processes below 7 nm have plummeted
The cost of photolithography equipment is sky-high
Designing a single chip on an advanced process costs more than 1 billion US dollars
The cost per transistor is rising rather than falling

Over 6 years and 381 mass-produced chips, Huawei's semiconductor team has validated a new direction:

Compete on time, not on size.

The team proposes τ scaling theory (τ Scaling):

Treat "time" as the core optimization metric and compress the characteristic time τ across the entire chain, from transistor switching (picoseconds) to data center tasks (seconds), a span of 12 orders of magnitude.
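The 12-orders-of-magnitude figure follows directly from the two endpoints named above. A minimal check, assuming a representative switching time of exactly 1 picosecond and a task time of exactly 1 second (the text gives only the units, so these values are assumptions):

```python
import math


def orders_of_magnitude(short_s: float, long_s: float) -> float:
    """Return how many powers of ten separate two durations given in seconds."""
    return math.log10(long_s / short_s)


# Assumed representative endpoints; the article names only the units.
TRANSISTOR_SWITCH_S = 1e-12  # 1 picosecond
DATACENTER_TASK_S = 1.0      # 1 second

span = orders_of_magnitude(TRANSISTOR_SWITCH_S, DATACENTER_TASK_S)
print(f"Span: {span:.0f} orders of magnitude")
```

Any other pair of endpoints in the picosecond and second ranges shifts the result by at most a couple of orders, so the figure should be read as a scale, not a measurement.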

In simple terms:

The competition used to be over who is smaller; now it is over who is faster, with lower latency and higher efficiency.

1. What exactly is τ scaling?

τ is the characteristic delay (time constant) at each layer of the stack. There are four layers:

Transistor: switching speed
Circuit: signal transmission delay
Chip: compute and memory access delay
System: end-to-end communication and synchronization time

The goal is to compress τ across the entire stack, with process, circuit, architecture, and system all optimized against the same set of metrics instead of independently.

2. Implementation on mobile devices: LogicFolding

Without moving to a newer process node, the chip is stacked vertically: ultra-precise hybrid bonding distributes the critical paths across multiple layers, much like giving the chip "floor levels".

Transistor density: up from 155 million to 238 million per square mm, a boost of roughly 55%
Energy efficiency: up 41%, with clock frequency up nearly 13%
SRAM frequency: up by more than 40%
Kirin 2026: clock frequency reaches 3.1 GHz, with a target of 4 GHz by 2029
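The figures in this list can be sanity-checked with simple arithmetic. The sketch below computes the density gain from the two quoted endpoints and the compound annual clock growth needed to hit the 4 GHz target. It assumes the 3.1 GHz figure applies to 2026 and the target to 2029, which gives three years; the function names are illustrative only.

```python
def percent_gain(before: float, after: float) -> float:
    """Relative increase from before to after, in percent."""
    return (after - before) / before * 100.0


def required_annual_growth(start: float, target: float, years: int) -> float:
    """Compound annual growth rate (as a fraction) needed to reach target."""
    return (target / start) ** (1.0 / years) - 1.0


# Density endpoints from the text, in million transistors per square mm.
density_gain = percent_gain(155, 238)

# Assumption: 3.1 GHz is the 2026 value and 4 GHz is due in 2029.
clock_growth = required_annual_growth(3.1, 4.0, 2029 - 2026)

print(f"Density gain: {density_gain:.1f}%")
print(f"Required clock growth: {clock_growth * 100:.1f}% per year")
```

The quoted endpoints work out to a gain slightly below the stated 55%, which suggests the published numbers are rounded. The clock target implies just under 9% growth per year under the three-year assumption.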

3. Implementation in AI data centers: end-to-end latency compression

Data transport accounts for 80% of the energy consumption and 70% of the cost in AI clusters, so the core focus is compressing communication time.

1. Unified Bus

By eliminating multiple protocol layers, remote access latency drops from several tens of microseconds to about 100 nanoseconds, a speedup of roughly 500 times.
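The 500-times figure depends on what "several tens of microseconds" means. A minimal sketch, assuming a 50 microsecond baseline (an assumption chosen because it reproduces the stated ratio) and showing how the result moves for other baselines in that range:

```python
def speedup(before_s: float, after_s: float) -> float:
    """Ratio of the old latency to the new latency."""
    return before_s / after_s


AFTER_S = 100e-9          # about 100 nanoseconds, from the text
ASSUMED_BEFORE_S = 50e-6  # assumption: "several tens of microseconds" read as 50 us

print(f"Assumed 50 us baseline: {speedup(ASSUMED_BEFORE_S, AFTER_S):.0f}x")

# Sensitivity to the baseline, still within "tens of microseconds".
for before_us in (20, 50, 90):
    print(f"{before_us} us -> {speedup(before_us * 1e-6, AFTER_S):.0f}x")
```

Across that range the speedup runs from a few hundred to nearly a thousand times, so 500 is best read as a representative value.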

2. Hi-ONE Optical Interconnection

A single module delivers 8 Tb/s. Replacing copper wires with optical fibers extends the reach from 1 meter to 100 meters, enough to support clusters with massive numbers of cards.
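To put the module bandwidth in more familiar units, the sketch below converts 8 Tb/s to bytes per second and estimates the time to move a payload. It assumes decimal units, ignores protocol overhead and propagation delay, and uses a hypothetical 1 GB payload that does not come from the article.

```python
def transfer_time_s(payload_bytes: float, link_bits_per_s: float) -> float:
    """Ideal serialization time for a payload, ignoring overhead and distance."""
    return payload_bytes * 8 / link_bits_per_s


MODULE_BITS_PER_S = 8e12           # 8 Tb/s per module, from the text
HYPOTHETICAL_PAYLOAD_BYTES = 1e9   # hypothetical 1 GB payload, for illustration

bytes_per_s = MODULE_BITS_PER_S / 8
time_s = transfer_time_s(HYPOTHETICAL_PAYLOAD_BYTES, MODULE_BITS_PER_S)

print(f"Module bandwidth: {bytes_per_s / 1e12:.1f} TB/s")
print(f"Ideal time for 1 GB: {time_s * 1e3:.2f} ms")
```

Under these idealized assumptions one module moves about a terabyte per second, so a gigabyte takes on the order of a millisecond.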

3. 3D Folding

This addresses a weakness of 2.5D packaging, "growing area quickly, with interfaces unable to keep up", by relocating memory, power supply, and optical interfaces to vertical planes so that they scale in step with computing power.

Prediction: by 2035, the integration level of AI hardware will improve more than 100-fold.
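A 100-fold improvement by 2035 implies a steep yearly pace. The sketch below derives the implied annual factor and doubling time, assuming the baseline year is 2026; the article does not state a baseline, so that year is an assumption and the results shift if it differs.

```python
import math


def implied_annual_factor(total_factor: float, years: int) -> float:
    """Constant yearly multiplier that compounds to total_factor over years."""
    return total_factor ** (1.0 / years)


def doubling_time_years(annual_factor: float) -> float:
    """Years needed to double at a constant yearly multiplier."""
    return math.log(2) / math.log(annual_factor)


ASSUMED_BASELINE_YEAR = 2026  # assumption: not stated in the article
TARGET_YEAR = 2035
TOTAL_FACTOR = 100.0          # "over 100 times", from the text

annual = implied_annual_factor(TOTAL_FACTOR, TARGET_YEAR - ASSUMED_BASELINE_YEAR)
doubling = doubling_time_years(annual)

print(f"Implied improvement: {(annual - 1) * 100:.0f}% per year")
print(f"Implied doubling time: {doubling:.2f} years")
```

Under the 2026 assumption, the prediction requires integration to double well inside every two years, a faster cadence than the classic two-year doubling usually associated with Moore's Law.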

4. Reintegrating logic and memory

In the early years, CPUs and memory developed separately. In the AI era, data transport is more critical than computation, so memory and logic must be tightly integrated in 3D, and influence in the industry chain is shifting towards memory and packaging.

5. Remaining challenges

EDA tools need to adapt to 3D stacking design
Wafer-level process variation and vertical interconnect losses need to be reduced
New energy efficiency and benchmark standards need to be established

Conclusion

The era of size scaling defined by Moore's Law has ended, and the era of time scaling has begun.

There is no need to chase the most advanced photolithography equipment at all costs; 3D stacking, system architecture, and interconnect optimization can continue to improve performance and energy efficiency.

This will be the core route for semiconductors in the next decade.
