RamenPanda | May 25, 2026 03:04
Breaking the 'memory wall': Optical interconnects emerge for GPU-HBM packaging
To tackle the "memory wall," one of the long-standing challenges in AI semiconductors, the memory and packaging industries at home and abroad are weighing an approach that decouples GPUs from high bandwidth memory (HBM) and packages them separately. The core idea is to move the HBM, which today sits directly next to the GPU, a certain distance away and bridge the gap with light (optics), so that several times more HBM can be installed than today.
On the 22nd, a researcher from a large domestic memory manufacturer stated: "We are currently facing difficulties in expanding HBM bandwidth and capacity, so we are exploring with our customers a way to break through the GPU's shoreline limitations through optical interconnection and load more HBM." The shoreline refers to the length of the chip's perimeter.
In today's AI computing environment, the key factor holding back computing efficiency is the data transfer speed of memory. GPU performance has jumped with each generation, but the speed at which memory stores and supplies data has not kept pace, creating a structural performance barrier known as the memory wall. The arrival of HBM, with its wide data channel, put out the immediate fire, but critics continue to point out that its bandwidth and transmission speed are still insufficient to cope with the explosive growth of AI computing power.
So far, the industry has focused on stacking HBM ever higher in order to improve memory capacity and bandwidth within a limited footprint. But as the number of stacked layers rises from 12 to 20 or even higher, the difficulty of the process increases exponentially. The technology is running into physical limits, including the growing difficulty of meeting fixed height specifications. Vertical stacking has reached a turning point, to the extent that the JEDEC standards organization has relaxed its HBM height specifications.
The bigger problem is that if the number of stacked layers cannot be increased further, the alternative of adding more HBM horizontally around the GPU is not feasible either. In the current 2.5D packaging structure, the GPU and HBM are mounted directly adjacent to each other on the same substrate. In this structure, the number of HBM stacks that can be placed is strictly limited by the finite length of the GPU chip's perimeter, namely the shoreline. Even if designers want to place more HBM, there is physically nowhere to put it, which has left the industry in a structural stalemate.
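To make the shoreline limit concrete, here is a minimal back-of-the-envelope sketch in Python. The function name and every dimension in it are illustrative placeholders, not figures from any vendor or from the sources quoted in this article. It only shows that the number of stacks is capped by edge length, however tall each stack is.

```python
import math


def max_hbm_stacks(edge_mm: float, hbm_width_mm: float,
                   gap_mm: float, edges_used: int) -> int:
    """Count HBM stacks that fit side by side along the usable GPU die edges.

    All parameters are hypothetical inputs for illustration only.
    """
    # n stacks in a row need n * width + (n - 1) * gap <= edge length,
    # which rearranges to n <= (edge + gap) / (width + gap).
    per_edge = math.floor((edge_mm + gap_mm) / (hbm_width_mm + gap_mm))
    return per_edge * edges_used


# Placeholder numbers: a 33 mm die edge, 11 mm wide stacks, 1 mm spacing,
# and two edges of the die available for memory.
adjacent = max_hbm_stacks(edge_mm=33.0, hbm_width_mm=11.0,
                          gap_mm=1.0, edges_used=2)
print(adjacent)
```

Stack height does not appear anywhere in the calculation: adding layers raises capacity per stack, but only a longer usable edge, or a layout no longer tied to the edge, raises the count itself.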
The alternative now emerging in the semiconductor industry is to separate the GPU and HBM and package them individually. This overturns the traditional chip design principle that components must sit close to each other to minimize data transmission time. Instead of placing the two chips side by side, the scheme moves them apart and connects them with a very fast optical signal to make up for the added physical distance.
Moving the HBM slightly away from the GPU on the circuit board frees the design from GPU shoreline constraints. Once that spatial constraint disappears, several times more HBM than today can be spread out horizontally across the board, without pushing stacking height to the limit. This means the total memory capacity and data bandwidth of AI accelerator systems could expand rapidly, to a scale beyond comparison with existing systems.
We are currently discussing placing HBM under the GPU... The external specifications may change
The industry has put forward a range of architectural options for where to place HBM on the GPU circuit board.
The memory researcher stated, "The options under discussion range from making extensive use of the space around the GPU periphery to isolating the HBM and placing it underneath the GPU circuit board." He added, "In the latter scenario, where it is isolated and placed under the GPU circuit board, the motherboard will have to be lengthened, so we are discussing the possibility of changing the overall external specifications with GPU manufacturers." Specifically, the HBM could surround the GPU at a distance of a few centimeters, or a separate HBM area could be created in the center of the circuit board.
He said, "We are discussing the optimal layout and keeping every possibility open." He also said, "Currently, no plan has been set as a formal roadmap, but we are in talks with our partners as part of the preliminary research for the next generation of AI accelerators."
The outsourced semiconductor assembly and test (OSAT) industry is also watching this trend closely. A senior executive from a global OSAT company stated, "Optical interconnection is a clear development trajectory; the only issue is timing." He predicted that "rack-to-rack and server-to-server connections will be the first to become optical, followed closely by chip-to-chip connections within the board." He added, "Larger units will be connected by light first, but optical research is advancing so quickly that this day may not be far away."
Technically speaking, the optical interconnect technology that links the GPU and HBM shares the same underlying principles as the technology that links servers within data centers. The difference is the high technical barrier to shrinking optical conversion technology, which is used today for communication between large devices, down to the microscopic scale of a single circuit board and chipset.
A senior executive from a domestic developer of co-packaged optics (CPO) components explained, "As HBM stack height approaches its limit, the industry is discussing deploying memory horizontally to maximize the quantity that can physically be installed." He added, "The principle of HBM optical links is the same as that of traditional data center optical interconnects, but they must operate within limited board space, so the optical components have to be miniaturized to much smaller sizes and reach much higher integration densities, which makes the technical difficulty even greater."