Broadcom and Marvell are taking over the underlying narrative of AI data centers.

Author: Godot

Broadcom and Marvell form the duopoly in the custom ASIC race.

Custom ASICs are one of the fastest-growing segments in semiconductors. The main point I want to make here is this:

Moore's Law has gradually broken down since the 28nm process node: shrinking transistors no longer automatically brings more compute from higher transistor density, lower power consumption, or higher clock frequencies.

Now, at 3nm and 2nm, the design and tape-out costs of a single chip exceed 500 million dollars, and the industry's economic structure will inevitably be reshaped.

How will it be restructured?

If you are Google, spending over 50 billion dollars annually on power and depreciation for TPU-related training and inference, a custom chip that can reduce inference token costs by 30% means significant savings.

Over the past five years, hyperscalers have directed a growing share of their capital expenditure towards in-house chips, while the marginal dollar growth going to Nvidia's off-the-shelf GPUs has gradually flattened. Examples include Google TPU v7, AWS Trainium 2 and Trainium 3, Microsoft Maia 100 and Maia 200, Meta MTIA, and Apple's AI server chip, confirmed as an in-house design by 2026.

Globally, only two companies, Broadcom and Marvell, can take on ASIC co-design work at hyperscaler scale. According to industry-chain research by Tom's Hardware, the two together hold about 95% of the market for co-designing hyperscalers' custom AI accelerators.

A 95% concentration means that over the next five to ten years, almost every in-house XPU that hyperscalers invest in will pass through one of these two companies.

The Rise of Custom ASICs is Not a Business Story, But an Economic Restructuring Forced by Physical Limitations

High Customer Concentration

First, custom ASIC customers are highly concentrated among leading hyperscalers.

In 1974, Dennard proposed his scaling law at IBM: as transistors shrink, voltage scales down with them, so power density stays constant and performance can improve without raising power per unit area.

However, by the 90nm node, severe leakage rooted in physical limits made it impossible to keep lowering voltage in proportion, and power density surged. This is the physical reason CPU clock frequencies stopped climbing around 2005, and the starting point for the rise of multi-core architectures.
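The scaling argument above can be made concrete with a small sketch. It assumes the textbook dynamic-power model P = C * V^2 * f per transistor and an idealized shrink factor k; the function name and the value k = 1.4 are illustrative choices, not figures from the article.

```python
def power_density_ratio(k: float, voltage_scales: bool) -> float:
    """Relative power density after shrinking linear dimensions by a factor k.

    Assumes dynamic power per transistor P = C * V**2 * f (idealized model).
    """
    capacitance = 1.0 / k                         # capacitance shrinks with dimensions
    frequency = k                                 # shorter gate delay allows a faster clock
    voltage = 1.0 / k if voltage_scales else 1.0  # voltage either scales or is stuck
    power_per_transistor = capacitance * voltage**2 * frequency
    area_per_transistor = 1.0 / k**2
    return power_per_transistor / area_per_transistor


k = 1.4  # hypothetical shrink factor, roughly one classic node step
ideal = power_density_ratio(k, voltage_scales=True)
stuck = power_density_ratio(k, voltage_scales=False)
print(f"voltage scales with size: {ideal:.2f}x power density")
print(f"voltage stuck:            {stuck:.2f}x power density")
```

Under these assumptions, power density stays flat while voltage scales, and grows with k squared once voltage can no longer be reduced, which is the surge described above.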

Starting from 28nm, the cost of a single transistor no longer decreased but began to rise, leading to a sharp increase in chip manufacturing and design costs.

Today, the tape-out cost at 3nm reaches 500 million dollars, while 2nm is closer to 1 billion dollars. Such high fixed costs mean that only the top data center giants, which consume millions of chips a year, can amortize them over enough shipment volume.
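A rough amortization sketch shows why volume matters. The 500 million dollar figure is the article's; the shipment volumes and the helper name `fixed_cost_per_chip` are hypothetical and chosen only for illustration.

```python
def fixed_cost_per_chip(fixed_cost_usd: float, units_shipped: int) -> float:
    """Share of a one-time design and tape-out cost carried by each chip."""
    return fixed_cost_usd / units_shipped


TAPE_OUT_3NM_USD = 500e6  # figure quoted in the article

# Hypothetical lifetime volumes, from a niche product to a hyperscaler XPU.
for units in (50_000, 500_000, 5_000_000):
    share = fixed_cost_per_chip(TAPE_OUT_3NM_USD, units)
    print(f"{units:>9,} chips -> ${share:,.0f} of fixed cost per chip")
```

At tens of thousands of units the fixed cost alone adds thousands of dollars to every chip; at millions of units it shrinks to a small fraction of the chip's price.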

According to TSMC's and industry roadmaps, the process is expected to reach A10, or 1nm node, around 2030, where the physical scaling of transistors will reach its limits, and computational power will rely entirely on packaging, interconnects, and architectural innovations. This represents the biggest structural opportunity for the custom ASIC oligopoly in the next decade.

Failure of Moore's Law Alters Capital Structure

Second, the breakdown of Moore's Law changes the capital structure. In the move from TSMC's N5 to N3, transistor density rose 1.6 times while wafer costs rose only 18%, so the cost of a single transistor fell by about 25%.

Now, in the move from N3 to N2, density rises only 1.15 times while wafer costs jump 50% because of process complexity, pushing the cost of a single transistor up by about 30%.

So, counterintuitively, advanced processes no longer make chips cheaper; they offer more expensive transistors, justified only for workloads whose compute demands require leading-edge nodes.
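The per-transistor figures above follow from dividing the wafer cost change by the density gain. The sketch below reproduces that arithmetic under a simplifying assumption that die area and yield stay the same across nodes; the function name is illustrative.

```python
def cost_per_transistor_change(density_gain: float, wafer_cost_increase: float) -> float:
    """Fractional change in cost per transistor between two nodes.

    density_gain: transistor density multiplier (e.g. 1.6 for 1.6x)
    wafer_cost_increase: fractional rise in wafer cost (e.g. 0.18 for +18%)
    Assumes unchanged die area and yield (a simplification).
    """
    return (1.0 + wafer_cost_increase) / density_gain - 1.0


n5_to_n3 = cost_per_transistor_change(density_gain=1.6, wafer_cost_increase=0.18)
n3_to_n2 = cost_per_transistor_change(density_gain=1.15, wafer_cost_increase=0.50)

print(f"N5 -> N3: {n5_to_n3:+.1%} cost per transistor")
print(f"N3 -> N2: {n3_to_n2:+.1%} cost per transistor")
```

With the article's inputs this gives a drop of roughly 26% for N5 to N3 and a rise of roughly 30% for N3 to N2, in line with the rounded figures quoted above.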

Cost-sensitive low-end SoCs, such as those in smartwatches, will stay on older nodes like N16/N7, while top AI accelerators, whose compute demands are inelastic and which can absorb high premiums, must use N3 or even N2.

Broadcom designed TPU v6e Trillium on the N3 node; TPU v7 Ironwood is also on N3, and the next-generation TPU will move to N2.

MTIA T-V1, designed for Meta, is on N5, while MTIA T-V2 moves up to N3.

The first in-house inference chip confirmed for OpenAI is on N3, with the second generation jumping directly to N2.

Apple's server AI chip starts directly at N2.

Trainium 2, designed by Marvell for AWS, is on N5, with Trainium 3 moving up to N3. Maia 100, designed by Marvell for Microsoft, is on N5, while Maia 200 is on N3.

All hyperscalers' next-generation flagship XPUs start at N3 and then move on to N2.

This transition window spans roughly 2026 to 2028, which lines up with Broadcom's raised guidance for FY27 AI revenue to exceed 100 billion dollars and with Marvell's data center revenue growing from about 8 billion dollars in FY27 to nearly 20 billion dollars by FY29.

Backside Power Delivery and High-NA EUV

Over the next five years, the two important technical routes in the industry are backside power delivery and High-NA EUV.

High-NA EUV is the next-generation lithography technology led by ASML. When AI chips shrink to about 1.4nm equivalent, the number of transistors per unit area can increase by more than 1.3 times compared to 2nm, corresponding to further leaps in single-chip computing power.

If this roadmap slips, the entire industry will be forced to turn to more aggressive packaging solutions and system-level architecture innovations to increase computing power.

High-NA EUV could well be delayed by 12 to 18 months because of mask costs and the need to re-adapt photoresist systems and metrology tools. Such a delay would benefit Broadcom's and Marvell's chip design businesses, as well as TSMC.

System-level Integration is Replacing Transistor Scaling as the New Engine of Computing Power Growth

In 2010, packaging costs accounted for about 5% to 8% of the total chip cost. By 2020, this proportion increased to 12% to 15%, and by 2026, for flagship AI accelerators, packaging costs have generally exceeded 30%, with some extreme designs approaching 40%.

The reason is that packaging is becoming the key bottleneck in determining the performance ceiling and supply capabilities of chips.

To start with the terminology: wafers are the raw material, bare dies are the semi-finished product, and packaged, tested chips are the final product.

First, the lithography reticle limit physically caps the area of a single die at about 858 square millimeters, pushing AI chips away from ever-larger single dies and toward stitching multiple dies together.

Second, the memory wall limits how many HBM stacks a single chip can host, since that number is bounded by how many HBM interfaces fit along the die edges. To keep increasing bandwidth, HBM must sit physically close to the logic die and connect to it through wide, high-speed interfaces.
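The reticle constraint in the first point can be sketched as a simple lower bound on die count. The 858 square millimeter cap is the article's figure; the total silicon areas and the function name `min_dies` are hypothetical, and real designs need additional area for die-to-die interfaces that this ignores.

```python
import math

RETICLE_LIMIT_MM2 = 858.0  # single-die area cap quoted in the article


def min_dies(total_logic_area_mm2: float,
             reticle_limit_mm2: float = RETICLE_LIMIT_MM2) -> int:
    """Lower bound on the number of dies needed to fit a given logic area.

    Ignores die-to-die interface overhead and yield (a simplification).
    """
    return math.ceil(total_logic_area_mm2 / reticle_limit_mm2)


# Hypothetical total logic areas for three accelerator designs.
for area in (800, 1600, 2500):
    print(f"{area} mm^2 of logic -> at least {min_dies(area)} die(s)")
```

Any design whose logic exceeds the cap has to be split across multiple dies, which is why the packaging that joins them becomes the bottleneck.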

