Video Title: Jensen Huang – Will Nvidia's Moat Persist?
Video Author: Dwarkesh Patel
Translation: Peggy, BlockBeats
Editor's Note: While the outside world is still discussing whether "Nvidia's moat comes from the supply chain," this conversation suggests that what is truly difficult to replicate is not the chip itself, but the entire system capability of converting "electrons into tokens"—that is, the collaborative operation across computing architecture, software systems, and developer ecosystems.
This article is a translation of the dialogue between Dwarkesh Patel and Jensen Huang. Dwarkesh Patel is one of the most closely followed tech podcast hosts in Silicon Valley. He runs the YouTube channel Dwarkesh Podcast and is known for deeply researched interviews and long-form conversations with AI researchers and key figures in the tech industry.

On the right is Dwarkesh Patel; on the left is Jensen Huang.
Around this core, the conversation can be understood from three levels.
First, the changes in technology and industry structure.
Nvidia's advantage lies not just in hardware performance but in the developer ecosystem carried by CUDA and the path dependency formed around the computing stack. In this system, computing power is no longer the only variable; algorithms, system engineering, networks, and energy efficiency collectively determine the speed of AI advancement. This also brings forth an important judgment: software will not be simply "commoditized" because of AI; on the contrary, as agents become prevalent, the use of tools will increase exponentially, thus further amplifying the value of software.
Second, the business boundaries and strategic choices.
Facing a constantly expanding AI industry chain, Nvidia chooses to "do everything necessary, but not everything." It neither enters cloud computing nor pursues excessive vertical integration, instead amplifying the overall market through investment and ecosystem support. This restraint lets it retain control over the critical layers while avoiding competing with players in its own ecosystem, thereby drawing more participants into its technological system.
Third, the divergences regarding technology diffusion and industry patterns.
The most charged parts of the dialogue lie not in the specific conclusions but in how "risk" itself is understood. One view emphasizes the first-mover advantage of a computing-power lead; another focuses on who ends up owning the ecosystems and standards as the technology diffuses. Compared with short-term capability gaps, the more critical question may be: which technology stack will future AI models and developers run on?
In other words, the endgame of this competition is not just "who can create a stronger model first," but "who defines the infrastructure upon which the model operates."
In this sense, Nvidia's role is no longer just that of a chip company, but is closer to being a "provider of the underlying operating system" in the AI era—it seeks to ensure that, regardless of how computing capabilities diffuse, the pathways for value generation still revolve around itself.
The original content is as follows (restructured for readability):
TL;DR
·Nvidia's moat is not in "chips" but in the "entire system capability that goes from electrons to tokens." The core is not hardware performance, but the full-stack capability (architecture + software + ecosystem) that turns computation into value.
·The essence of CUDA's advantage is not tools, but the world's largest AI developer ecosystem. Developers, frameworks, and models are all tied to the same technology stack, forming a path dependency that is hard to replace.
·The key to AI competition is not just computing power, but the combination of "computing stacks × algorithms × system engineering." The enhancements brought by the synergy of architecture, networking, energy efficiency, and software far exceed mere process advancements.
·The computing power bottleneck is a short-term issue; supply will be driven to catch up with demand signals within 2–3 years. The real long-term constraint is not chips, but energy and infrastructure.
·AI software will not be commoditized; instead, the explosion of agents will drive exponential growth in tool usage. Software will not become cheaper in the future; rather, the frequency of software calls will soar.
·Not engaging in cloud computing is Nvidia's core strategy: doing "everything necessary" but not swallowing the entire value chain. Amplifying the overall market size through investments and ecosystem support rather than vertical integration.
·The real strategic risk is not competitors acquiring computing power, but the global AI ecosystem no longer being based on the American technology stack. Once the models and developers migrate, long-term technical standards and industry dominance will shift accordingly.
Interview Content
Where is Nvidia's moat: the supply chain or the control from "electrons to tokens"?
Dwarkesh Patel (Host):
We have seen many software companies' valuations decline because of the expectation that AI will turn software into a standardized commodity. Another somewhat naive way to understand this is: look, Nvidia hands the design files (GDSII) to TSMC; TSMC manufactures the logic chips and wafers and builds the switching circuits, which are then packaged together with HBM produced by SK Hynix, Micron, and Samsung, and finally sent to ODMs to be assembled into complete racks.
Note: HBM (High Bandwidth Memory) is an advanced memory technology specifically designed for high-performance computing and AI; ODM (Original Design Manufacturer) refers to manufacturers responsible for both production and product design.
So, from this perspective, Nvidia is essentially creating software, while manufacturing is done by others. If software becomes commoditized, then Nvidia would also be commoditized.
Jensen Huang (CEO of Nvidia):
But ultimately, there has to be a process to convert electrons into tokens. Converting from electrons to tokens and making these tokens more valuable over time is a transformation that I think is difficult to fully commoditize.
The conversion from electrons to tokens itself is a remarkable process. Making one token more valuable than another token is similar to making one molecule more valuable than another.
In this process, there is much art, engineering, science, and invention that gives the token value.
Clearly, we are observing all of this happening in real-time. So this conversion process, manufacturing process, and the various signals involved are far from being fully understood, and this journey is far from over. Thus, I do not believe that situation will occur.
Certainly, we will make it more efficient. In fact, the way you just described the issue is actually a mental model I have of Nvidia: input is electrons, output is tokens, and this middle part is Nvidia.
Our job is to "do everything necessary while doing as little unnecessary as possible" to achieve this transformation and give it extreme capabilities.
What I mean by "doing as little as possible" is that we collaborate with others on the parts we do not need to do ourselves, integrating them into our ecosystem. If you look at today's Nvidia, we likely have one of the largest partner ecosystems across both upstream and downstream supply chains: from computer manufacturers and application developers to model developers. You can think of AI as a "five-layer cake," and we have an ecosystem presence at all five layers.
Related Reading: 《Jensen Huang's latest article: The "Five-layer Cake" of AI》
So we try to do as little as possible, but the part we must do is extremely difficult. And I do not think that part will be commoditized.
In fact, I do not believe enterprise software companies are destined to remain mere "tool makers." Yet the reality is that today, most software companies are indeed tool providers.
Of course, there are exceptions: some companies encode and solidify workflows into their systems. But many are essentially tool companies.
For instance, Excel is a tool, PowerPoint is a tool, Cadence makes tools, and Synopsys makes tools as well.
Jensen Huang:
And the trend I see is actually contrary to many people's views. I believe the number of agents will grow exponentially, and the number of tool users will also increase exponentially.
The number of running instances of these tools will likely surge as well. For example, the number of instances of Synopsys Design Compiler in use will likely increase significantly.
There will be a large number of agents utilizing floor planners, layout tools, and design rule checking tools.
Today, we are limited by the number of engineers; tomorrow, those engineers will be supported by large numbers of agents, letting us explore design spaces in ways that were never possible before. Once agents start using today's tools this way, the change will become very apparent.
The use of tools will drive explosive growth in these software companies. The reason this has not happened yet is that the current agents are still not proficient enough in using tools.
So either these companies build agents themselves, or the agents themselves become capable enough to use these tools. I think in the end, it will be a combination of both.
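Note: The pattern Huang describes, one engineer supervising many agents that each make many tool calls, can be made concrete with a small sketch. The Python below is purely illustrative: the tool names and functions are hypothetical stand-ins, not any real EDA vendor's API.

```python
# Minimal sketch of an agent's tool-use loop (hypothetical tools, no real APIs).
# Each agent turns one engineer's intent into many tool invocations, which is
# why tool-call volume can grow much faster than engineering headcount.

from typing import Callable

# Stand-ins for the kinds of tools mentioned above (floorplanning, DRC, ...).
def floorplan(design: str) -> str:
    return f"floorplan({design})"

def design_rule_check(design: str) -> str:
    return f"drc_report({design})"

TOOLS: dict[str, Callable[[str], str]] = {
    "floorplan": floorplan,
    "drc": design_rule_check,
}

def agent_run(design: str, plan: list[str]) -> list[str]:
    """One agent executing a plan: a handful of tool calls per design variant."""
    return [TOOLS[step](design) for step in plan]

# One engineer supervising 100 agents, each exploring one design variant:
calls = sum(len(agent_run(f"chip_v{i}", ["floorplan", "drc"])) for i in range(100))
print(calls)  # 200 tool invocations from a single engineer's session
```

The point of the sketch is the multiplication: invocations scale with agents times steps, not with the number of engineers.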
Dwarkesh Patel
I remember in your latest disclosure, you have nearly $100 billion in procurement commitments for upstream components: memory, packaging, and so on. A report from SemiAnalysis indicates that this number may reach $250 billion.
One interpretation is that Nvidia's moat lies in your lock on the supply of these scarce components for many years to come. In other words, others may also make accelerators, but will they be able to get enough memory? Can they obtain enough logic chips?
Is this Nvidia's core advantage in the coming years?
Jensen Huang:
This is something we can do that others find difficult. The reason we can make significant commitments upstream is partly explicit (the procurement commitments you mentioned) and partly implicit.
For example, many upstream investments are actually made by our supply chain partners because I tell their CEOs: Let me explain how big this industry will become, let me explain why that is, let me simulate it with you, let me tell you what I see.
Through this process—conveying information, inspiring vision, establishing consensus—I align with the CEOs of different industries upstream so that they are willing to make these investments.
Why are they willing to invest in me and not in others? Because they know I have the ability to buy their production capacity and digest it through my downstream. It is precisely because Nvidia's downstream demand and supply chain scale are so large that they are willing to invest upstream.
Look at GTC, the scale of the conference shocks many people. It is essentially a 360-degree AI universe that gathers the entire industry. People come together because they need to see each other. I bring them together so that upstream can see downstream, and downstream can see upstream, while everyone sees the progress in AI.
More importantly, they can access AI-native companies and startups, observing the various innovations occurring, thereby validating the judgments I have made.
Therefore, I spend a lot of time explaining the opportunities at hand to our supply chain and ecological partners, directly or indirectly. Many people say my keynote does not follow the traditional launch model of announcing products one after another, but rather sounds like "teaching". And that is indeed my intention.
I need to ensure that the entire supply chain—whether upstream or downstream—understands: what will happen next, why it will happen, when it will happen, how big it will be, and to be able to reason about these issues as systematically as I do.
So the "moat" you referred to does indeed exist. If this market reaches a trillion-dollar scale in the coming years, we have the capability to build the supply chain to support it. Just like cash flow, the supply chain also has liquidity and turnover. If a business's turnover is not fast enough, no one will create a supply chain for it. Our ability to maintain such scale is due to the extremely strong downstream demand, and everyone can see that.
It is this that allows us to do things at the scale we currently have.
Dwarkesh Patel
I want to understand more specifically whether the upstream can keep up. Over the past several years, your revenue has basically doubled year over year, and the scale of computing power provided to the world has even tripled.
Jensen Huang:
And it keeps doubling from that base.
Dwarkesh Patel
Right. So if you look at logic chips, for example, you are one of TSMC's largest customers for the N3 process and also one of the main customers for N2.
According to some analyses, AI may account for 60% of N3 capacity this year and may even reach 86% next year.
Note: N3 refers to TSMC's 3-nanometer (3nm) process node, which can be understood as one of TSMC's current generation of the most advanced chip manufacturing processes.
So with such a large share already occupied, how can you continue to double? Moreover, every year? Are we entering a phase where the growth of AI computing power must slow down due to upstream constraints? Is there a way to circumvent these constraints? How do we manage to build double the wafer fab capacity every year?
Jensen Huang:
At certain moments, immediate demand does exceed what the entire industry, upstream and downstream, can supply. Moreover, in some cases we are even limited by the number of plumbers, which has actually happened.
Dwarkesh Patel:
Then next year's GTC should invite plumbers.
Jensen Huang:
Yes, and this is a good phenomenon. You want to be in a market where immediate demand exceeds the industry's total supply. The reverse is obviously not good.
If the gap between the two is too large, a specific link or component will become a clear bottleneck, and the entire industry will rush to solve it. For example, I have noticed that people are not discussing CoWoS much anymore. The reason is that we have made massive investments and expansions in it over the past two years, increasing it severalfold.
Now I feel that the overall status is fairly good. TSMC has also realized that the supply of CoWoS must keep up with the demand increase for logic chips and memory. So they are expanding CoWoS while also expanding future advanced packaging technologies, and expanding at the same pace as logic chips.
This is very important because in the past, CoWoS and HBM memory were more like "special capabilities," but now they are no longer so. Everyone has realized that they are part of mainstream computing technology.
At the same time, we are becoming more capable of influencing a larger share of the supply chain. Back when the AI revolution was just beginning, five years ago, I had already made some of the judgments I am stating now.
At that time, some believed and invested, like Micron's Sanjay team. I still clearly remember that meeting, where I described exactly what would happen in the future, why it would happen, and the predictions for today’s outcomes. They chose to significantly ramp up, and we established a partnership with them. They made investments across various directions such as LPDDR and HBM, which has evidently brought them great returns. Some companies joined later, but now everyone has reached this stage.
So I believe that every generation of technology, every bottleneck, will receive a lot of attention. And now, we are several years ahead in "pre-fetching" these bottlenecks. For example, our cooperation with Lumentum, Coherent, and the entire silicon photonics ecosystem. In the past few years, we have actually reshaped the entire ecosystem and supply chain.
In silicon photonics, we have built a complete supply chain around TSMC, collaborating with them to develop technology, inventing many new technologies, granting these patents to the supply chain to keep the ecosystem open. We prepare the supply chain by inventing new technologies, new workflows, new testing equipment (including dual-side detection, etc.), and investing in related companies to help them scale.
So you can see, we are proactively shaping this ecosystem so that the supply chain can support future scale.
Dwarkesh Patel:
Some bottlenecks sound easier to solve than others. For example, expanding CoWoS is more challenging.
Jensen Huang:
I actually mentioned the hardest example just now.
Dwarkesh Patel:
Which one?
Jensen Huang:
Plumbers. Yes, truly. The example I just gave is indeed one of the hardest: plumbers and electricians. This is also why some of the "doomsayers" concern me a bit, the ones who constantly talk about jobs disappearing and positions being replaced. If we discourage people from becoming software engineers because of this, then in the future there really will be a shortage of software engineers.
Similar predictions emerged a decade ago. At that time, someone said: "No matter what you do, do not become a radiologist." You can still find those videos online, claiming radiologists would be the first profession to be eliminated, and the world would no longer need radiologists. But the reality is, we now actually lack radiologists.
Dwarkesh Patel:
Alright, back to the earlier question: some components can expand, and some cannot. Specifically, how do you double the capacity of logic chips? After all, the real bottleneck lies here, with memory and logic being limiting factors. What about EUV lithography machines? How do you manage to double their numbers every year?
Jensen Huang:
These are not impossible tasks. Indeed, rapid scale-up is not easy, but achieving these in two to three years is not particularly difficult. The key is to have clear demand signals. Once you can build one, you can build ten; once you can build ten, you can build a million. Therefore, these things are fundamentally not hard to replicate.
Dwarkesh Patel:
Do you communicate this judgment deep into the supply chain? For example, do you approach ASML and say: If I look three years ahead, to achieve Nvidia's annual revenue of $2 trillion, we need more EUV lithography machines?
Jensen Huang:
Some of this I do directly, and some I promote indirectly. If I can convince TSMC, ASML will naturally be convinced as well. So we need to identify the critical bottleneck points. But as long as TSMC believes in this trend, within a few years you will have enough EUV machines.
What I mean is that no bottleneck will last more than two to three years, none.
At the same time, we are also enhancing computational efficiency. From Hopper to Blackwell, we have improved by about 10 times, 20 times, and in some cases even 30 to 50 times. We are also continuously proposing new algorithms. Because CUDA is flexible enough, we can develop various new methods while expanding capacity and improving efficiency.
So none of these things worry me. What truly concerns me is external factors outside our downstream, such as energy policies. Without energy, you cannot expand; without energy, you cannot establish an industry; without energy, you cannot create a whole new manufacturing system.
Now we want to promote American reindustrialization, to bring chip manufacturing, computer manufacturing, and packaging back to the U.S., while also building new industries like electric vehicles and robotics. As we build AI factories, all of these rely on energy, and the construction cycle related to energy is long. In contrast, increasing chip capacity is a two or three-year issue; increasing CoWoS capacity is also a two or three-year concern.
Dwarkesh Patel:
This is interesting. I feel that some of the guests I’ve interviewed have exactly the opposite views. On this issue, I truly lack enough technical background to judge.
Jensen Huang:
But the good thing is you are currently talking to an expert.
Will Google’s TPU shake Nvidia's position?
Dwarkesh Patel:
Yes, indeed. I want to ask you about your competitors. If we look at TPUs, we can say that two of the top three large models in the world—Claude and Gemini—are trained using TPUs. What does this mean for Nvidia's future?
Note: TPU (Tensor Processing Unit) is a type of specialized chip designed by Google specifically for artificial intelligence (especially deep learning).
Jensen Huang:
What we are doing is completely different. Nvidia is building "accelerated computing," not tensor processing units (TPU).
Accelerated computing can be used for all kinds of tasks, like molecular dynamics, quantum chromodynamics, as well as data processing, data frameworks, structured data, unstructured data, and for fluid dynamics, particle physics, and of course AI. Therefore, the application range of accelerated computing is much broader.
While discussions are currently focusing on AI, and AI is indeed very important and impactful, the range of "computing" itself is much broader. What Nvidia is doing is reinventing the way we compute from general computing into accelerated computing. Our market coverage is far broader than any TPU or other specialized accelerator could cover.
If you look at our positioning, we are the only company that can accelerate various types of applications. We have a vast ecosystem, and various frameworks and algorithms can run on the Nvidia platform. Also, our computer systems are designed for "others to operate." Any operator can buy our systems for use.
Most self-developed systems are not designed for use by others; you have to operate them yourself because they are not flexible enough from the outset to be used by others. Precisely because our systems can be operated by anyone, we have entered all major platforms including Google, Amazon, Azure, OCI, etc.
Whether you are operating the system to lease computing power or for personal use, you must have a large-scale customer ecosystem that spans multiple industries to accommodate these demands. If you are running the system for your own use, we certainly have the capability to help you accomplish this. For example, Elon’s xAI.
Since we enable operators from any industry and any company to use our systems, you can utilize it to build supercomputers for companies like Lilly, for scientific research and drug discovery. We can assist them in operating their own supercomputers and utilizing them for various application scenarios in drug development and biosciences, all of which are fields we can accelerate.
Thus, we can cover a vast array of application scenarios that TPUs cannot. CUDA, developed by Nvidia, can serve as an excellent tensor processing platform, but it is not limited to that; it covers the entire lifecycle of data processing, computation, and AI. Hence, our market opportunities are much larger and our coverage wider. Furthermore, because we currently support virtually all types of applications worldwide, you can deploy Nvidia systems anywhere and be assured customers will use them.
So this is essentially a completely different thing.
Dwarkesh Patel:
This question may be somewhat longer.
Your current revenues are incredibly impressive, and these revenues are not primarily coming from pharmaceuticals or quantum computing. You are not making $60 billion every quarter from these businesses; instead, it’s because AI is an unprecedented technology that is advancing at an unprecedented speed.
So the question is: if we only look at AI, what is the optimal solution? I am not an expert at this layer of the stack, but I have spoken with AI researcher friends who say: when I use a TPU, it is one large array, very well suited to matrix multiplication; a GPU is more flexible, suited to lots of branching and irregular memory access.
But if you look at AI, isn’t it essentially just repetitive, very predictable matrix multiplication? Therefore, you don't actually need to allocate chip area for features like warp scheduling, thread switching, memory banks, and so on. This means that TPUs are highly optimized for the main application scenarios in the current surge of computing power demand and revenue growth.
How do you view this perspective?
Jensen Huang:
Matrix multiplication is indeed an important part of AI, but it is not all of AI.
If you want to propose a new attention mechanism, or compute in different ways; if you want to design a completely new architecture, like hybrid SSM; if you want to build a model that integrates diffusion and autoregressive—you need a universal programmable architecture, and we can run anything you can think of.
This is our advantage, making the invention of new algorithms much easier. This is the reason why AI is progressing so rapidly, as it is a programmable system, and the continuous invention of new algorithms is why we can advance so quickly.
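Note: A minimal sketch of what "a universal programmable architecture" buys a researcher. The attention variant below is invented for illustration (plain PyTorch, with a hypothetical gating rule): the matrix multiplications map naturally onto fixed-function matmul units, while the masking and data-dependent gating around them are exactly the kind of logic that wants a general-purpose programmable platform.

```python
# Sketch of a made-up attention variant, to illustrate "programmability".
# The two matmuls (q @ k.T and weights @ v) suit fixed-function hardware;
# the causal mask and the data-dependent gate are the flexible parts.

import torch

def gated_causal_attention(q, k, v, gate_threshold=0.1):
    # q, k, v: (seq, dim)
    scores = q @ k.T / q.shape[-1] ** 0.5
    causal = torch.tril(torch.ones_like(scores, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    # Hypothetical data-dependent rule: drop weak attention edges entirely.
    weights = torch.where(weights > gate_threshold, weights, torch.zeros_like(weights))
    weights = weights / weights.sum(dim=-1, keepdim=True).clamp(min=1e-9)
    return weights @ v

q = k = v = torch.randn(8, 16)
print(gated_causal_attention(q, k, v).shape)  # torch.Size([8, 16])
```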
TPUs, like any other hardware, are also influenced by Moore's Law. We know that Moore's Law brings about a 25% improvement roughly every year. So if you want to achieve a 10x or 100x leap, the only way to do so is to fundamentally change the algorithms and how they compute every year.
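Note: The arithmetic behind this point, assuming the roughly 25% per year process-only improvement Huang cites:

```latex
1.25^{2} \approx 1.56 \quad \text{(two years of process gains alone)}
\qquad
\frac{\ln 10}{\ln 1.25} \approx 10.3 \quad \text{(years to reach a } 10\times \text{ gain from process alone)}
```

So a 30x to 50x jump within a single product generation, as discussed below, cannot come from the process node; it has to come from architecture, systems, and algorithms.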
This is precisely Nvidia's core advantage.
We achieved significant improvements with Blackwell compared to Hopper. When I first announced that Blackwell's efficiency would improve 35 times over Hopper, no one believed it.
Later, Dylan wrote an article stating that I had actually been conservative, as the real improvement was closer to 50 times, and this kind of improvement cannot be achieved solely through Moore's Law. The way we tackle these issues is by introducing new model structures, like MoE, and parallelizing, decoupling, and distributing computations, extending across the entire computing system. If you lack the ability to delve into the fundamentals and develop new compute kernels using CUDA, this is difficult to achieve.
Note: Referring to Dylan Patel, a well-known analyst in the semiconductor and AI infrastructure field and the founder of research institution SemiAnalysis.
So our advantages lie in the programmability of the architecture and Nvidia as a highly collaborative design company. We can even offload some computations to the interconnect architecture, like NVLink, or networking layers, like Spectrum-X. That means we can push changes in processors, systems, interconnects, software libraries, and algorithms simultaneously. All of this is done concurrently. Without CUDA supporting this all, I honestly wouldn’t know where to start.
Dwarkesh Patel:
This also raises a question about Nvidia's customer structure. If 60% of your revenue comes from the top five hyperscalers, that is a different world from an earlier era with different kinds of customers. A professor running experiments, say, would be deeply reliant on CUDA: they could only use PyTorch + CUDA, and everything had to be optimized for them.
But these hyperscale cloud vendors have the ability to write kernels themselves. In fact, they have to, to squeeze out that last 5% of performance. Anthropic and Google often train on their self-developed accelerators or TPUs. Even OpenAI, when using GPUs, relies on Triton; they will say: we need our own kernels. So they write CUDA C++ directly rather than using libraries like cuBLAS and NCCL, and they build their own software stacks that can even compile to other accelerators.
Thus, most of your customers can in fact replace CUDA. To what extent does cutting-edge AI still fundamentally depend on CUDA, and therefore on Nvidia?
Jensen Huang:
First of all, CUDA is a very rich ecosystem. If you want to develop on any computer, starting with CUDA is a very sensible choice because the ecosystem is very robust and supports all mainstream frameworks.
If you need to write custom kernels, with Triton for example, we have contributed a lot of Nvidia's technology to Triton's backend, and we are very eager to help the various frameworks improve. There are many frameworks now: Triton, vLLM, SGLang, and more.
With the development of post-training and reinforcement learning, this field is expanding rapidly. For instance, you have veRL, NeMo RL, and a range of new frameworks. If you are going to develop on a given architecture, starting with CUDA is the most logical choice because you know the ecosystem is mature. When problems arise, it is more likely that your own code is at fault rather than the vast amount of code in the underlying platform.
Don't forget, the scale of code involved behind these systems is enormous. When systems malfunction, you want to discern whether the issue lies with you or the computing platform itself.
You would certainly prefer it to be your own problem rather than with the computing platform. Of course, we have our share of bugs, but our systems have become very mature; you can at least continue to build on a reliable basis.
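Note: Triton, mentioned above, is a Python-embedded language for writing custom GPU kernels; Nvidia GPUs are among its backend targets. Below is a minimal sketch of the custom-kernel workflow under discussion, assuming the triton package and an Nvidia GPU are available. It fuses a scale and an add into a single kernel.

```python
# Minimal Triton kernel: fused scale-and-add over a vector.
# A sketch of the custom-kernel workflow discussed above, not production code.

import torch
import triton
import triton.language as tl

@triton.jit
def scale_add_kernel(x_ptr, y_ptr, out_ptr, alpha, n, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n                      # guard the tail of the vector
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, alpha * x + y, mask=mask)

def scale_add(x: torch.Tensor, y: torch.Tensor, alpha: float = 2.0):
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)          # one program instance per 1024 elements
    scale_add_kernel[grid](x, y, out, alpha, n, BLOCK=1024)
    return out

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
torch.testing.assert_close(scale_add(x, y), 2.0 * x + y)
```

Frameworks like vLLM and SGLang are assembled from many kernels of roughly this shape, layered over the vendor libraries.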
The second point is the scale of the install base. If you are a developer, no matter what you are doing, one crucial thing is the "install base." You want your software to run on as many computers as possible. You aren't writing software for yourself; you are writing software for your entire cluster, or even for the whole industry, as you are a framework developer.
Nvidia's CUDA ecosystem is fundamentally our most important asset. Currently, there are hundreds of millions of GPUs worldwide. All cloud vendors have them, from V100, A100, H100, H200, to L series, P series, with various specifications.
And they exist in many different forms. If you are a robotics company, you would want CUDA to run directly on the robot body. We are basically everywhere.
This means that once you develop software or models, they can be used anywhere. So, the value of this install base itself is incredibly large.
Lastly, the flexibility of the deployment location. We are present on all cloud platforms, which makes us unique. As an AI company or developer, you are uncertain about which cloud provider you will ultimately work with and where your system will run. And we can operate everywhere, including on-prem.
Therefore, the richness of the ecosystem, the scale of the install base, and the flexibility of deployment locations combined are extremely valuable.
Dwarkesh Patel:
This makes sense. But I am curious whether these advantages really matter that much to your major clients, the ones who can build their own software stacks and who contribute most of your revenue. Especially in a world where AI is becoming increasingly strong at tasks with verifiable feedback loops, such as reinforcement learning scenarios: optimizing kernels like attention or MLP is exactly such an easy-to-verify feedback loop.
So, for these hyperscale cloud vendors, can they completely write these kernels themselves? Of course, they may still choose Nvidia for cost-effectiveness. But the question is, will this ultimately become a simple comparison: who can provide better specifications? For example, at unit cost, who can deliver higher computing power (FLOPs) and higher memory bandwidth? Because in the past, Nvidia had extremely high profit margins (over 70%) on both hardware and software, largely due to the CUDA moat.
The question is, if most clients can build their software stack themselves without relying on CUDA, can such profit margins be maintained?
Jensen Huang:
The number of engineers we embed in these AI labs is astonishing. They collaborate with the labs and help them optimize the entire technology stack, because no one understands our architecture better than we do. And these architectures aren't generic the way CPUs are.
A CPU is somewhat like a family car: it doesn't go particularly fast, but everyone can drive it, it has cruise control, and everything is simple. Nvidia's GPU systems are more like F1 race cars. Anyone can probably drive one at 100 miles per hour, but really pushing it to the limit takes professional skill.
And we use a lot of AI to generate these kernels. I am very confident that for a considerable period of time, we will remain indispensable. Our expertise can help our partners in these AI labs easily double their performance. Many times, after we optimize their technology stack or a specific kernel, their models can accelerate by 3 times, 2 times, or even 50%. This is a significant improvement, especially when considering they have large clusters of Hopper and Blackwell.
If you double the performance, that translates directly into doubled revenue. Nvidia's compute stack has the best TCO (Total Cost of Ownership) in the world, with no competitors. No company can show me a platform that beats ours on performance per TCO. Not a single one. And these benchmarks are public.
As Dylan said, InferenceMAX is public; anyone can run it. But no TPU team wants to use it to demonstrate an inference cost advantage. It is hard to execute, and no one has been willing to prove it.
MLPerf is the same. I welcome them to showcase that 40% advantage they have claimed. I would love to see them prove the cost advantage of TPUs. To me, this makes no sense; fundamentally it doesn't hold water. It makes no sense whatsoever.
So I believe the reason we are successful is fundamentally because our TCO is very strong.
Another point: while you mentioned that 60% of our revenue comes from the top five vendors, most of that business is actually directed toward external clients. For instance, on AWS, most of the computing power Nvidia provides is available to external customers rather than used by AWS itself. On Azure, our clients are essentially all external; the same on OCI. They choose us because our coverage is extremely broad.
We can bring the world's best customers to them, and these clients are built on the Nvidia platform. The reason these companies are built on Nvidia is because our scope and flexibility are very robust.
Thus, I believe this flywheel is at work: the install base, the programmability of the architecture, and the continuous accumulation of ecosystems. Along with the fact that there are now thousands of AI companies globally. If you are one of those AI startups, which architecture will you choose? You’ll choose the most prevalent, the one with the largest install base, and the richest ecosystem. That’s the logic of this flywheel.
So the reason is:
·First, our unit cost performance (performance per dollar) is very high, making token costs the lowest;
·Second, our unit power performance (performance per watt) is the highest in the world. If a partner builds a 1GW data center, it has to output the most tokens and generate the most revenue, and our architecture yields the most tokens per unit of power (a rough worked example follows after this list).
·Third, if your goal is to lease computing power, we have the biggest customer base in the world.
This is the reason this flywheel takes shape.
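Note: A back-of-envelope version of the flywheel arithmetic above (see the bullet on performance per watt). Every number in this sketch is an illustrative assumption, not a vendor figure; it only shows why, once a site's power budget is fixed, tokens per joule is the term that caps output and revenue.

```python
# Back-of-envelope: why perf/watt caps revenue in a power-limited data center.
# All numbers are illustrative assumptions, not vendor figures.

DATACENTER_POWER_W = 1e9      # a 1 GW facility: the power budget is fixed
TOKENS_PER_JOULE   = 1.0      # assumed system-level efficiency
PRICE_PER_M_TOKENS = 1.0      # assumed $ per million tokens served

tokens_per_second = DATACENTER_POWER_W * TOKENS_PER_JOULE
seconds_per_year = 365 * 24 * 3600
annual_tokens = tokens_per_second * seconds_per_year
annual_revenue = annual_tokens / 1e6 * PRICE_PER_M_TOKENS

print(f"{annual_tokens:.2e} tokens/yr -> ${annual_revenue:,.0f}/yr")
# Doubling TOKENS_PER_JOULE doubles annual_tokens, and hence revenue,
# without adding a single watt: the "2x performance = 2x revenue" point.
```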
Dwarkesh Patel:
This is very interesting. I feel the core of the issue lies in what the market structure looks like. It is entirely possible to imagine a world with thousands of AI companies splitting the computing power fairly evenly.
However, the reality may be that, through these hyperscale cloud vendors, the actual users of the computing power are foundation model companies like Anthropic and OpenAI, which have the capability to run different accelerators...
Jensen Huang:
I believe your premise is incorrect.
Dwarkesh Patel:
Maybe. So let me rephrase the question: if all these claims about performance and costs are valid, then why did a company like Anthropic just announce a multi-gigawatt-level TPU collaboration with Broadcom and Google a few days ago? Additionally, most of their computing power originates from these systems. For Google, TPU itself is a primary source of computing power. Therefore, if we look at these large AI companies, they used to be entirely reliant on Nvidia, but now that is no longer the case.
If these advantages theoretically hold, why would they still opt for other accelerators?
Jensen Huang:
Anthropic is a rather unique example. Without Anthropic, the growth of TPUs wouldn't exist; that growth stems almost entirely from Anthropic. Similarly, without Anthropic, the growth of Trainium wouldn't exist either.
This is a very clear fact. There aren’t numerous similar opportunities; in reality, there is only one Anthropic.
Dwarkesh Patel:
But OpenAI also collaborates with AMD, and they are developing their own Titan accelerator.
Note: AMD (Advanced Micro Devices) is a US semiconductor company that designs computing chips and is a major competitor to Nvidia and Intel.
Jensen Huang:
But the vast majority of their operations still rely on Nvidia. We will also continue to collaborate extensively. I am not unhappy that others are trying different solutions. If they don't explore other options, how would they know how good our solution is?
Sometimes it is indeed necessary to reaffirm this through comparisons. And we must continually prove that we deserve our current position.
There have always been all kinds of assertions in the market. You can see how many ASIC projects have been canceled. Just because you start making ASICs does not mean you can create something better than Nvidia.
In fact, this is not easy. You could even say that, rationally, the case doesn't quite hold unless Nvidia genuinely makes significant errors somewhere. But considering our scale and our speed, we are the only company in the world that achieves substantial leaps every year.
Dwarkesh Patel:
Their logic is: they don’t need to be better than Nvidia; they just need to not be 70% worse than Nvidia because they believe your profit margin is 70%.
Jensen Huang:
But don’t forget, even ASICs have very high profit margins. Nvidia’s profit margins are about 60%–70%, while ASICs' margins may also be around 65%. So how much are you really saving?
Either way, you are paying someone's margin. So from what I see, these ASIC businesses also have very high profit margins, and they know it and take pride in it.
A long time ago, we genuinely lacked the ability to do this kind of thing. And to be honest, I didn't fully comprehend how difficult it is to build a foundation model lab like OpenAI or Anthropic. I did not fully realize how much investment support they needed from the supply side.
At that time, we lacked the capability to make those multi-billion dollar investments, like investing in Anthropic so they would use our computing power. But Google and AWS could; they invested heavily, and in return, Anthropic chose to use their computing power.
