Alpha Arena Reveals AI Trading Flaws: Western Models Lose 80% of Capital in a Week


The market is the ultimate test for AI.

Written by: Juan Galt

Translated by: AididiaoJP, Foresight News

Can AI trade cryptocurrencies? New York computer engineer and finance professional Jay Azhang is testing this question through Alpha Arena. The project pits the most powerful large language models against each other, giving each $10,000 in capital, to see which can make the most money trading cryptocurrency. The models are Grok 4, Claude Sonnet 4.5, Gemini 2.5 Pro, ChatGPT 5, DeepSeek v3.1, and Qwen3 Max.

At this point you might be thinking, "Wow, what a brilliant idea!" You may then be surprised to learn that, at the time of writing, three of the six AIs are in the red, while the two Chinese open-source models, Qwen3 and DeepSeek, are leading.

Indeed, the most powerful proprietary AIs in the Western world, operated by giants like Google and OpenAI, have each lost over $8,000 in just over a week, more than 80% of their $10,000 in trading capital, while their Eastern open-source counterparts are in profit.

The most successful trade so far? Qwen3 has stayed in profit throughout, relying solely on a simple 20x leveraged long position in Bitcoin. Grok 4, unsurprisingly, has been long Dogecoin at 10x leverage for most of the competition. It once ranked at the top alongside DeepSeek but is now down close to 20%. Perhaps Elon Musk should send a Dogecoin meme or something to help Grok out of its predicament.
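
To see why a 20x position cuts both ways, here is a minimal sketch of the leverage arithmetic. It assumes a long position, ignores fees, funding, and maintenance margin, and treats losing the full margin as the end of the trade; real exchanges typically liquidate earlier. The function names and the 2% price moves are illustrative, not Alpha Arena data.

```python
def leveraged_return(price_change: float, leverage: float) -> float:
    """Return on margin for a long position, ignoring fees and funding.

    The loss is capped at -1.0 (the entire margin), a simplified
    stand-in for liquidation.
    """
    return max(leverage * price_change, -1.0)


def wipeout_move(leverage: float) -> float:
    """Adverse price move, as a fraction, that erases the whole margin."""
    return 1.0 / leverage


if __name__ == "__main__":
    # Hypothetical 2% moves in the underlying asset at three leverage levels.
    for lev in (1, 10, 20):
        gain = leveraged_return(0.02, lev)
        loss = leveraged_return(-0.02, lev)
        print(
            f"{lev:>2}x: +2% move -> {gain:+.0%}, -2% move -> {loss:+.0%}, "
            f"margin gone after a {wipeout_move(lev):.0%} drop"
        )
```

Under these assumptions, leverage multiplies gains and losses alike, and the higher the leverage, the smaller the adverse move needed to wipe out the position.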

Meanwhile, Google's Gemini has been ruthlessly bearish, shorting every tradable crypto asset, a stance that echoes Google's overall cryptocurrency policy over the past 15 years.

In the end, Gemini made every possible wrong trade for an entire week. Performing that poorly takes skill, especially when Qwen3 is simply going long on Bitcoin. If this is the best that closed-source AI can offer, then perhaps OpenAI should remain closed-source to spare us from losses.

The New Benchmark for AI

The idea of pitting AI models against each other in the cryptocurrency trading arena carries some profound insights. First, the models cannot have seen the answers to this test during pre-training, because future prices are unpredictable. Other benchmarks do have that problem: many AI models encounter the answers to some benchmark questions in their training data, so they naturally perform well when tested. Some research indicates that even slight modifications to those tests can change the benchmark results significantly.

This controversy raises a question: what is the ultimate test of intelligence? According to Grok 4's creator, Iron Man enthusiast Elon Musk, predicting the future is the ultimate measure of intelligence.

And we must admit, there is nothing more uncertain about the future than the short-term price of cryptocurrencies. In Azhang's words, "Our goal with Alpha Arena is to bring benchmark testing closer to the real world, and the market is perfect for that. They are dynamic, adversarial, open-ended, and forever unpredictable. They challenge AI in ways that static benchmark tests cannot. The market is the ultimate test for AI."

This insight about the market is deeply rooted in the libertarian principles that gave birth to Bitcoin. Economists like Murray Rothbard and Milton Friedman argued in the last century that central governments fundamentally cannot predict markets, and that rational economic calculation can occur only when individuals who bear the losses themselves make real economic decisions.

In other words, the market is the most difficult thing to predict because it depends on the personal views and decisions of intelligent individuals around the world, making it the best test of intelligence.

Azhang mentions in his project description that the models are instructed to trade not only for profit but also for risk-adjusted returns. This risk dimension is crucial because a single bad trade can wipe out all previous returns, as seen in Grok 4's portfolio collapse.
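
The article does not say which risk-adjusted metric Alpha Arena uses. One common choice is the Sharpe ratio, the mean excess return divided by its standard deviation, and the sketch below uses it as an assumption. The two return series are invented for illustration; they have the same average return but very different volatility.

```python
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return per unit of volatility (not annualized)."""
    excess = [r - risk_free for r in returns]
    spread = statistics.stdev(excess)  # sample standard deviation
    if spread == 0:
        raise ValueError("returns have zero volatility")
    return statistics.mean(excess) / spread


# Hypothetical daily returns: both series average 1% per day.
steady = [0.010, 0.012, 0.008, 0.011, 0.009]
wild = [0.15, -0.12, 0.20, -0.18, 0.00]

if __name__ == "__main__":
    print(f"steady trader Sharpe: {sharpe_ratio(steady):.2f}")
    print(f"wild trader Sharpe:   {sharpe_ratio(wild):.2f}")
```

Although both hypothetical traders earn the same average return, the steady one scores far higher, which is the sense in which raw profit alone can mislead.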

There is also the question of whether these models learn from their experience trading cryptocurrencies. That is technically hard to achieve, because pre-training AI models is very expensive. The models can be fine-tuned on their own trading history or that of others, and they may even keep recent trades in short-term memory, that is, their context windows, but that only takes them so far. Ultimately, a successful AI trading model may need to learn genuinely from its own experience. Such a technique was recently announced in academia, where MIT refers to these systems as self-adaptive AI models, but it has a long way to go before becoming a product.

How do we know this is not just luck?

Another reading of the project and its results so far is that they may be indistinguishable from a "random walk." A random walk is akin to rolling dice for each decision. What would that look like on a chart? There is actually a simulator you can use to answer that question, and its output does not look much different from the real results.
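
As a rough illustration of the random-walk idea, the sketch below gives six hypothetical traders $10,000 each and lets a random number generator decide every daily return. The seven steps, the plus or minus 10% daily range, and the trader names are assumptions chosen for illustration; this is not the simulator mentioned above and uses no Alpha Arena data.

```python
import random


def random_walk_equity(start=10_000.0, steps=7, max_move=0.10, rng=None):
    """Equity path where each step's return is drawn uniformly at random."""
    if rng is None:
        rng = random.Random()
    equity = [start]
    for _ in range(steps):
        daily_return = rng.uniform(-max_move, max_move)
        equity.append(equity[-1] * (1.0 + daily_return))
    return equity


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    finals = {
        f"trader_{i}": random_walk_equity(rng=rng)[-1] for i in range(1, 7)
    }
    # Rank the purely random traders as a leaderboard would.
    for name, value in sorted(finals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: ${value:,.0f}")
```

Even though no trader here has any skill, the printed leaderboard still has a winner and a loser, which is why a one-week ranking alone says little.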

The issue of luck in the market has also been examined carefully by Nassim Nicholas Taleb in his book "Antifragile." He argues that, from a statistical perspective, it is entirely normal for a trader, say Qwen3, to be lucky for an entire week, which can make it appear to have exceptional reasoning abilities. Taleb's point goes further: there are enough traders on Wall Street that one of them could easily be lucky for 20 years and build a god-like reputation, with everyone around convinced this trader is a genius until the luck runs out.
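
The statistical point can be sketched with a simple calculation. Assume, purely for illustration, that every trader is a coin flipper with a 50% chance of a winning period and that traders are independent. The chance that at least one of them wins every period is then 1 - (1 - p^periods)^traders. The trader counts below are hypothetical and do not come from Taleb or from Alpha Arena.

```python
def prob_at_least_one_streak(traders: int, periods: int, p_win: float = 0.5) -> float:
    """Chance that at least one of `traders` independent coin flippers
    wins all `periods` periods in a row."""
    p_streak = p_win ** periods
    return 1.0 - (1.0 - p_streak) ** traders


if __name__ == "__main__":
    # Six no-skill traders over seven daily periods.
    print(f"{prob_at_least_one_streak(6, 7):.1%}")
    # A hypothetical population of one million traders over 20 yearly periods.
    print(f"{prob_at_least_one_streak(1_000_000, 20):.1%}")
```

With a large enough population, a 20-year winning streak somewhere in the crowd becomes more likely than not, even though no individual has any edge.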

Therefore, for Alpha Arena to produce valuable data, it must actually run for a long time, and its patterns and results need to be replicated independently, involving real capital risk, before it can be deemed different from a random walk.

So far, open-source, cost-effective models like DeepSeek have outperformed their closed-source counterparts. Alpha Arena has also been a great source of entertainment, going viral on X.com over the past week. Its future direction is anyone's guess. We will have to see whether the gamble taken by its creators, giving six chatbots $60,000 for cryptocurrency gambling, will ultimately pay off.

