Claude Takes the Championship: The Truth Behind the Six-AI Grid Strategy Battle | OKX & AICoin Live Evaluation

The first season of the "AI Cryptocurrency Trading Arena" launched by NOF1 finally concluded at 6 AM on November 4, 2025, leaving the cryptocurrency, technology, and finance circles eager for results.

However, the outcome of this "AI IQ public test" was somewhat unexpected. The total principal of $60,000 across six models dwindled to just $43,000 by the end, resulting in an overall loss of about 28%. Among them, Qwen3-Max and DeepSeek v3.1 both turned a profit, with Qwen3-Max emerging as the champion; meanwhile, all four American models suffered losses.

Interestingly, the recent live evaluation of six AI models conducted by OKX and AICoin did not focus on short-term trading but instead concentrated on contract grid strategies. This choice revealed the true performance of the six AI models: in the contract grid strategy, AI achieved "group survival"—all models recorded positive returns. This suggests that AI models may be more suited to neutral, systematic grid strategies rather than short-term speculation.

Among them, Claude took the championship, while Qwen3, which ranked first in the NOF1 event, ended up in last place this time. GPT-5 and Gemini performed relatively steadily, securing second and third places, respectively; DeepSeek and Grok4, despite differing strategy settings, ended up with nearly identical returns.

Why did the same AI models exhibit such a stark contrast in performance across two different tests? What insights can the underlying logic provide for strategies and trading users?

Six AI Grid Strategies Live: Claude Takes the Championship, All Models with Positive Returns

The setup of the "AI Cryptocurrency Trading Arena" was simple: six AI models each received $10,000 in principal and autonomously traded perpetual contracts for BTC, XRP, and other assets on a perp DEX over roughly two weeks (starting around October 18). Throughout the process, the models were fed only quantitative market data and had to decide independently on long or short positions, leverage, and position sizes, attaching a confidence score to each decision.

For our test, we adopted a similarly minimalist setup: under uniform conditions (each AI invested 1,000 USDT with 5x leverage), the six AI models ran live from October 24 to November 4, 2025. Based on the 1-hour BTC/USDT price chart on OKX, each model supplied one set of AI grid parameters: price range, number of grids, direction (long, short, or neutral), and mode (arithmetic or geometric).

The results showed that all AI models chose an arithmetic grid mode and a neutral grid direction, but they differed significantly in price range and grid density. Grok4 and DeepSeek had the widest range (100,000-120,000 USDT), with Grok4 using 50 grids (smaller intervals) and DeepSeek only 20. Gemini's range was 105,000-118,000 USDT, also with 50 grids. GPT-5 set 105,000-115,500 USDT with the fewest grids (only 10, giving the largest intervals). Claude set a moderate range of 106,000-116,000 USDT with moderate density. Qwen3 had the narrowest range (108,000-112,000 USDT), with 20 grids.
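To make "grid density" concrete, the following minimal Python sketch computes the width and per-grid spacing of each reported configuration. It assumes that the "number of grids" counts the intervals between price levels (some platforms may count the levels instead), and it leaves Claude out because its grid count is not given. The names `GRIDS` and `arithmetic_spacing` are illustrative only.

```python
# Grid parameters as reported in the article: (lower, upper, number of grids).
# Claude is omitted because its grid count is not stated.
GRIDS = {
    "Grok4": (100_000, 120_000, 50),
    "DeepSeek": (100_000, 120_000, 20),
    "Gemini": (105_000, 118_000, 50),
    "GPT-5": (105_000, 115_500, 10),
    "Qwen3": (108_000, 112_000, 20),
}


def arithmetic_spacing(lower: float, upper: float, grids: int) -> float:
    """Price gap between adjacent levels of an arithmetic grid.

    Assumption: `grids` is the number of intervals, not the number of levels.
    """
    return (upper - lower) / grids


if __name__ == "__main__":
    for name, (lower, upper, n) in GRIDS.items():
        width = upper - lower
        step = arithmetic_spacing(lower, upper, n)
        print(f"{name:9s} width={width:>6} USDT  spacing={step:>6.0f} USDT")
```

Under this assumption, GPT-5 has the largest spacing (1,050 USDT) and Qwen3 the smallest (200 USDT), which matches the article's description of GPT-5 as low density and Qwen3 as high-frequency.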

OKX platform market data indicated that during this period, BTC prices fluctuated between $103,000 and $116,000, first oscillating upward and then dropping sharply. This inverted-V reversal became the turning point for the six AIs. The low near 103K matters for the analysis: it fell below the lower limits that several models had set, which is the core difference between this live test and conventional backtesting and explains why some AI models "failed."

Here are the live performance data:

Live Champion: Claude

Core Strategy: Moderate Range, Moderate Triggers, Balancing Oscillation and Trend Phases, More Stable

Claude won the championship with a cumulative return of +6.18%, with its success hinging on a "moderate width and density" grid strategy. In this test, the configuration proved well suited to the oscillating BTC market, and it serves as a reference model for balancing profit and risk control in live trading.

Its grid range was set at 106K–116K, not as aggressive as Qwen3 nor as broad as Grok4. During the oscillating upward phase, it steadily accumulated profits; even during the sharp market drop, the lower limit of 106K effectively controlled the drawdown, outperforming all medium/narrow range models. The moderate range combined with adequate density ensured sufficient grid profits while minimizing unrealized losses during sharp declines.

Specifically, during the price increase phase, Claude avoided the grid idleness that Qwen3 experienced at high levels and steadily accumulated +7.90% profit. During the sharp market drop, when BTC fell to about 103K, the price went only 3K below Claude's lower limit of 106K, so the unrealized losses were largely buffered by the accumulated profits. The result was a drawdown of only 1.72% under 5x leverage, demonstrating excellent risk control.

Reliable Alternative: GPT-5

Core Strategy: Wider Range, Low Density, High Single Profit, Diluting Risk with Low Position Sizes

GPT-5 performed steadily, securing second place with a cumulative return of +5.79%, making it a reliable choice just behind Claude. Its strategy is proactive, with a slightly higher risk preference, aiming to seize market opportunities, but its drawdown management is not as effective as Claude's. The profit curve shows a stepwise increase, growing rapidly, but in the later stages (day 10), the drawdown was greater than Claude's. Overall, it is a high-efficiency, profitable strategy that balances returns with moderate risk, though there is still room for improvement in drawdown management.

The core feature of this model's grid strategy is low density and high profit per grid. Its drawdown of 2.65% was higher than Gemini's, but the limited total position size that comes with fewer grids diluted the risk, while the lower limit of 105K provided a buffer during the sharp decline. During the oscillation period, this strategy was impressively efficient, accumulating +8.44% profit before the drop. Compared to Qwen3, GPT-5's lower limit makes it more resilient during price declines. This strategy controls extreme risk exposure by limiting total position size, balancing returns and safety, which makes it a reliable alternative for those seeking efficiency and stability.

The Most Conservative: Grok4

Core Strategy: Widest Range, High Density, Ultimate Defense, Ensuring Safety with Zero Offline Exposure

The Grok4 model represents the ultimate defensive strategy. Compared to Qwen3, it completely abandoned aggressiveness during the oscillation period in exchange for maximum capital safety. The lower limit of 100K ensures zero offline exposure when BTC drops to 103K, and the high-density grid further spreads the position risk, resulting in an absolute drawdown of only 0.97%. Although Grok4 and DeepSeek have similar efficiencies, Grok4's profit curve is the smoothest with the lowest drawdown, making it the most conservative and stable choice, especially suitable for users prioritizing capital safety.

The remaining two models are "DeepSeek with Stable Defense," whose core strategy is the widest range with medium density, prioritizing defense while balancing efficiency and zero offline exposure; and "Gemini with Outstanding Performance," whose core strategy is a wider range with high density, collecting high-frequency micro-profits and spreading risk through broad coverage.

It is worth noting that the DeepSeek model and Grok4 share the same widest range, with nearly identical final returns, validating the logic that "range takes precedence over density": under zero offline defense, the efficiency differences brought by medium density are offset, with range width determining resilience, while density mainly affects the smoothness of the profit curve and trigger frequency.

The Gemini model demonstrated the advantages of high-density strategies in a medium-wide range for improving drawdown resilience: under the same lower limit as GPT-5, the high-density grid widely distributed positions, effectively diluting sharp decline risks, with a drawdown of only 1.41%, significantly better than GPT-5's 2.65%, indicating that high-density strategies can significantly enhance stability and curve smoothness, making them an optimal choice for those seeking stable returns.
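The "offline" distances discussed above follow directly from each model's lower limit and the approximate 103K low. The sketch below reproduces that arithmetic; `LOWER_LIMITS` is taken from the reported ranges, while the function name `offline_gap` is illustrative and not part of any platform API.

```python
# Approximate BTC low during the live test, per the article.
LOW = 103_000

# Lower limit of each model's grid range (USDT), from the reported ranges.
LOWER_LIMITS = {
    "Claude": 106_000,
    "GPT-5": 105_000,
    "Gemini": 105_000,
    "Grok4": 100_000,
    "DeepSeek": 100_000,
    "Qwen3": 108_000,
}


def offline_gap(lower_limit: float, low: float) -> float:
    """How far the price fell below the grid's lower limit (0 if it never did)."""
    return max(0, lower_limit - low)


if __name__ == "__main__":
    for name, limit in LOWER_LIMITS.items():
        gap = offline_gap(limit, LOW)
        status = "stayed in range" if gap == 0 else f"offline by {gap:,} USDT"
        print(f"{name:9s} lower limit {limit:,}: {status}")
```

Only Grok4 and DeepSeek show a zero gap, which is the "zero offline exposure" the article attributes to the widest range. The gap alone does not determine drawdown, since grid density and accumulated position size also matter, as the GPT-5 versus Gemini comparison shows.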

Overview of the Advantages and Disadvantages of the Six AI Models' Grid Strategies (Note: Detailed strategy characteristics of Qwen3 will be introduced in the next section):

Under the current set conditions, the AI models achieved "group survival" and recorded positive returns based on a solid logic: in a market dominated by oscillating upward trends, all models successfully utilized the strategy's "volatility equals profit" characteristic to accumulate a sufficient safety profit cushion. Even in the face of extreme risks (sharp declines), this profit cushion was enough to withstand the erosion of unrealized losses, ensuring that all models maintained positive final returns.

"Falling from Grace": Why Did Short-Term Trading Champion Qwen3 End Up in Last Place in Contract Grids?

First, let's review the results of the first season of the "AI Cryptocurrency Trading Arena" launched by NOF1: the Chinese models Qwen3 and DeepSeek both turned a profit, with Qwen3 emerging as the champion; meanwhile, all four American models suffered losses.

This indicates that high-frequency trading often carries higher risk: excessive trading generates fees that erode net value. A low win rate is not in itself alarming; what matters is risk management. The results also showed that, even with sophisticated AI strategies in play, simply holding Bitcoin (HODL) could still outperform most models.

One point of interest is the significant contrast in results between the two experiments: Qwen3 overtook DeepSeek to claim the short-term trading championship in the final stages, yet "fell from grace" in the grid strategy, ending up in last place. Why?

In this strategy experiment, Qwen3's performance serves as the "biggest lesson" of the test. It recorded a peak monthly profit of +41.88% and a highest single-day profit of 65.48U during the testing period, but later suffered a massive drawdown of 8.12%. Its final cumulative profit was only 22.51U (about +2.25% on the 1,000 USDT principal), putting it in last place.

The core of its strategy is: narrow range high-frequency arbitrage, aggressively concentrated, only suitable for central oscillation. During the price increase phase, it perfectly matched the central oscillation with a narrow range, engaging in high-frequency arbitrage, and profits quickly surged to a peak of +10.37%.

However, its lower limit of 108K, the highest among the models, became the fundamental reason for the collapse. When BTC dropped sharply to about 103K, the price sat 5K USDT below the grid range, leaving the accumulated long positions completely exposed, and the 5x leverage further amplified the unrealized losses. Profits were wiped out almost instantly, producing a drawdown of 8.12% on day 10, the largest among all models. This clearly demonstrates that while narrow-range strategies can profit quickly during oscillation, they lack defensive depth, suit only narrow oscillating markets, and are vulnerable to severe damage when prices leave the range.
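The figures quoted for Claude, GPT-5, and Qwen3 are consistent with a simple relationship: final return is roughly the profit accumulated before the drop minus the drawdown. The article does not state this formula explicitly, so the sketch below is only a consistency check on the published numbers, with illustrative names.

```python
# (accumulated profit before the drop %, drawdown %, reported final return %)
# Qwen3's final figure is 22.51 USDT on a 1,000 USDT principal, i.e. about 2.25%.
REPORTED = {
    "Claude": (7.90, 1.72, 6.18),
    "GPT-5": (8.44, 2.65, 5.79),
    "Qwen3": (10.37, 8.12, 2.25),
}


def net_return_pct(accumulated_pct: float, drawdown_pct: float) -> float:
    """Profit cushion minus drawdown, rounded to two decimals."""
    return round(accumulated_pct - drawdown_pct, 2)


if __name__ == "__main__":
    for name, (accumulated, drawdown, final) in REPORTED.items():
        net = net_return_pct(accumulated, drawdown)
        print(f"{name:7s} {accumulated:.2f}% - {drawdown:.2f}% = {net:.2f}% "
              f"(reported: {final:.2f}%)")
```

Read this way, Qwen3 actually built the largest profit cushion of the three, but its drawdown consumed most of it, while Claude kept most of a smaller cushion.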

In the first season of the "AI Cryptocurrency Trading Arena," the core reason Qwen3 won the championship was its timely strategy adjustment and adaptation to the market. As volatility intensified in the later stages, Qwen3 adopted a simple, focused all-in BTC strategy, combined with 5x leverage and precise take-profit and stop-loss levels. It captured rebound opportunities efficiently and achieved explosive net value growth, demonstrating both problem-solving capability and robustness in dynamic, uncertain environments (that is, the ability to keep performing steadily across different conditions and market fluctuations without collapsing). In contrast, DeepSeek's conservative multi-dimensional assessment excelled at risk control (highest Sharpe ratio) but grew slowly and failed to fully capitalize on the BTC-dominated market, while the excessive aggressiveness of American models such as GPT-5 led to overall losses.

In summary: Qwen3's short-term trading championship stemmed from proactive adaptation, while the failure of the grid strategy was due to passive parameter flaws. Therefore, AI trading needs to match market types and avoid a "one-size-fits-all" approach.

The second point of interest is that in the historical market backtest conducted by OKX and AICoin from July 25 to October 25, 2025, none of the six AI models exhibited offline risk in the grid strategy for BTC/USDT perpetual contracts, and their performance was relatively stable. However, in this live test, multiple models experienced offline situations or severe fluctuations in returns. What does this difference indicate?

Seeing "zero offline" in a backtest often creates a false sense of security, because the parameters are fitted to the very history they are tested on; the models have essentially been "overfed" on that data. In live trading, if the market breaks even slightly below its historical lows, strategies without a defensive margin go offline immediately. This also illustrates that survival depends not on clever algorithms but on whether the range is wide enough and the defense deep enough. Do not be misled by a "perfect backtest"; truly useful strategies are those that can survive the worst market conditions.

How to Outperform the Market? Insights from Two Experimental Results

The strategy tool used in this contract grid experiment is the OKX contract grid (AICoin AI grid), and all AIs executed strategies based on this tool, ensuring consistency and fairness in trade execution. This is an automated trading tool that supports various modes such as arithmetic, geometric, neutral, long, and short, allowing customization of price ranges, grid numbers, leverage multiples, and other parameters. It is suitable for capturing small fluctuation profits in oscillating markets through batch building and closing positions for arbitrage.
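The difference between the arithmetic and geometric modes mentioned above can be illustrated with the textbook definitions: arithmetic grids space levels by a constant price difference, geometric grids by a constant ratio. The sketch below uses those generic definitions and is not taken from OKX's or AICoin's implementation; the function names and the four-grid example are hypothetical.

```python
def arithmetic_levels(lower: float, upper: float, grids: int) -> list[float]:
    """Levels separated by a constant price difference."""
    step = (upper - lower) / grids
    return [lower + i * step for i in range(grids + 1)]


def geometric_levels(lower: float, upper: float, grids: int) -> list[float]:
    """Levels separated by a constant price ratio (equal percentage steps)."""
    ratio = (upper / lower) ** (1 / grids)
    return [lower * ratio**i for i in range(grids + 1)]


if __name__ == "__main__":
    # Qwen3's reported range, with only 4 grids to keep the output short.
    lower, upper, grids = 108_000, 112_000, 4
    print("arithmetic:", [round(p) for p in arithmetic_levels(lower, upper, grids)])
    print("geometric: ", [round(p) for p in geometric_levels(lower, upper, grids)])
```

Over a range as narrow as the ones used in this test, the two modes produce nearly identical levels, which may be one reason every model settled on the arithmetic mode; the difference grows as the range widens relative to price.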

From this live trading experience, the strategy capability of AI is crucial, but the role of the tool is equally important. Claude's ability to stabilize returns is not only due to good strategy design but also largely benefits from the OKX grid tool, which can automatically buy and sell within the range, while controlling risks, allowing AI not to worry about being caught off guard by a market pullback. Although Qwen3's strategy is more aggressive, the OKX tool helps it protect its capital during high volatility through batch building and automatic take-profit and stop-loss measures, avoiding catastrophic losses. In simple terms, AI is responsible for "how to operate," while the grid tool is responsible for "helping you stabilize and execute according to rules." The combination of the two is much safer than relying solely on AI and makes it easier to see returns.

How to Use AI + Grid Tools More Effectively?

Choose the right grid mode: In a fluctuating market, use "neutral grid" for stability; if the market has a clear direction, try "long or short grid" to follow the trend.

Set reasonable ranges and grid numbers: Too narrow can lead to frequent trading, eating into profits with fees; too wide may miss out on segment profits.

AI provides suggestions, but don’t rely entirely on it: AI can calculate parameters and point directions, but ultimately, you need to judge based on market and tool characteristics.

Backtest first, then go live: The OKX grid tool has a simulation feature, and AICoin has a historical backtesting feature. First, simulate to see the effects, making live operations more reassuring.
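The trade-off between grid spacing and fees in the tips above can be estimated with simple arithmetic. The sketch below is a rough approximation that assumes a hypothetical fee of 0.05% per side and a BTC price of 110,000 USDT, and ignores funding and slippage; actual fee rates depend on the platform and account tier, and all names are illustrative.

```python
ASSUMED_FEE_RATE = 0.0005  # hypothetical 0.05% per side, NOT an official rate
ASSUMED_PRICE = 110_000    # illustrative BTC price inside the tested ranges


def net_profit_per_grid_pct(spacing: float, price: float, fee_rate: float) -> float:
    """Approximate net profit of one buy-sell round trip, in percent of notional.

    Gross profit is the grid spacing relative to price; fees are paid on
    both the opening and the closing trade.
    """
    gross_pct = spacing / price * 100
    fee_pct = 2 * fee_rate * 100
    return gross_pct - fee_pct


def breakeven_spacing(price: float, fee_rate: float) -> float:
    """Smallest spacing at which a round trip covers its own fees."""
    return 2 * fee_rate * price


if __name__ == "__main__":
    # Spacings derived from the reported ranges: Qwen3 4,000/20, GPT-5 10,500/10.
    for name, spacing in {"Qwen3": 200, "GPT-5": 1050}.items():
        net = net_profit_per_grid_pct(spacing, ASSUMED_PRICE, ASSUMED_FEE_RATE)
        print(f"{name}: spacing {spacing} USDT -> about {net:.3f}% net per grid")
    print(f"break-even spacing: {breakeven_spacing(ASSUMED_PRICE, ASSUMED_FEE_RATE):.0f} USDT")
```

Under these assumptions, a dense grid gives up a much larger share of each round trip to fees than a sparse one, so it needs many more fills to earn the same amount.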

High-risk strategies are always the most unstable part of returns. Only by using the right strategy can the potential of AI truly translate into tangible profits. Without risk control, even the smartest AI could lose everything overnight. Therefore, do not blindly chase AI; the market is never lenient, and AI will also pay tuition. It can only be a tool; what truly supports you is risk management. In the next season, we hope to see more mature, stable, and truly risk-aware AI strategies.

Disclaimer

This article is for reference only. It represents the author's views and does not reflect the position of OKX. This article does not intend to provide (i) investment advice or recommendations; (ii) offers or solicitations to buy, sell, or hold digital assets; (iii) financial, accounting, legal, or tax advice. We do not guarantee the accuracy, completeness, or usefulness of such information. Holding digital assets (including stablecoins and NFTs) involves high risks and may fluctuate significantly. Past performance does not guarantee future results, and historical performance does not represent future outcomes. You should carefully consider whether trading or holding digital assets is suitable for you based on your financial situation. Please consult your legal/tax/investment professionals regarding your specific circumstances. You are responsible for understanding and complying with applicable local laws and regulations.

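Disclaimer: This article represents only the author's personal views and does not represent the position or views of this platform. It is provided for information sharing only and does not constitute investment advice to anyone. Any dispute between users and the author is unrelated to this platform. If any article or image published on this page involves infringement, please email the relevant proof of rights and proof of identity to support@aicoin.com, and the platform's staff will review it.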
