Predicting the World Cup knockout stage: is there really such a big gap between AI models?

Original | Odaily Planet Daily (@OdailyChina)

Author | Asher (@Asher_0210)

Before each World Cup match, I ask AI models to predict the outcome. Almost every model sounds convincing and is full of detail.

Some analyze the teams' market values, some break down the group stage data, some look at injuries and tactics, and others go straight to the score, including scenarios for extra time and penalty shootouts. At a glance, ChatGPT, Grok, Qianwen, DeepSeek, Gemini, and Claude all seem to understand the game well.

But as a user of prediction markets, what I truly care about is not which model gives the most complete analysis, but which one is most worth relying on.

Since the first match of the knockout stage, Odaily Planet Daily has been posing the same questions to different AI models before each match and then comparing the answers with the actual results. The aim is to see which models merely sound right and which really captured the course of the matches in advance.

In the knockout matches played so far, Canada beat South Africa 1:0, Brazil narrowly defeated Japan 2:1, Germany was eliminated by Paraguay in a penalty shootout, and the Netherlands also fell to Morocco on penalties. The Belgium vs. Senegal match ended 2:2 after regular time and was then turned around in extra time, pushing the uncertainty of the knockout stage to its limit.

DeepSeek and Gemini make their names by calling the Morocco match

The most memorable predictions so far are those from DeepSeek and Gemini for the Netherlands vs. Morocco match. It was easy to back the wrong team before this match: the Netherlands were stronger on paper and had a more complete lineup, and many models acknowledged that Morocco was tough to play against but ultimately believed that the Netherlands would prevail.

What is impressive about DeepSeek and Gemini is that they did not stop at “this match will be tense” but also wrote out what would happen next. Gemini directly predicted a 1:1 draw after regular time, with Morocco winning the penalty shootout. The result was indeed a 1:1 draw, and Morocco won the shootout 3:2. Gemini not only got the direction right but also predicted that the game would go to penalties and who would come out on top.

Gemini's prediction for the Netherlands vs. Morocco match

DeepSeek was also very close. It estimated that the regular time would likely be 1:1 or 0:0, anticipating that the match might drag on to extra time or even penalties, and leaned towards Morocco advancing by relying on defense and counterattacks.

DeepSeek's prediction for the Netherlands vs. Morocco match

After this match, DeepSeek and Gemini stood out far more than before. Gemini in particular seemed less like it was making a pre-match prediction and more like it had read the script of the match beforehand.

Grok and Qianwen consistently hit specific scores, showing more stability than expected

Besides DeepSeek and Gemini shining in the Morocco match, Grok and Qianwen also made their presence felt. Their most notable feature is that in some matches with clear outcomes, they not only accurately judged the advancing team but also predicted specific scores close to the final results.

The match between South Africa and Canada is one example. Before the match, most AI models favored Canada, but the disagreement was whether Canada would win comfortably. Grok predicted a 1:0 victory for Canada, and Qianwen also estimated a narrow win by one goal. In the end, Canada indeed advanced with just one goal.

Qianwen's prediction for the South Africa vs. Canada match

The match between Brazil and Japan was similar. Most AI models felt Brazil was stronger, but whether Japan could keep up was the key to this match. Grok and Qianwen both predicted a 2:1 scoreline, which turned out to be accurate as Brazil narrowly won 2:1. They correctly identified that it wasn't simply “Brazil will win,” but that Japan could indeed cause enough trouble for Brazil.

In the match between Ivory Coast and Norway, both models were fairly accurate as well. Norway has Haaland, which makes the case for their advancing easy to understand, but Ivory Coast's physical play and wing attacks would not allow the match to be one-sided. Grok and Qianwen both predicted Norway would win 2:1, and the final score indeed fell within this "script."

Grok's prediction for the Ivory Coast vs. Norway match

The advantage of Grok and Qianwen is their fine-grained read on matches with a clear favorite. They did not predict a major upset like Morocco eliminating the Netherlands, but for matches involving Canada, Brazil, Norway, and France, they provided relatively accurate directions and score estimates. In other words, they may not always catch underdog wins, but they excel at judging whether a favorite will dominate or win narrowly.

ChatGPT lacks standout score predictions but reads how matches unfold accurately

ChatGPT did not predict Morocco eliminating the Netherlands on penalties as Gemini did, nor did it consistently hit specific scores like Grok and Qianwen. Its advantage lies elsewhere: in many matches that appear to favor the stronger team, ChatGPT often points out that things might not be that straightforward.

The match between Brazil and Japan is one example. ChatGPT predicted Brazil would advance but did not paint a picture of Brazil crushing Japan. Instead, it noted that Japan's pressing, movement, and discipline could make Brazil uncomfortable, and that Japan might even score first or equalize. The match between Ivory Coast and Norway was similar: ChatGPT predicted Norway would advance but said it would not be an easy match, as Ivory Coast's physical play, wing attacks, and transition play could create difficulties.

Additionally, for the knockout match between England and the Democratic Republic of the Congo, ChatGPT did not simply state that England would win big. It suggested that the match might be quite dull and that the Democratic Republic of the Congo would use a low-block defense to disrupt the pace. In the end, England advanced but did not do so comfortably.

ChatGPT's prediction for the England vs. Democratic Republic of the Congo match

ChatGPT's strength lies not in predicting the score accurately every time but in being able to articulate where the difficulties in a match lie. It is well suited to understanding matches but may not be ideal if predictions are judged only on the final score. It can describe how a match will unfold accurately, but when it comes to calling major upsets, it still lacks a bit of decisiveness.

Germany's elimination became a collective blunder for AI models

If earlier matches showed the highlights of different models, then the match between Germany and Paraguay represents a collective failure.

Before the match, all AI models leaned towards Germany. ChatGPT, Grok, Qianwen, Gemini, and Claude all sided with Germany, with score predictions mostly hovering around 2:0, 3:0, or 3:1. The reasoning was consistent among them: Germany was stronger on paper, with better squad depth and greater attacking power.

The result exposed the flaw in that reasoning. The AI models underestimated Paraguay's ability to drag the game into a scrappy, attritional contest. Germany failed to settle the match in regular time or break the deadlock in extra time, and was ultimately dragged into a penalty shootout and eliminated.

Who is currently the most accurate?

From the knockout matches completed so far, the characteristics of the different models have begun to emerge.

DeepSeek and Gemini shine the brightest. They can not only predict teams like Brazil and France advancing, but also provide valuable answers in more difficult upset matches. In the match between the Netherlands and Morocco, their key advantage was their courage to predict Morocco's upset and the penalty shootout scenario beforehand. Notably, Gemini's direct prediction of Morocco advancing on penalties was indeed remarkable.

Grok and Qianwen are more like "score-oriented players." They hit quite a few specific scores, especially in matches involving Canada, Brazil, Norway, and France. The issue is that when facing traditional powerhouses like Germany and the Netherlands, they still leaned towards the favorites.

ChatGPT and Claude resemble "analysis-oriented players." They provide comprehensive reasoning, most of their directional calls are reasonable, and they can flag the risk of a match going to extra time. The problem is that they often recognize that a match will be hard-fought but hesitate to commit to a conclusion that leans towards an upset. The match between the Netherlands and Morocco exemplifies this: they clearly recognized the risk of extra time and penalties but still leaned towards the Netherlands winning.

Therefore, rather than rushing to ask which model understands the game best, it’s more beneficial to observe which scenarios they are best suited for.

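Disclaimer: This article represents only the personal views of the author and does not represent the position or views of this platform. This article is for information sharing only and does not constitute investment advice to anyone. Any dispute between users and the author is unrelated to this platform. If an article or image published on this page involves infringement, please send the relevant proof of rights and proof of identity by email to support@aicoin.com, and the platform's staff will verify it.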
