AI Prediction Log: Want to Make Money in Prediction Markets with AI? It May Not Even Understand the Question

Original | Odaily Planet Daily (@OdailyChina)

Author | Nan Zhi (@AssassinMalvo)

With most other narratives discredited, the prediction market has become one of the few sectors in Crypto that is still growing. On November 20, Nan Zhi began tracking smart money in the prediction market, reusing the approach applied last year to find Meme smart money, and achieved relatively good results in the initial phase.

At the beginning of December, with the launch of Gemini 3 Pro, the idea arose to use AI to analyze and predict prediction-market topics while testing the related models, and to see which side, humans or AI, could predict more accurately.

Introductions to the prediction market often claim that it pushes the market towards the "truth" by letting insightful individuals bet real money. Others argue that Crypto + prediction markets let "insiders" safely profit from information asymmetry, driving the market towards "insider results" instead. This is essentially a clash between "collective wisdom" and "truth held by a few." AI predictions lean towards "collective wisdom," which requires a large amount of accessible knowledge and insight.

For that reason, Gemini and Grok were the first AI models chosen: they are backed by Google and the X platform respectively, which gives them direct access to a wealth of knowledge and insight. Recently, Nan Zhi has also added a "Doubao + Douyin knowledge" combination, but since it has covered only a few prediction topics so far, it is not discussed in this article.

Basic Rules

AI Version: Gemini 2.5 Pro (with Google search), Grok 4 Fast (called via OpenRouter, with the native search function enabled)
Topic Selection: Humans choose which topics to bet on, and the AI predicts the same topics; the Crypto sector is excluded
Input Content: Official topic (Title), official description (Description), available answers (in practice only Yes and No)

Note: Polymarket's topics are organized into a top-level category, Event, and a sub-level category, Market. An Event is a broad topic such as "Who will be the next Federal Reserve Chair?" or "When will Strategy sell Bitcoin?" Each Event contains N Markets, such as "Will Hassett become the next Federal Reserve Chair?" or "Will Strategy sell Bitcoin before March 31, 2026?" To stay comparable with the human predictions, the Market was used as the unit for AI judgment, and the other options were not provided as input. For example, the AI was asked only to judge "Will Hassett become the next Federal Reserve Chair?" rather than to pick the most likely candidate from N options.
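The Event and Market relationship described in the note can be sketched as a small data model. This is only an illustration: the class names, field names, and the `questions_for_ai` helper are hypothetical and are not Polymarket's actual API schema.

```python
from dataclasses import dataclass, field


@dataclass
class Market:
    """One Yes/No question inside an Event (hypothetical structure)."""
    question: str
    description: str = ""
    outcomes: tuple = ("Yes", "No")


@dataclass
class Event:
    """A broad topic that groups N Markets (hypothetical structure)."""
    title: str
    markets: list = field(default_factory=list)


def questions_for_ai(event):
    """Return each Market as a standalone question, without its sibling options."""
    return [(m.question, m.outcomes) for m in event.markets]


event = Event(
    title="When will Strategy sell Bitcoin?",
    markets=[Market(question="Will Strategy sell Bitcoin before March 31, 2026?")],
)
print(questions_for_ai(event))
```

Each tuple carries a single question and its Yes/No outcomes, which mirrors the choice to judge one Market at a time instead of ranking all candidates within an Event.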

Prompt Design:
Require AI to search for the latest news, official announcements, and expert analysis reports
Prohibit the use of prediction market data
Make judgments based on "evidence" and use logical reasoning
Only allow output of Yes and No, with a paragraph explaining the reasoning logic
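A minimal sketch of how the inputs and the four prompt rules above might be assembled, assuming a plain-text prompt. The article does not publish the actual prompt, so the rule wording, the "answer on the first line" convention, and the `build_prompt` and `parse_answer` helpers are all assumptions made for illustration.

```python
import re

# Paraphrases of the four rules listed above; the exact wording is an assumption.
RULES = (
    "Search for the latest news, official announcements, and expert analysis reports.",
    "Do not use prediction market data.",
    "Base the judgment on evidence and logical reasoning.",
    "Answer Yes or No on the first line, then give one paragraph of reasoning.",
)


def build_prompt(title, description, options=("Yes", "No")):
    """Combine the Market title, description, and allowed answers with the rules."""
    rules = "\n".join(f"{i}. {rule}" for i, rule in enumerate(RULES, start=1))
    return (
        f"Question: {title}\n"
        f"Description: {description}\n"
        f"Allowed answers: {' / '.join(options)}\n\n"
        f"Rules:\n{rules}\n"
    )


def parse_answer(reply):
    """Split a reply into (answer, reasoning); answer is None if line 1 is not Yes/No."""
    first, _, rest = reply.strip().partition("\n")
    match = re.fullmatch(r"\s*(yes|no)\s*[.:]?\s*", first, flags=re.IGNORECASE)
    answer = match.group(1).capitalize() if match else None
    return answer, rest.strip()
```

Rejecting any reply whose first line is not a clean Yes or No makes it easy to spot cases where the model hedges instead of committing to an answer.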

Current Results

Among the prediction topics, 21 have been settled. Grok has the highest win rate at 75%, humans are at 66.7%, and Gemini has the lowest at 52.4%. Current results can be viewed on the related website.
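As a quick arithmetic check on these figures, the sketch below looks for whole-number win counts that reproduce each reported rate. The win counts are inferred here and are not stated in the article; the helper names are hypothetical.

```python
def win_rate(wins, total):
    """Win rate as a percentage, rounded to one decimal place."""
    return round(100 * wins / total, 1)


def matching_wins(rate, total, tol=0.05):
    """Whole-number win counts out of `total` that reproduce `rate` within rounding."""
    return [w for w in range(total + 1) if abs(100 * w / total - rate) <= tol]


print(matching_wins(66.7, 21))  # humans
print(matching_wins(52.4, 21))  # Gemini
print(matching_wins(75.0, 21))  # Grok
```

Under this check, 66.7% and 52.4% correspond to 14 and 11 wins out of 21, while no whole number of wins out of 21 gives exactly 75%. One possible reading, which the article does not confirm, is that Grok's rate was computed over fewer settled topics (a denominator such as 16 or 20).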

What Mistakes Did AI Make?

Gemini Occasionally Misjudges the Current Time

In the question "Will Trump's approval rating hit 35% in 2025?", Gemini stated that it was currently the first half of 2025, so anything was possible, and gave an essentially arbitrary answer.

However, when the author directly asked Gemini to output the current time using a program, it gave the correct answer. It is unclear why the mistake in time perception occurred.
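One possible mitigation, which the author does not report having tried, is to state the current date explicitly at the top of every prompt instead of relying on the model to infer it. A minimal sketch, with a hypothetical helper name and header wording:

```python
from datetime import datetime, timezone


def with_current_date(prompt, now=None):
    """Prepend today's UTC date so the model need not guess the current time."""
    now = now or datetime.now(timezone.utc)
    header = (
        f"Current date (UTC): {now:%Y-%m-%d}. "
        "Treat this as today when reasoning about deadlines."
    )
    return f"{header}\n\n{prompt}"


print(with_current_date("Will Trump's approval rating hit 35% in 2025?"))
```

Whether this actually removes the time misjudgment would need to be verified against the same questions.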

Insufficient Depth of AI Thinking

In the question "Will Gemini 3.0 Flash be released by December 16?", Grok stated, "Official sources have recently mentioned only Gemini 3 Pro and 2.5-related versions and have rarely mentioned 3 Flash, so there is insufficient evidence to make a judgment." It considered only currently available information.

In contrast, Gemini pointed out, "Gemini 1.0 was released in December 2023, and the experimental version of Gemini 2.0 Flash was launched in December 2024. Following this pattern, it is logical to release version 3.0 by the end of 2025." It also found "a leaked demonstration of 'Gemini 3.0 Flash' circulating in online communities recently (December 14, 2025), further increasing the likelihood of an upcoming public release."

Although Gemini's conclusion turned out to be incorrect, it is evident that there is a significant gap in the breadth of information relied upon by the two.

AI Infers Based on Common Sense Rather Than Evidence + Logic

In the question "Will Trump's approval go Up or Down this week?", Gemini stated, "Predicting a single week's approval rating over a year later has a high degree of uncertainty," once again showing a "time misjudgment." Then Gemini stated, "In any ordinary week, the probability of events leading to a slight decline in approval ratings may be slightly higher than the probability of positive events that could significantly boost approval ratings," thus concluding that the likelihood of a decline is greater, generating a conclusion based solely on subjective common sense assumptions.

In this question, Grok based its response on news reports and polling data regarding "government shutdowns, economic concerns, immigration policy controversies, and negative backlash from comments on the death of Rob Reiner," which aligns with the design expectations.

Incorrect Judgment of Settlement Conditions

In the question "Will Trump release the Epstein files by December 20?", both Gemini and Grok were aware that "the government will release 'hundreds of thousands of pages' of documents on Friday (December 19)." The settlement conditions clearly state, "If the government publicly releases, before the listed date, any previously undisclosed documents related to Epstein's illegal activities, it will be judged as Yes."

Despite this condition, Gemini stated, "It is impossible to complete the release of 'all' documents before December 20." It clearly misread what the settlement required and therefore gave an incorrect answer.

Summary

In summary, Grok's prediction win rate has already surpassed that of traders who have made hundreds of thousands or even millions of dollars in the prediction market, but a closer look at its prediction logic reveals many areas that can still be guided and corrected.
