AI Now Matches Prediction Markets in Forecasting Real Events, Study Finds

A new artificial intelligence benchmark launched in August shows that AI models can forecast real-world events as accurately as prediction markets—and sometimes better, according to researchers at the University of Chicago's SIGMA Lab.

Prophet Arena evaluates AI systems by having them predict the outcomes of live, unresolved events drawn from platforms like Kalshi and Polymarket, ranging from election results to sports matches and economic indicators. Unlike traditional benchmarks that test models on historical data with known answers, Prophet Arena scores models on events whose outcomes are not yet known.

“By anchoring evaluations in unresolved, real-world events, Prophet Arena ensures a level playing field. There is no pre-training advantage, no secret fine-tuning trick, no leakage of test samples,” the Prophet Arena team said in the benchmark’s official blog post.

The team frames the benchmark around a fundamental question about artificial intelligence: “Can AI systems reliably predict the future by connecting the dots across existing real-world information?”

Early results suggest they can. GPT-5 currently leads the leaderboard with a Brier score of 82.21%. Meanwhile, OpenAI's o3-mini model has emerged as the profit champion, generating the highest average returns when its predictions are translated into simulated bets. Correctly backing an underdog that the market has underpriced can return far more than backing a favorite.
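For readers unfamiliar with the metric, here is a minimal sketch of the standard Brier score: the mean squared difference between forecast probabilities and 0/1 outcomes, where lower is better. The article does not explain how the 82.21% figure is scaled. A percentage where higher is better would be consistent with reporting 1 minus the standard score, but that is an assumption here, not something the source states. The forecasts and outcomes below are invented for illustration.

```python
def brier_score(forecasts, outcomes):
    """Standard Brier score: mean squared error between forecast
    probabilities and binary outcomes (1 = event happened, 0 = it did not).
    Lower is better; 0.0 is a perfect forecaster."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical data, not taken from Prophet Arena.
forecasts = [0.9, 0.2, 0.7]   # predicted probability that each event occurs
outcomes = [1, 0, 1]          # what actually happened

score = brier_score(forecasts, outcomes)

# ASSUMPTION: a "higher is better" percentage could be 1 minus the Brier score.
# The article does not confirm this is how the leaderboard figure is computed.
inverted_pct = (1.0 - score) * 100

print(f"Brier score: {score:.4f}")
print(f"1 - Brier, as a percentage: {inverted_pct:.2f}%")
```

A forecaster who always answers 50% scores 0.25 on the standard scale, which is a common baseline for judging whether a model adds any information.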

DeepSeek R1 appears to be the contrarian of the group, frequently making predictions that diverge sharply from both other models and market consensus. That probably makes it a poor choice for anyone hoping to make a quick buck on Myriad Markets.

The platform reveals distinct "personalities" among AI models when facing identical information. In one example, when predicting whether AI regulation would become federal law before 2026, the market assigned just a 25% probability. But the models diverged wildly: Qwen 3 predicted 75%, GPT-4.1 estimated 60%, while Llama 4 Maverick stayed conservative at 35%.

In another case, o3-mini earned a simulated $9 return on a $1 bet by correctly predicting Toronto FC would beat San Diego FC in a Major League Soccer match. The model gave Toronto a 30% chance of winning, while the market priced it at just 11%. Toronto won.
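The arithmetic behind that figure can be sketched as follows, assuming a simple binary contract that pays $1 per share if the event occurs and ignoring fees and slippage (the article does not describe the exact simulation rules). The function names are illustrative, not part of any Prophet Arena API.

```python
def payout_per_dollar(price):
    """Dollars received per $1 staked if the event occurs, for a contract
    that costs `price` per share and pays $1 per share on a win."""
    if not 0 < price < 1:
        raise ValueError("price must be strictly between 0 and 1")
    return 1.0 / price


def expected_profit_per_dollar(model_prob, price):
    """Expected profit per $1 staked if the model's probability is right."""
    return model_prob * payout_per_dollar(price) - 1.0


market_price = 0.11   # market-implied chance of a Toronto FC win (from the article)
model_prob = 0.30     # o3-mini's estimate (from the article)

payout = payout_per_dollar(market_price)
edge = expected_profit_per_dollar(model_prob, market_price)

print(f"Payout on a winning $1 bet: ${payout:.2f}")   # about $9.09
print(f"Expected profit per $1 staked: ${edge:.2f}")  # about $1.73
```

At an 11% price, $1 buys roughly nine shares, which matches the roughly $9 payout reported. The bet looked attractive to the model only because its 30% estimate was well above the market price; had the two agreed, the expected profit would have been zero.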

"(Prophet Arena) tests models' forecasting capability, a high form of intelligence that demands a broad range of capabilities, including understanding existing information and news sources, reasoning under uncertainty, and making time-sensitive predictions about unfolding events," the researchers wrote.

Prophet Arena also enables human-AI collaboration. Users can supply additional news and context to see how predictions shift, while AI models provide detailed rationales for their forecasts.

As prediction markets themselves integrate AI—Kalshi recently partnered with Elon Musk's Grok, while Polymarket generates AI-powered market summaries—Prophet Arena offers the first systematic comparison of machine forecasting against collective human judgment.

If the models become good enough at it, they could offer forecasts free of the sentiment and emotion that sway human decisions. They could potentially match or exceed the wisdom of crowds, changing the way institutions approach risk assessment, investment decisions, and strategic planning.

The Prophet Arena platform continues updating daily as events resolve, providing an evolving picture of whether artificial intelligence can truly predict the future by connecting today's dots.

