AI Is Now Way Better at Predicting Startup Success Than VCs

Decrypt
4 hours ago

Could GPT-4 have spotted Airbnb in 2008—or Figma in 2012—before the pros did?


A new paper from researchers at the University of Oxford and Vela Research suggests that large language models are already better at picking winners than most early-stage investors. In a field notorious for pattern-matching and warm intros, the prospect of AI surfacing promising founders earlier—without knowing their names—could be a game-changer.


If models like GPT-4o can even modestly improve hit rates, then they could become must-have tools in every firm’s deal-sourcing stack, and might even make startup investing a little more meritocratic.


The research paper, “VCBench: Benchmarking LLMs in Venture Capital,” introduces VCBench, the first open benchmark designed to test whether AI can forecast startup success before it happens. The team built a dataset of 9,000 anonymized founder profiles, each paired with early-stage company data. About 810 profiles were labeled as “successful”—defined as achieving a major growth milestone like an exit or IPO—giving the models a sparse but meaningful signal to work from.
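For a concrete sense of what that setup implies, here is a minimal sketch of how predictions over such a dataset could be scored; the field names and the predict() hook are assumptions for illustration, not VCBench's actual schema or API.

```python
# Minimal sketch of scoring a "will this founder succeed?" classifier on a
# VCBench-style dataset (~9,000 profiles, ~810 positives). Field names
# ("profile_text", "label") and predict() are illustrative assumptions.

def score(profiles, predict):
    """profiles: list of dicts with anonymized text and a 0/1 success label.
    predict: any callable mapping profile text to a 0/1 call."""
    tp = fp = fn = 0
    for p in profiles:
        guess = predict(p["profile_text"])
        if guess and p["label"]:
            tp += 1
        elif guess and not p["label"]:
            fp += 1
        elif not guess and p["label"]:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Precision is computed only over the profiles a model flags, which is why a selective model can look very different from one that says yes to everything.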


Crucially, the researchers scrubbed the dataset of names and direct identifiers so the models couldn’t simply memorize Crunchbase trivia. They even ran adversarial tests to ensure that LLMs weren’t cheating by re-identifying founders from public data, reducing re-identification risk by 92 percent while preserving the predictive features.
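As a rough illustration of what scrubbing direct identifiers involves (and not the paper's actual anonymization pipeline, which also includes the adversarial re-identification testing described above), a first pass might look like this; the field names are assumptions:

```python
import re

# Illustrative-only scrubber: drop obvious direct identifiers from a profile
# before it reaches a model. Field names here are assumed, and the real
# VCBench anonymization is far more thorough than this sketch.
DIRECT_ID_FIELDS = {"name", "email", "linkedin_url", "company_name"}

def scrub(profile: dict) -> dict:
    cleaned = {k: v for k, v in profile.items() if k not in DIRECT_ID_FIELDS}
    # Mask anything that looks like an email address inside free text.
    if "bio" in cleaned:
        cleaned["bio"] = re.sub(r"\S+@\S+", "[EMAIL]", cleaned["bio"])
    return cleaned
```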


When put to the test, the models did better than most human benchmarks. The paper notes that the “market index”—essentially the baseline performance of all early-stage VC bets—achieves just 1.9% precision, or roughly one winner in 50 tries. Y Combinator does better at 3.2%, roughly 1.7 times the market, and tier-1 VC firms hit about 5.6%, nearly double that again.
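Those figures are precision, i.e. successful picks divided by total picks; the back-of-the-envelope arithmetic behind the multiples quoted above is simply:

```python
# Precision = successful picks / total picks, using the figures quoted above.
market_index = 0.019   # ~1.9%, about one winner per 50 bets
y_combinator = 0.032   # ~3.2%
tier_one_vc  = 0.056   # ~5.6%

print(round(y_combinator / market_index, 2))  # ~1.68x the market
print(round(tier_one_vc / y_combinator, 2))   # ~1.75x Y Combinator
```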


Large language models, however, blew past this baseline.


For instance, DeepSeek-V3 delivered more than six times the precision of the market index, while GPT-4o topped the leaderboard with the highest F0.5 score, a metric that combines precision and recall but weights precision more heavily. Claude 3.5 Sonnet and Gemini 1.5 Pro also beat the market handily, landing in the same performance tier as elite venture firms.
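For readers unfamiliar with the metric, F0.5 is the F-beta score with beta set to 0.5, which penalizes false positives (bad bets) more than missed winners; a minimal sketch of the calculation, with made-up numbers rather than results from the paper:

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Example with illustrative numbers only (not from the paper):
print(round(f_beta(precision=0.12, recall=0.40), 3))  # 0.14, pulled toward precision
```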


In other words, nearly every frontier LLM tested did a better job of identifying likely winners than the average VC—and several models matched or exceeded the predictive power of Y Combinator and top-tier funds.


The researchers have released VCBench as a public resource at vcbench.com, inviting the community to run their own models and publish results. If the leaderboard fills with LLMs outperforming the market, then it could reshape early-stage investing. A world where founders are discovered by AI agents trawling LinkedIn rather than cold-emailing partners might not be far off.

