Inception Labs' Mercury 2 AI Beats Google's DiffusionGemma at Its Own Game

Decrypt
3 hours ago

Inception Labs introduced Mercury 2 on Thursday, calling it the world's fastest reasoning language model. Per the company's announcement, it generates about 1,000 tokens per second (tokens are the chunks of text an AI model reads and writes), against roughly 89 tokens per second for Anthropic’s Claude Haiku 4.5 Reasoning and 71 for OpenAI’s GPT-5 Mini.
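To put those rates in perspective, here is a rough back-of-the-envelope sketch in Python. The throughput figures are the ones quoted above; the 500-token response length is an assumption picked purely for illustration, and the calculation ignores network and queueing delay.

```python
# Throughput figures quoted in the article (tokens per second).
THROUGHPUT_TPS = {
    "Mercury 2": 1000,
    "Claude Haiku 4.5 Reasoning": 89,
    "GPT-5 Mini": 71,
}

# Assumption for illustration only: a 500-token response.
RESPONSE_TOKENS = 500


def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to emit `tokens` at a steady rate, ignoring network and queueing delay."""
    return tokens / tokens_per_second


def speedup(fast_tps: float, slow_tps: float) -> float:
    """How many times faster the first rate is than the second."""
    return fast_tps / slow_tps


for name, tps in THROUGHPUT_TPS.items():
    seconds = generation_seconds(RESPONSE_TOKENS, tps)
    ratio = speedup(THROUGHPUT_TPS["Mercury 2"], tps)
    print(f"{name}: {seconds:.1f} s for {RESPONSE_TOKENS} tokens "
          f"(Mercury 2 is {ratio:.1f}x as fast)")
```

On the quoted numbers, that works out to roughly an 11x gap over Claude Haiku 4.5 Reasoning and a 14x gap over GPT-5 Mini.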


That puts it in the same speed bracket Google would later claim for DiffusionGemma.



Both models get there by dropping the typewriter approach to writing. A standard chatbot writes one word, checks what it just wrote, then writes the next, looping until the answer is finished. Diffusion models instead fill a block of text with random placeholder tokens and erase the noise across a handful of parallel passes—the same trick that turns static into a photo in image generators like Stable Diffusion—until the whole block locks into a finished response at once.
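The difference between the two approaches can be shown with a toy sketch. This is not Mercury 2's or DiffusionGemma's actual algorithm: the "model" below is a stub that already knows the answer, the block starts as mask placeholders rather than random tokens, and the positions filled in on each pass are chosen at random instead of by model confidence. All names are hypothetical. The point is only the number of model passes each style needs.

```python
import random

TARGET = "diffusion models write the whole block at once".split()
MASK = "<mask>"


def toy_predict(position: int) -> str:
    """Stand-in for a trained model: it simply knows the right token."""
    return TARGET[position]


def autoregressive_decode() -> tuple[list[str], int]:
    """Emit one token per model call, left to right."""
    output, calls = [], 0
    for position in range(len(TARGET)):
        output.append(toy_predict(position))
        calls += 1
    return output, calls


def block_denoise_decode(passes: int = 3, seed: int = 0) -> tuple[list[str], int]:
    """Start fully masked; each pass fills in a share of the remaining positions at once."""
    rng = random.Random(seed)
    block = [MASK] * len(TARGET)
    calls = 0
    for step in range(passes):
        masked = [i for i, tok in enumerate(block) if tok == MASK]
        if not masked:
            break
        remaining_passes = passes - step
        # Ceiling division, so the final pass clears every remaining mask.
        count = -(-len(masked) // remaining_passes)
        for i in rng.sample(masked, count):
            block[i] = toy_predict(i)
        calls += 1
    return block, calls


ar_text, ar_calls = autoregressive_decode()
diff_text, diff_calls = block_denoise_decode()
print(f"autoregressive: {ar_calls} passes, block denoising: {diff_calls} passes")
```

Both decoders end with the same eight-word sentence, but the sequential one needs eight passes and the block version needs three. In a real system each denoising pass costs more than a single token step, so the actual speedup depends on the hardware and the model.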


Where the two diverge is in how much accuracy survives that process. On AIME 2026, a benchmark built from real American Invitational Mathematics Examination problems and scored as the percentage solved correctly, Mercury 2 hit 90%. Google's own testing put DiffusionGemma at 69.1% on the same set, against 88.3% for standard, non-diffusion Gemma 4.


On GPQA, a PhD-level science benchmark scored the same way, the gap narrows: Mercury 2 scored 77% against DiffusionGemma's 73.2%. But Google's own developer guide recommends standard Gemma 4 for applications that demand maximum quality, conceding that DiffusionGemma trails it across the board.





The speed claim holds up outside the lab, too. Augment Code, an AI coding-agent company, swapped Mercury 2 in for Anthropic's Claude Opus 4.7 on its context-compaction subagent and saw an 82% drop in latency and a 90% cut in cost, while reporting the same output quality, according to a joint case study.


Inception was built on research from its founder Stefano Ermon, a Stanford professor who co-authored some of the score-based diffusion techniques that power today's image generators. The startup's $50 million funding round drew backing from Nvidia's venture arm and individual investors Andrew Ng and Andrej Karpathy.


For non-technical users, the biggest change is one most people don't notice until they feel it: the "flow." Traditional models make you wait between thoughts in a long session. Diffusion models like this one make the AI feel like it's keeping pace with you, with instant autocomplete, rapid iterations on code or plans, and subagents that can handle the boring, high-volume work without dragging the whole system down.



That subagent layer is the interesting architectural shift. Complex AI systems aren't one giant smart model anymore. They're orchestras of specialized helpers: one for deep reasoning, and several for quick summarization, routing, tool lookup and output checking. Sequential models make those utility calls expensive and slow. Parallel diffusion models make them cheap and fast enough to use liberally.
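A minimal sketch of that division of labor, in Python, might look like the following. Everything here is hypothetical: the task labels, the two model stubs and the routing rule are assumptions for illustration, not any vendor's API or Augment Code's actual setup.

```python
from dataclasses import dataclass

# Hypothetical task labels for high-volume utility work.
UTILITY_TASKS = {"summarize", "route", "tool_lookup", "check_output"}


@dataclass
class Task:
    kind: str
    payload: str


def fast_model(payload: str) -> str:
    """Stub for a low-latency model handling high-volume utility calls."""
    return f"[fast] {payload}"


def deep_model(payload: str) -> str:
    """Stub for a slower model reserved for hard reasoning."""
    return f"[deep] {payload}"


def dispatch(task: Task) -> str:
    """Send utility work to the fast tier and everything else to the deep tier."""
    handler = fast_model if task.kind in UTILITY_TASKS else deep_model
    return handler(task.payload)


tasks = [
    Task("summarize", "compact the chat history"),
    Task("deep_reasoning", "plan the refactor"),
    Task("check_output", "validate the JSON reply"),
]
for task in tasks:
    print(dispatch(task))
```

The cheaper and faster the utility tier gets, the more of these small calls a system can afford to make per user request.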


Realistic caveats for regular users: These models are still best suited to the speed-sensitive, high-volume parts of a workflow rather than the absolute hardest frontier reasoning, where the biggest autoregressive (AR) models may still have an edge for now. Mercury 2 isn't open weights, so it's available only through an API or the cloud for now. And as with Google's version, the surrounding ecosystem (local runtimes, agent frameworks) is still catching up to make it seamless everywhere.


Use cases that stand out immediately: real-time programming and "vibe coding," where the model keeps up with your edits; multi-agent coding or support systems where lots of fast sub-calls happen; voice interfaces that don't feel laggy; and any latency-sensitive autocomplete or next-action prediction. At scale, the cost and energy savings from higher throughput on standard hardware add up fast.


The numbers Inception shares, along with the independent evals, make the case: Mercury 2 sits in the "fast and good" quadrant for diffusion models, pushing what used to require exotic hardware down to commodity GPUs.


