Gemini 3 strikes late at night: it surpasses GPT-5.1, and the era of Google's large models has arrived.

Google defines it as "an important step towards AGI" and emphasizes that it is currently the most capable agent in the world in terms of multimodal understanding and interaction depth.

Before Gemini 3 had even appeared, Twitter had already crashed in anticipation.

No model release has garnered as much attention as Gemini 3. Given Gemini's previous cadence of roughly one update every three months, the AI community has been eagerly awaiting Gemini 3 since September.

Today, a tweet from Google's head of developer relations and Google AI Studio, containing only the word "Gemini," finally brought months of anticipation to a breaking point, and related topics exploded on Twitter.

Interestingly, just before the release, Twitter suffered several "coincidental" outages. Although the actual culprit was Cloudflare, the timing was so precise that it raised suspicions of foul play (after all, Twitter is the main battlefield for promoting new models).

One wonders what Musk, who released Grok 4.1 only this morning, is thinking right now; in any case, memes from netizens are flooding in.

Just now, Gemini 3 has finally made its official debut. Let's see how powerful it is under the spotlight.

The Most Intelligent Model

It turns out that Google did not disappoint those who waited. Gemini 3 has officially launched, once again redefining the state of the art (SOTA) and drawing congratulations from Sam Altman and Musk.

Gemini 3 not only sets a new SOTA in basic reasoning capabilities but also attempts to reshape the developer ecosystem and the AI-assisted experience by launching the new Google Antigravity platform and Deep Think mode.

A Reasoning Monster Dominating the Charts

Gemini 3 Pro is officially referred to as "the most advanced reasoning model," significantly surpassing the previous generation Gemini 2.5 Pro in almost all mainstream AI benchmark tests, and completely outclassing major competitors like Claude Sonnet 4.5 and GPT-5.1.

Gemini 3 Pro topped the LMArena Leaderboard with a groundbreaking score of 1501 Elo, achieving the highest scores in Humanity’s Last Exam (37.5% without using any tools) and GPQA Diamond (91.9%), showcasing doctoral-level reasoning abilities. It also set a new standard for cutting-edge models in mathematics, reaching a new SOTA level of 23.4% on MathArena Apex.
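To give a rough sense of what an Elo-style rating gap means, the sketch below applies the standard Elo expected-score formula. It assumes the leaderboard score can be read on the usual 400-point logistic scale; the 1451 comparison rating is a made-up value for illustration, not a figure from the leaderboard.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of model A against model B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# 1501 is the score quoted above; 1451 is a hypothetical competitor rating.
print(round(expected_win_rate(1501, 1451), 3))  # 0.571
```

Under these assumptions, a 50-point lead corresponds to winning roughly 57% of head-to-head comparisons, so leaderboard gaps of a few dozen points are meaningful but far from decisive.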

In addition to text and logic, Gemini 3 Pro has redefined the limits of multimodal reasoning. It scored 81% and 87.6% on MMMU-Pro and Video-MMMU, respectively, meaning it can easily analyze complex scientific charts and understand dynamic video streams.

Notably, it achieved a score of 72.1% on SimpleQA Verified, demonstrating significant progress in factual accuracy—it's not only powerful but also reliable.

A Thinking Partner That Rejects Flattery

The evolution of Gemini 3 Pro is not just about scores; it is about the quality of interaction. It discards the clichés and excessive flattery common in earlier AI models, becoming smart, concise, and direct: telling you what you need to hear, not just what you want to hear.

It acts as a true thinking partner, giving you new ways to understand information and express yourself, whether that means explaining obscure scientific concepts, generating code for high-fidelity visualizations, or brainstorming creatively.

Gemini 3 Deep Think

The Gemini 3 Deep Think mode further expands the boundaries of intelligence, bringing significant advancements in reasoning and multimodal understanding capabilities, helping you tackle more complex problems.

In tests, Gemini 3 Deep Think outperformed the already impressive results of Gemini 3 Pro in Humanity's Last Exam (41.0% without using tools) and GPQA Diamond (93.8%). Additionally, it achieved an unprecedented score of 45.1% on ARC-AGI-2 (code execution, validated by the ARC Prize), showcasing its ability to tackle new challenges.

Gemini 3 Deep Think mode excels in some of the most challenging AI benchmark tests.

Learning, Building, and Planning

Learning Anything

From the beginning, Gemini was designed to seamlessly integrate multimodal information on any topic, including text, images, videos, audio, and code. Gemini 3 combines its advanced reasoning, visual and spatial understanding capabilities, leading multilingual performance, and a million-token context window to further expand the boundaries of multimodal reasoning, helping you learn in the way that suits you best.

For example, if you want to learn how to cook a family recipe, Gemini 3 can interpret and translate handwritten recipes in different languages, generating a recipe that can be shared with family.

Or, if you want to learn a new topic, you can provide academic papers, long video lectures, or tutorials, and it can generate code for interactive flashcards, visualizations, or other formats to help you master the material.

It can even analyze your pickleball match videos, identify areas for improvement, and create a training plan to help you enhance your skills comprehensively.

To help you better understand information online, AI Mode in Search now uses Gemini 3 to deliver a new generative UI experience, such as immersive visual layouts, interactive tools, and simulations, all generated in real time based on your queries.

Developing Anything

Building on the success of 2.5 Pro, Gemini 3 delivers on the promise of turning any developer's idea into reality. It excels in zero-shot generation, capable of handling complex prompts and instructions to render richer, more interactive web user interfaces.

Gemini 3 is Google's best vibe coding and agentic coding model to date, making Google's products more autonomous and significantly enhancing developer efficiency. It ranks first on the WebDev Arena leaderboard with a score of 1487 Elo. It also scored 54.2% on Terminal-Bench 2.0, which assesses a model's ability to use tools to operate a computer via the terminal. On SWE-bench Verified, which measures the performance of coding agents, it scored 76.2%, significantly outperforming 2.5 Pro.

Now, users can build with Gemini 3 using Google AI Studio, Vertex AI, Gemini CLI, and Google's new agent development platform, Google Antigravity. It is also compatible with third-party platforms like Cursor, GitHub, JetBrains, Manus, and Replit.

For example, it can create a retro 3D spaceship game with richer visuals and stronger interactivity.

Or it can develop richer, more interactive web UIs and applications.

Planning Anything

Since the agent-focused Gemini 2, Gemini has significantly improved its planning capabilities for long-horizon tasks.

Gemini 3's planning abilities were further validated on Vending-Bench 2, a test that simulates running a vending machine business through long-term planning, where Gemini 3 topped the leaderboard.

Over a full simulated year of operation, Gemini 3 Pro maintained stable tool calls and coherent decisions, achieving a higher return on investment while staying focused on the task objectives.

Gemini 3 Pro demonstrates superior long-term planning capabilities, creating higher returns compared to other cutting-edge models.

Gemini Agent can also help organize your Gmail inbox.

Gemini 3 is now broadly available. Starting today, both regular and subscription users can access the new model through the Gemini App and AI Mode in Search; developers and enterprise customers can also connect through channels like AI Studio and Vertex AI. As for the highly anticipated Deep Think mode, it is expected to launch exclusively for Google AI Ultra subscribers in the coming weeks.

Additionally, the previously leaked model card contains several details worth noting: Google trained this model from scratch on TPUs, it uses a MoE (Mixture of Experts) architecture, and it supports up to 1M input tokens and 64k output tokens. The MoE design suggests Google can afford to price it cost-effectively.

In terms of pricing, Gemini 3 Pro introduces a tiered pricing mechanism based on context length: for requests under 200k tokens, the input/output price is $2.00/$12.00 per million tokens; for requests over 200k tokens, the prices are $4.00 and $18.00, respectively.
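The tiered pricing above can be turned into a quick cost estimate. The sketch below is illustrative only: it assumes the tier is selected by the number of input tokens and that a request of exactly 200k tokens falls in the lower tier, neither of which is stated in the article. The function and constant names are hypothetical.

```python
# Prices in USD per million tokens, as quoted in the article: (input, output).
PRICES_USD_PER_MILLION = {
    "short": (2.00, 12.00),   # requests up to 200k tokens
    "long": (4.00, 18.00),    # requests over 200k tokens
}
TIER_THRESHOLD_TOKENS = 200_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request under the quoted tiered pricing."""
    # Assumption: the tier depends on the input size only, and exactly
    # 200k input tokens is billed at the lower tier.
    tier = "short" if input_tokens <= TIER_THRESHOLD_TOKENS else "long"
    input_price, output_price = PRICES_USD_PER_MILLION[tier]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


print(f"{estimate_cost_usd(100_000, 10_000):.2f}")  # 0.32
print(f"{estimate_cost_usd(300_000, 10_000):.2f}")  # 1.38
```

Under these assumptions, crossing the 200k threshold doubles the input rate and raises the output rate by half, so splitting very long contexts into smaller requests can noticeably change the bill.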

A Brand New "Agent-First" Development Experience

Google Antigravity is Google's new agent development platform, enabling developers to operate at a higher, task-oriented level. Leveraging Gemini 3's advanced reasoning, tool usage, and agent programming capabilities, Google Antigravity transforms AI assistance from a tool in the developer's toolbox into an active partner.

While the core of Google Antigravity is a familiar AI IDE (Integrated Development Environment) experience, its agents have been elevated to a dedicated interface with direct access to the editor, terminal, and browser. Now, agents can autonomously plan and execute complex end-to-end software tasks on your behalf while validating their own code.

In addition to Gemini 3 Pro, Google Antigravity closely integrates Google's latest Gemini 2.5 Computer Use model for browser control, as well as its top-tier image editing model, Nano Banana (Gemini 2.5 Image).

Hands-On Experience

Since the Gemini 3 Pro preview has launched on the AI Studio platform, we also had a chance to try it out.

Prompt: SVG of NEW YORK SKYLINE Use whatever libraries to get this done but make sure I can paste it all into a single HTML file and open it in Chrome. Make it interesting and highly detailed, showing details that no one expected. Go full creative and full beauty in one code block.

Prompt: Create a visually stunning Space Invaders game.

The pelican riding a bicycle had previously stumped many large models, so we decided to test Gemini 3 as well. Prompt: An animated SVG of a pelican riding a bicycle.

Compared to previous versions, Gemini 3 has made significant progress, but there are still bugs, such as the bicycle pedals spinning in the air.

We then switched to a clearer prompt: Create a single, complete, self-contained animated SVG code (no external files or images) of a cute pelican riding a bicycle from a side view. This time, the bicycle generated by Gemini 3 seems to be missing pedals.

In Conclusion

In a poll initiated by X blogger Chubby asking "Which company will have the best LLM by the end of 2026?", Google Gemini is far ahead.

This resurgence of market confidence is also reflected in the data. Alphabet CEO Sundar Pichai reviewed Gemini's progress over the past two years in an official blog post: AI Overviews has reached 2 billion monthly active users, the Gemini app has surpassed 650 million monthly users, and over 70% of cloud customers and 13 million developers are using its generative models.

Looking back over the past two years, from the rushed response and stock price plunge at the launch of Bard (the predecessor of Gemini), to the painful lessons that led to the merger forming Google DeepMind, the recall of its founders, and a Nobel Prize win, Google has pulled off a textbook case of "an elephant turning around."

The giant that once invented the Transformer and has now gone "all in" on Gemini is ready for a full counterattack.

Can it end the "best LLM" debate? Don't rush; let the bullets (and servers) fly for a while longer.
