Altman Hails GPT-5 as a Major Step Toward AGI; Microsoft Integrates It on Day One
Written by: Li Dan, Wall Street Insights
OpenAI's most anticipated product of the year has arrived.
On Thursday, August 7 (U.S. Eastern Time), OpenAI launched its next-generation flagship artificial intelligence (AI) model, GPT-5. It is OpenAI's first "unified" AI system, combining the reasoning capabilities of the o-series models with the fast responses of the GPT-series models.
At the launch event, OpenAI CEO Sam Altman praised GPT-5, calling it "the best model in the world" and a "significant upgrade" over previous models, and said its release marks an "important step" for OpenAI on the path to artificial general intelligence (AGI).
According to OpenAI, GPT-5 performed strongly across multiple benchmarks, reaching state-of-the-art levels in coding, mathematics, health, and other areas. It scored 74.9% on the SWE-bench Verified coding benchmark, slightly ahead of Claude Opus 4.1, which Anthropic released on Tuesday. Hallucinations have also dropped significantly: GPT-5's error rate is only 4.8%, far below the 20.6% of the previous model, GPT-4o.
Starting Thursday, GPT-5 is the default model for all free ChatGPT users and for Plus, Pro, and Team subscribers; it will roll out to the Enterprise and Edu plans within a week.
As with GPT-4o, the free and paid versions of GPT-5 differ mainly in usage limits. Plus users get higher limits, while Pro users get unlimited access plus GPT-5 Pro, an enhanced version of the model. Free users may have to wait a few days for the full reasoning capabilities to become available. Once free users hit their GPT-5 usage limit, OpenAI switches them to the smaller GPT-5 mini model.
OpenAI also announced on Wednesday that it will provide ChatGPT to U.S. federal government agencies for a nominal fee of $1 per year. The offering is the enterprise version of ChatGPT, which includes enhanced security and privacy features.
As OpenAI officially unveiled GPT-5, Microsoft said it would integrate the model across its extensive product portfolio starting Thursday, including Microsoft 365 Copilot, Copilot, GitHub Copilot, and Azure AI Foundry, giving its enterprise and consumer users immediate access to GPT-5's advanced reasoning and coding capabilities.
GPT-5's Three Major Strengths: Coding, Creative Writing, and Health
OpenAI's GPT-5 announcement opens by describing it as the company's "smartest, fastest, and most useful model yet, with built-in thinking that puts expert-level intelligence in everyone's hands."
According to OpenAI, GPT-5, its "most powerful model," delivers significant improvements in three key areas.
The first is coding. GPT-5 is OpenAI's strongest coding model to date: it excels at complex front-end generation and at debugging large codebases, and it can build aesthetically pleasing, responsive websites, apps, and games from a single prompt. Early testers noted improvements in design choices such as spacing, typography, and white space.
On SWE-bench Verified, a benchmark built from real-world coding tasks drawn from GitHub, GPT-5 with reasoning scored 74.9% on its first attempt, ahead of OpenAI's reasoning model o3 (69.1%) and GPT-4o (30.8%).
Commentators noted that this puts GPT-5 slightly ahead of Anthropic's Claude Opus 4.1, which scored 74.5% on SWE-bench Verified, and well ahead of Google DeepMind's Gemini 2.5 Pro, which scored 59.6%.
However, on Humanity's Last Exam, which measures expert-level ability across mathematics, the humanities, and the natural sciences, GPT-5 Pro, the enhanced version of GPT-5 with extended reasoning, scored 42% with tools, slightly below the 44.4% scored by xAI's Grok 4 Heavy.
Altman said GPT-5 is particularly good at generating entire software applications on demand, a practice known as "vibe coding," in which AI produces working code from natural-language prompts and speeds up development.
In one demo, OpenAI researchers asked GPT-5 to build a web app to help English speakers learn French. The app had to have an engaging theme and include flashcards, quizzes, a version of the classic Snake game, and a way to track daily learning progress.
The researchers submitted the same prompt in two separate GPT-5 windows, and within minutes two different apps were generated. An OpenAI lead acknowledged that the apps "have some flaws," but said users can tailor the AI-generated software to their preferences, for example by changing the background or adding more tabs.
In creative writing, GPT-5 can handle structurally demanding forms, such as unrhymed iambic pentameter or free verse that flows naturally. Nick Turley, OpenAI's vice president and head of ChatGPT, said GPT-5 shows "better taste" in creative tasks and that its responses feel more natural.
Health is the third major area of improvement.
GPT-5 can more proactively flag potential health issues and help users interpret medical results, although OpenAI emphasizes that ChatGPT cannot replace medical professionals.
On the HealthBench Hard Hallucinations test, GPT-5 with reasoning had a hallucination rate of only 1.6%, far below the rates of GPT-4o (15.8%) and o3 (12.9%).
Far Fewer Hallucinations and a New Approach to Safety Training
OpenAI said GPT-5 is more reliable and useful than its predecessors, answers real-world questions more accurately, and is significantly less likely to hallucinate.
On anonymized prompts representative of ChatGPT production traffic, with web search enabled, GPT-5's responses are about 45% less likely than GPT-4o's to contain a factual error; with reasoning enabled, they are about 80% less likely than o3's. In the chart OpenAI published, GPT-5 (with reasoning) had an error rate of only 4.8%, versus 20.6% for GPT-4o and 22% for o3.
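As a rough sanity check, the Python snippet below computes the relative reductions implied by the chart figures quoted above. It assumes the 4.8%, 20.6%, and 22% rates were measured on the same prompt set and that the 4.8% figure is GPT-5's rate with reasoning, so the o3 comparison should land near the "about 80%" claim. The roughly 45% figure against GPT-4o comes from a separate web-search setting and cannot be derived from these numbers. The helper name `relative_reduction` is our own.

```python
def relative_reduction(new_rate: float, old_rate: float) -> float:
    """Return the fractional drop from old_rate to new_rate (0.25 means 25% lower)."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return (old_rate - new_rate) / old_rate


# Factual-error rates (%) as quoted in the article's chart description.
GPT5_RATE = 4.8
O3_RATE = 22.0
GPT4O_RATE = 20.6

vs_o3 = relative_reduction(GPT5_RATE, O3_RATE)
vs_gpt4o = relative_reduction(GPT5_RATE, GPT4O_RATE)

print(f"GPT-5 vs o3:     {vs_o3:.1%} lower")     # GPT-5 vs o3:     78.2% lower
print(f"GPT-5 vs GPT-4o: {vs_gpt4o:.1%} lower")  # GPT-5 vs GPT-4o: 76.7% lower
```

The roughly 78% drop relative to o3 is consistent with OpenAI's "about 80%" wording.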
OpenAI also said it has introduced a new form of safety training for GPT-5 called "safe completions," which teaches the model to give the most helpful answer possible while staying within safety boundaries. Sometimes this means answering a question only partially or only at a high level.
When it does need to refuse, GPT-5 is trained to tell users transparently why and to offer safe alternatives.
In controlled experiments and in its production models, OpenAI found that safe completions are more nuanced: they handle dual-use questions better, are more robust to ambiguous intent, and reduce unnecessary over-refusals.
Michelle Pokrass, OpenAI's post-training lead, said: "GPT-5 has been trained to recognize when a task can't be completed, to avoid guessing, and to explain its limitations more clearly, so it makes fewer unsupported claims than previous models."
Four Optional Personality Presets for ChatGPT
OpenAI said GPT-5 is better at following instructions, including custom instructions. The company is also launching a research preview of four preset personalities for all ChatGPT users.
The first four personalities, Cynic, Robot, Listener, and Nerd, are opt-in, and users can change them at any time in settings to match how they want ChatGPT to communicate with them.
The personalities apply to text chat first and will later come to voice chat. They let users set ChatGPT's interaction style, whether concise and professional, thoughtful and supportive, or slightly sarcastic, without writing custom prompts.
OpenAI said all of the new personalities meet or exceed its internal evaluation standards for reducing sycophancy.
Altman Hails a Historic Breakthrough, Says Going Back to GPT-4 Was "Quite Bad"
At Thursday's briefing, Altman lavished praise on GPT-5, presenting it as a significant milestone on the path to AGI. He said:
"Having something like GPT-5 would have been unimaginable at any earlier point in history." "This is the first time it really feels like talking to an expert in any field."
Altman even disparaged GPT-4 to play up GPT-5. He said:
"I tried going back to GPT-4, and it was quite bad."
GPT-5 uses a unified system architecture with a real-time router that automatically decides whether to respond quickly or engage in deeper "thinking," based on the type of conversation, its complexity, and whether tools are needed. Users no longer have to choose the right setting themselves, which makes ChatGPT easier to use.
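OpenAI has not published how its router works. The Python sketch below only illustrates the general idea described above: dispatching each request to a fast path or a deeper reasoning path based on conversation type, complexity, and tool needs. Every name and threshold in it (`Request`, `route`, `REASONING_TYPES`, the 0.4 and 0.7 cutoffs, the path labels) is a hypothetical placeholder, not OpenAI's API or routing logic.

```python
from dataclasses import dataclass
from enum import Enum


class Path(Enum):
    FAST = "fast-response model"        # hypothetical label
    THINKING = "deep-reasoning model"   # hypothetical label


@dataclass
class Request:
    conversation_type: str  # hypothetical categories, e.g. "chitchat", "coding", "math"
    complexity: float       # assumed score in [0, 1] from some upstream estimator
    needs_tools: bool       # whether the request appears to require tool calls


# Hypothetical tuning knobs; a production router would learn these from usage signals.
REASONING_TYPES = {"coding", "math", "analysis"}
REASONING_CUTOFF = 0.4  # lower bar for conversation types that usually benefit from thinking
DEFAULT_CUTOFF = 0.7    # higher bar for everything else


def route(req: Request) -> Path:
    """Choose a path with simple rules standing in for a learned router."""
    if req.needs_tools:
        return Path.THINKING
    cutoff = REASONING_CUTOFF if req.conversation_type in REASONING_TYPES else DEFAULT_CUTOFF
    return Path.THINKING if req.complexity >= cutoff else Path.FAST


if __name__ == "__main__":
    print(route(Request("chitchat", 0.1, False)))  # Path.FAST
    print(route(Request("math", 0.9, False)))      # Path.THINKING
```

Hard-coded rules keep the example readable; the real system presumably relies on learned signals rather than fixed thresholds.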
In internal benchmarks of economically valuable work spanning more than 40 occupations, including law, logistics, sales, and engineering, GPT-5 with reasoning performed on par with or better than experts in roughly half of cases. OpenAI vice president Nick Turley said, "This model feels really good."
Altman likened using GPT-5 to having a team of PhD-level experts on call at any time. He also said, "In many new fields, people seem to be limited by ideas, but what they actually lack is the ability to execute."
Microsoft Goes All In to Seize the Opportunity
On the day GPT-5 launched, Microsoft announced it would integrate the model across a wide range of product lines. On the enterprise side, Microsoft 365 Copilot will use GPT-5 to handle complex problems better, stay focused in long conversations, and understand user context, letting business users apply its reasoning to their emails, documents, and files.
For consumers, Microsoft Copilot's new smart mode will use GPT-5 to help users find the best answers. Users can try GPT-5 for free at copilot.microsoft.com or in the Copilot app on Windows, Mac, Android, and iOS.
Developers get GPT-5 in GitHub Copilot and Visual Studio Code for writing, testing, and deploying code. Azure AI Foundry will offer all GPT-5 models, together with an AI-driven model router that picks the best model for each task based on its complexity, performance requirements, and cost efficiency.
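Microsoft has not detailed how the Azure AI Foundry router scores tasks. As a loose illustration of cost-aware model selection in general, the Python sketch below picks the cheapest model that meets a task's required capability and latency budget. The catalog entries, capability tiers, latencies, and costs are invented placeholders, not Azure's models or prices, and `ModelOption` and `pick_model` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    capability: int       # hypothetical quality tier: higher handles harder tasks
    latency_ms: int       # hypothetical typical latency
    cost_per_call: float  # hypothetical relative cost units


# Illustrative catalog; names and numbers are placeholders, not real Azure offerings.
CATALOG = [
    ModelOption("small", capability=1, latency_ms=200, cost_per_call=1.0),
    ModelOption("medium", capability=2, latency_ms=600, cost_per_call=4.0),
    ModelOption("large", capability=3, latency_ms=2000, cost_per_call=15.0),
]


def pick_model(required_capability: int, max_latency_ms: int) -> ModelOption:
    """Return the cheapest model meeting both the capability and latency limits."""
    eligible = [
        m for m in CATALOG
        if m.capability >= required_capability and m.latency_ms <= max_latency_ms
    ]
    if not eligible:
        # No model satisfies both limits: favor quality and fall back to the most capable one.
        return max(CATALOG, key=lambda m: m.capability)
    return min(eligible, key=lambda m: m.cost_per_call)


print(pick_model(1, 1000).name)  # small
print(pick_model(2, 1000).name)  # medium
```

The fallback favors quality over the latency budget; a real router might instead reject the request or relax the capability requirement.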
Microsoft's AI Red Team put the GPT-5 reasoning model through rigorous security testing and found that it showed one of the strongest safety profiles of any OpenAI model to date against a range of attack types, including malware generation and fraud automation.