According to researchers at Apple, leading AI models still struggle significantly with reasoning, suggesting that artificial general intelligence (AGI) remains a long way off.
In a paper titled "The Illusion of Thinking," published in June, Apple researchers pointed out that although recent updates to leading large language models (LLMs) such as OpenAI's ChatGPT and Anthropic's Claude incorporate large reasoning models (LRMs), the foundational capabilities, scaling characteristics, and limitations of these models "are still not fully understood."
They emphasized that current evaluations focus mainly on established mathematical and programming benchmarks, "overemphasizing the accuracy of final answers," an approach that, the researchers argued, does not truly reveal the reasoning capabilities of AI models.
The findings stand in sharp contrast to industry expectations that AGI may be only a few years away.
The research team designed a variety of puzzle games to test "thinking" and "non-thinking" variants of Claude Sonnet, OpenAI's o3-mini and o1, and DeepSeek's R1 and V3, probing capabilities beyond standard mathematical benchmarks.
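Puzzles make useful reasoning benchmarks because their difficulty can be dialed up smoothly and their answers verified exactly. The sketch below is illustrative only: it uses Tower of Hanoi, one of the puzzle types the paper describes, but the harness itself (the `hanoi_moves` and `verify` helpers) is a hypothetical reconstruction, not code from the paper.

```python
# Illustrative sketch: a puzzle with tunable complexity whose solutions
# can be checked exactly, in the spirit of the paper's Tower of Hanoi setup.

def hanoi_moves(n, src="A", aux="B", dst="C"):
    """Return the optimal move sequence for an n-disk Tower of Hanoi."""
    if n == 0:
        return []
    return (hanoi_moves(n - 1, src, dst, aux)
            + [(src, dst)]
            + hanoi_moves(n - 1, aux, src, dst))

def verify(moves, n):
    """Check that a proposed move list legally solves the n-disk puzzle."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for src, dst in moves:
        # A move is illegal if the source peg is empty or it would
        # place a larger disk on top of a smaller one.
        if not pegs[src] or (pegs[dst] and pegs[dst][-1] < pegs[src][-1]):
            return False
        pegs[dst].append(pegs[src].pop())
    return pegs["C"] == list(range(n, 0, -1))

if __name__ == "__main__":
    # An n-disk instance takes exactly 2**n - 1 moves, so difficulty
    # grows exponentially with n while answers stay machine-checkable.
    for n in range(1, 8):
        moves = hanoi_moves(n)
        assert verify(moves, n) and len(moves) == 2**n - 1
        print(f"n={n}: {len(moves)} moves, verified")
```

Because complexity scales in this controlled way, a model's answers can be checked move by move and its accuracy plotted against problem size, which is the kind of complexity-versus-accuracy analysis behind the collapse described below.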
They found that "frontier LRMs face a complete accuracy collapse beyond certain complexities": the models fail to generalize their reasoning effectively, and their advantages diminish as problems grow more complex, far from what would be expected of AGI. "We found that LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across puzzles."
The researchers observed that the models' reasoning was inconsistent and superficial, and they also identified an "overthinking" phenomenon, in which chatbots generate correct answers early on but then drift into erroneous reasoning paths.
The research team concluded that LRMs merely mimic reasoning patterns rather than truly internalizing or generalizing them, falling well short of AGI-level reasoning. "These insights challenge prevailing assumptions about LRM capabilities and suggest that current approaches may be encountering fundamental barriers to generalizable reasoning."
AGI, widely seen as the ultimate goal of AI development, refers to machines that can think and reason at a level comparable to human intelligence.
In January of this year, OpenAI CEO Sam Altman said the company was closer than ever to building AGI, remarking, "We are now confident we know how to build AGI as we have traditionally understood it."
Last November, Anthropic CEO Dario Amodei predicted that AGI would surpass human capabilities within the next one to two years, saying, "If you judge solely by the rate at which these capabilities are growing, it does make you think we will reach this goal by 2026 or 2027."