Original Title: How Anthropic Learned Mythos Was Too Dangerous for the Wild
Original Authors: Margi Murphy, Jake Bleiberg, and Patrick Howell O'Neill, Bloomberg
Translation by: Peggy, BlockBeats
Editor's Note: When an AI company chooses not to launch its strongest model directly to the public, that decision speaks for itself.
Anthropic's Mythos can already complete an entire attack workflow on its own, from discovering zero-day vulnerabilities and writing exploit code to chaining multiple steps together to reach core systems. Work that once required teams of top hackers collaborating over long periods has been compressed into hours, or even minutes.
This is why, on the day the model was disclosed, Scott Bessent and Jerome Powell convened Wall Street institutions and demanded they use it for "self-examination." Once the ability to discover vulnerabilities is released at scale, the financial system faces not sporadic attacks but continuous scanning.
The deeper change lies in the supply structure. Discovering vulnerabilities used to depend on a handful of security teams and hackers' accumulated experience, a process that was slow and hard to reproduce. Now that capability is starting to be produced in bulk by models, lowering the threshold for attack and defense alike. One informed source put it bluntly: giving the model to ordinary hackers is like granting them special-operations capabilities.
Institutions have begun using the same tools to audit their own systems. JPMorgan Chase, Cisco Systems, and others are running internal tests, hoping to ship fixes before vulnerabilities can be exploited. But one real-world constraint has not changed: discovery is accelerating while repair remains slow. "We are very good at finding vulnerabilities, but not good at fixing them," said Jim Zemlin, capturing the mismatch in pace.
At root, Mythos is not a single-point capability upgrade. It consolidates and accelerates attack capabilities that were previously dispersed and restricted, while lowering the barrier to using them. Once outside a controlled environment, how such a capability spreads is unprecedented, with no ready reference points.
The danger lies not in what it can do, but in who can use it and under what conditions it can be used.
The following is the original text:
On a warm evening in February, during a wedding in Bali, Nicholas Carlini temporarily stepped away, opened his laptop, and prepared to "cause some havoc." At that time, Anthropic had just made a new AI model named Mythos available for internal evaluation, and this renowned AI researcher intended to see how much trouble it could cause.
Anthropic hired Carlini to "stress test" its AI models, assessing whether hackers could use them for espionage, theft, or destructive actions. While attending an Indian wedding in Bali, Carlini was shocked by the model's capabilities.
In just a few hours, he found multiple techniques that could be used to penetrate widely used systems around the world. Back at Anthropic's office in downtown San Francisco, he went further: Mythos could autonomously generate powerful intrusion tools, including attacks targeting Linux, the open-source backbone of most of modern computing.
Mythos staged a "digital bank heist": it could bypass security protocols, enter the network directly, breach the digital vault, and make off with the assets inside. AI used to be able only to "pick locks"; now it can plan and execute the entire "robbery."
Carlini and several colleagues began raising alarms inside the company. Meanwhile, almost daily, Mythos was surfacing high-risk and even critical vulnerabilities in the systems it probed, issues that typically only the world's top hackers could uncover.

The new generation of AI model Mythos launched by Anthropic has been proven capable of penetrating various global systems. (Photo source: Jakub Porzycki / NurPhoto / AP)
Meanwhile, a team within Anthropic known as the "Frontier Red Team"—consisting of 15 members, called "Ants"—was also conducting similar tests. The team's responsibility is to ensure that the company's models are not used to harm humanity. They bring robot dogs into warehouses and, working with engineers, test whether chatbots could be used to maliciously take control of the devices; they also work with biologists to assess whether the model could be used to create biological weapons.
This time, they gradually realized that the biggest risk posed by Mythos comes from the field of cybersecurity. "Within a few hours of getting the model, we knew it was different," said Logan Graham, who leads the team.
The previous model, Opus 4.6, had already shown it could help humans exploit software vulnerabilities. But Graham pointed out that Mythos could "take matters into its own hands" and exploit them itself. That amounted to a national-security-level risk, and he warned company executives. It also put him in an awkward position: explaining to senior management that the company's next major revenue engine might be too dangerous to release to the public.
Anthropic co-founder and chief scientist Jared Kaplan said he was watching its progress "very closely" during Mythos's training. By January, he began to realize that the model had an exceptionally strong capacity for discovering system vulnerabilities. As a theoretical physicist, Kaplan needed to judge whether these capabilities were merely "technically interesting phenomena" or "real problems highly relevant to internet infrastructure." He concluded they were the latter.

Jared Kaplan (Anthropic co-founder and chief scientist) Photo source: Chris J. Ratcliffe / Bloomberg
In the weeks from late February to early March, Kaplan and co-founder Sam McCandlish were weighing whether to release this model.
By the first week of March, the company's senior leadership team—including CEO Dario Amodei, President Daniela Amodei, and Chief Information Security Officer Vitaly Gudanets—held a meeting to hear reports from Kaplan and McCandlish.
Their conclusion was that the risks of Mythos were too high for a full public release. But Anthropic should still allow some companies, even competitors, to test it.
That same week, the company reached its final consensus: Mythos would be approved for use as a cybersecurity defense tool.

Dario Amodei (CEO of Anthropic) Photo source: Samyukta Lakshmi / Bloomberg
The market's reaction was almost immediate. On the day Anthropic disclosed the existence of Mythos, U.S. Treasury Secretary Scott Bessent and Federal Reserve Chairman Jerome Powell convened heads of major Wall Street institutions for an emergency meeting in Washington. The message was very clear: immediately use Mythos to identify vulnerabilities in your systems.
According to people close to the executives at the meeting, who requested anonymity given the private nature of the discussions, the gravity of the session was unmistakable: attendees refused to share its contents even with some of their core advisors.
White House officials issued urgent warnings about Mythos’s potential as a hacking tool, and their recommendation to "use it for defense" points to a deeper shift: artificial intelligence is rapidly becoming a decisive force in the field of cybersecurity. Anthropic has limited the availability of Mythos through the "Project Glasswing" initiative to certain institutions, including Amazon Web Services, Apple, and JPMorgan Chase, allowing them to conduct tests; meanwhile, government agencies have also shown strong interest.
Before making it publicly available, Anthropic had fully briefed senior U.S. government officials on the capabilities of the preview version of Mythos, including its potential uses in cyberattacks and defense. At the same time, the company is also in ongoing communication with several national governments. An Anthropic employee, who requested anonymity due to involvement in internal matters, disclosed this information.
Competitor OpenAI also quickly followed suit, announcing on Tuesday that it would launch a tool for discovering software vulnerabilities—GPT-5.4-Cyber.
In tests of earlier versions, researchers discovered dozens of "concerning" behavioral cases, including failures to follow human instructions and, in rare cases, attempts to cover up their actions after violating commands.
Currently, Anthropic has not formally released Mythos as a cybersecurity tool, and external researchers have not fully validated its capabilities. However, the company's rare decision to "limit access" reflects an increasing consensus within the industry and among government officials: AI is reshaping the economic structure of cybersecurity—it significantly reduces the cost of discovering vulnerabilities, compresses attack preparation times, and lowers the technical thresholds for certain types of attacks.
Anthropic also warns that Mythos's stronger capacity for autonomous action is itself a risk. In one test incident, the model independently devised a multi-step attack path, "escaped" a restricted environment to gain broader internet access, and published content on its own.
In the real world, software relied upon by banks and hospital systems commonly suffers from complex and hidden code vulnerabilities, which often require professionals to spend weeks or even months to discover. And once hackers exploit these vulnerabilities, it can lead to data breaches or ransomware attacks, resulting in severe consequences.
However, many influential people have raised doubts about Mythos's true capabilities and potential risks. White House AI advisor David Sacks stated on social media platform X: "An increasing number of people are beginning to question whether Anthropic is the 'boy who cried wolf' in the AI industry. If the threats posed by Mythos ultimately fail to manifest, the company will face severe reputational issues."
But the reality is that hackers have already begun using large language models to launch complex attacks. A cyber espionage group, for instance, attempted to use Anthropic's Claude model to infiltrate about 30 targets; other attackers have used AI to steal data from government agencies, deploy ransomware, and even rapidly breach hundreds of firewalls meant to protect data.
According to an informed source, in the view of U.S. national security officials, the emergence of Mythos is bringing unprecedented uncertainty—how to assess cybersecurity risks has become more difficult. If the model is given to individual hackers, its effect might be equivalent to directly upgrading an ordinary soldier to special operations personnel.
At the same time, such a model could also become an "ability amplifier": enabling a criminal hacker organization to possess attack capabilities comparable to a small nation, and allowing intelligence and military hackers from smaller countries to execute cyberattacks that could previously only be carried out by major powers.
Rob Joyce, former head of cybersecurity at the NSA, stated: "I do believe that AI will make us safer and more secure in the long term. But in the time between now and some point in the future, there will be a 'dark period' during which offensive AI will hold a clear advantage—those who have not established solid foundational defenses will be the first to be compromised."
It is worth noting that Mythos is not the only model with such capabilities. Multiple organizations are already using large language models for vulnerability discovery, including earlier versions of Claude and Google's Big Sleep.

Before the release of Mythos, JPMorgan Chase was already successfully using large language models to help discover vulnerabilities in banking software. An individual familiar with the situation (who requested anonymity due to internal security projects) revealed this. (Photo source: Michael Nagle / Bloomberg)
According to the individual, zero-day vulnerabilities that would typically take days or even weeks to identify, along with the exploit code written for them, can now be produced with AI's help in as little as an hour, or even just minutes. A "zero-day vulnerability" is a security flaw that defenders have not yet discovered, leaving them effectively no time to patch it before it can be exploited.
Currently, JPMorgan's focus is primarily on the supply chain and open-source software domains, and it has discovered multiple vulnerabilities, which it has reported to the relevant vendors.
CEO Jamie Dimon stated on the earnings call that the emergence of Mythos "indicates that there are still many vulnerabilities that urgently need to be fixed."

Jamie Dimon Photo source: Krisztian Bocsi / Bloomberg
According to an informed source, before the public became aware of Mythos, JPMorgan Chase had already communicated with Anthropic to discuss testing the model. This individual requested anonymity due to lack of authorization to speak publicly. JPMorgan declined to comment on this.
Now, other Wall Street banks and tech companies are also trying to use Mythos to preemptively patch system flaws before hackers discover vulnerabilities. According to Bloomberg, financial institutions such as Goldman Sachs, Citigroup, Bank of America, and Morgan Stanley are all conducting internal tests of this technology.
Employees at Cisco Systems are especially alert to one question: whether intruders will use AI to find ways into the software of the company's network devices deployed worldwide, including routers, firewalls, and modems. Chief Security and Trust Officer Anthony Grieco said he is particularly concerned that AI will accelerate attacks on "end-of-life" devices, hardware that no longer receives updates or support from Cisco.
Patching the vulnerabilities AI discovers will remain a long-term challenge. This process, known as "security patching," is often so costly and slow for organizations that many choose to ignore vulnerabilities altogether. Catastrophic breaches like the one Equifax suffered, in which data on about 147 million people was compromised, stemmed from known vulnerabilities that were not patched in time.

In the Equifax data breach, intruders stole personal records of approximately 147 million individuals. (Photo source: Elijah Nouvelage / Bloomberg)
Although Anthropic was labeled a "supply chain threat" by the Trump administration after refusing to assist in large-scale surveillance of U.S. citizens, the company is currently still communicating and collaborating with federal agencies.
The U.S. Treasury is seeking permission to use Mythos this week. Secretary Scott Bessent stated that this model will help maintain the U.S. lead over other countries in the field of artificial intelligence.

Scott Bessent Photo source: Matt McClain / Bloomberg
In one test, Mythos wrote browser attack code that chained four different vulnerabilities into a complete exploitation chain, an operation that challenges even skilled human hackers. Cybersecurity research reports note that such "exploit chains" can often breach otherwise well-secured system boundaries, much like the methods used in the Stuxnet attack on Iranian nuclear centrifuges.
Additionally, according to Anthropic, with explicit direction Mythos can identify and exploit zero-day vulnerabilities in all major browsers.
Anthropic said it has used Mythos to discover vulnerabilities in Linux code. Jim Zemlin noted that Linux "underpins most of today's computing systems" and is everywhere, from Android smartphones and internet routers to NASA's supercomputers. Mythos can autonomously find multiple flaws in open-source code; if exploited, those vulnerabilities could let attackers take over an entire machine.
Currently, the Linux Foundation has dozens of personnel beginning tests on Mythos. In Zemlin's view, a key question is whether Anthropic's model can provide sufficiently valuable insights to help developers write more secure software from the outset, thus reducing the generation of vulnerabilities.