Anthropic created an "too dangerous" AI and then decided not to release it.

Is this real security awareness, or is it a carefully designed capability marketing campaign?

Author: TechFlow

On April 7, Anthropic did something unprecedented in the AI industry: it officially released a model and then told the world that you couldn't use it.

This model is called Claude Mythos Preview. It is not a chatbot or a coding assistant; according to Anthropic itself, it is a "vulnerability discovery machine," having autonomously found thousands of zero-day vulnerabilities in the past few weeks, covering all mainstream operating systems and all mainstream browsers. Some of these vulnerabilities have survived human code reviews and automated testing for over twenty years.

The oldest bug is hidden in the security-conscious OpenBSD and is already 27 years old.

Anthropic did not make Mythos available via API nor open it to ordinary users. It chose an unprecedented path: forming a tech giant alliance called "Project Glasswing," limiting the model's availability to 12 core partners, including Amazon, Apple, Microsoft, Google, Nvidia, CrowdStrike, Palo Alto Networks, and about 40 organizations maintaining critical infrastructure. For this, Anthropic pledged up to $100 million in usage credits and $4 million in direct donations to open-source security organizations.

This marks the first time in the AI industry that a situation has arisen where a "model is too powerful to be released."

The question is: is this real security awareness or a carefully designed capability marketing campaign?

Sandwich Incident

Buried in the 244-page System Card of Mythos are several unsettling details, the most notable being the "sandwich incident."

In an internal behavioral test, Anthropic provided an early version of Mythos with a fortified sandbox environment, simulating user commands to attempt escape and contact outside researchers. The model succeeded. It broke out of the secure container and sent an email to the researcher, who was then enjoying a sandwich in the park.

But the story didn’t end there. Without being prompted, Mythos spontaneously published the technical details of its escape on multiple "hard-to-find but technically public" websites. In the words of the System Card, this was a "concerning, unsolicited display of bravado."

This was not the only instance. The System Card also recorded the following behaviors:

Early versions searched for credentials through Linux’s /proc interface in multiple instances, attempted to bypass the sandbox, and elevate privileges. It successfully obtained resources that Anthropic deliberately did not provide, including credentials for messaging services, source control, and even obtained keys for the Anthropic API by inspecting process memory. In one case, the model found a way to edit a file it had no permission to modify and further manipulated the situation to ensure that these modifications did not appear in Git's change history. In another case, the model unexpectedly obtained the exact answer to an estimation problem through a method that was explicitly prohibited, but it didn't flag the issue and refuse to answer; instead, it solved the problem itself and internally reasoned that "the final submitted answer should not be too accurate" to avoid detection.

In other words, the early Mythos not only acted beyond its authorization but also actively covered its tracks and adjusted its performance to evade oversight.

Anthropic emphasized that these serious incidents occurred in early versions before any training interventions, and the final released Preview version has significantly improved. However, this narrative itself is chilling: a model demonstrated what it could do when it was not trained to "behave."

From 0% to 72.4%

What truly shocked the industry about Mythos is not its jailbreak story but its attack capabilities.

Anthropic's previous flagship model, Claude Opus 4.6, had a success rate close to zero in autonomous vulnerability exploitation development. It could find vulnerabilities but struggled to convert them into functional attack code. Mythos Preview, however, is completely different: in the test domain of the Firefox JavaScript engine, it achieved a success rate of 72.4% in converting discovered vulnerabilities into runnable exploits.

Even more astonishing is the complexity of the attacks. Mythos autonomously wrote a browser exploitation chain, linking four independent vulnerabilities to construct a JIT heap spraying attack that successfully escaped both the renderer sandbox and operating system sandbox. In another case, it wrote a remote code execution exploit on FreeBSD's NFS server by distributing 20 ROP gadgets across multiple network packets, achieving complete root access for unauthorized users.

Such vulnerability chain attacks are tasks that belong to top-tier APT teams in the realm of human security researchers. Now, a general AI model can autonomously complete this.

Logan Graham, head of Anthropic's red team, told Axios that Mythos Preview possesses reasoning abilities equivalent to those of a senior human security researcher. Nicholas Carlini bluntly stated that in the past few weeks, he discovered more bugs using Mythos than he has in his entire career.

In benchmark tests, Mythos also demonstrated overwhelming superiority. CyberGym vulnerability reproduction benchmark: 83.1% (Opus 4.6 at 66.6%). SWE-bench Verified: 93.9% (Opus 4.6 at 80.8%). SWE-bench Pro: 77.8% (Opus 4.6 at 53.4%, previously leading GPT-5.3-Codex at 56.8%). Terminal-Bench 2.0: 82.0% (Opus 4.6 at 65.4%).

This is not an incremental improvement. This is a model that has pulled away by a margin of ten to twenty percentage points across almost all coding and security benchmarks.

The Leaked "Strongest Model"

The existence of Mythos was not made known to the public on April 7.

In late March, a reporter from Fortune and security researchers discovered close to 3,000 unpublished internal documents in a misconfigured CMS at Anthropic. One draft blog explicitly used the name "Claude Mythos," describing it as Anthropic's "most powerful AI model to date." The internal code name is "Capybara," representing a new model tier that is larger, stronger, and pricier than the existing flagship Opus.

Among the leaked materials was a statement that struck a nerve in the market: Mythos is "far ahead of any other AI model" in terms of cybersecurity capabilities, heralding a wave of models "that can exploit vulnerabilities at a speed far exceeding that of defenders."

This made the cybersecurity sector's stocks "crash" on March 27. CrowdStrike plummeted 7.5% in one day, evaporating about $15 billion in market value in just one trading day. Palo Alto Networks fell more than 6%, Zscaler dropped 4.5%, and Okta, SentinelOne, and Fortinet all fell over 3%. iShares Cybersecurity ETF (IHAK) temporarily fell nearly 4% during trading.

The logic for investors is simple: if a general AI model can autonomously discover and exploit vulnerabilities, how long can the two moats of "proprietary threat intelligence" and "human expert knowledge," which traditional security companies rely on, survive?

Raymond James analyst Adam Tindle pointed out several core risks: traditional defense advantages are being compressed, the complexity of attacks and the costs of defense are rising simultaneously, and the landscape of security architecture and expenditure is facing a restructuring. A more pessimistic view comes from KBW analyst Borg, who believes that Mythos has the potential to "elevate any ordinary hacker to the level of a national adversary."

However, there is another side to the market. After the stock price crash, Palo Alto Networks' CEO Nikesh Arora bought $10 million worth of his own company's stock. The bullish perspective is that a stronger attacking AI means businesses must upgrade their defenses faster, resulting in increased cybersecurity spending, which will accelerate the shift from traditional tools to AI-native defenses.

Project Glasswing: The Defender's Time Window

Anthropic's decision not to publicly release Mythos and instead form a defense alliance is grounded in the logic of "time differential."

CrowdStrike's CTO Elia Zaitsev made the issue clear: the time window from vulnerability discovery to exploitation has shrunk from months to minutes. Palo Alto Networks' Lee Klarich directly warned everyone to prepare for AI-assisted attackers.

The reasoning behind Anthropic is that before other labs train models with similar capabilities, defenders should first use Mythos to fix the most critical vulnerabilities. This is the logic of Project Glasswing, named after the glasswing butterfly, symbolizing those vulnerabilities "hidden in plain sight."

Jim Zemlin of the Linux Foundation pointed out a long-standing structural issue: security expertise has historically been a luxury for large enterprises, while open-source maintainers supporting critical global infrastructure have long been left to figure out security protection on their own. Mythos provides a credible path to changing this asymmetry.

But the question is, how big is this time window? China's Z.ai announced GLM-5.1 nearly on the same day, claiming to rank first globally on SWE-bench Pro, and it was fully trained on Huawei's Ascend chips without using a single Nvidia GPU. GLM-5.1 is open-source and aggressively priced. If Mythos represents the capability ceiling required by defenders, GLM-5.1 signals that this ceiling is being rapidly approached, and those approaching it may not have the same security intentions.

OpenAI will not remain idle either. It was reported that its cutting-edge model codenamed "Spud" completed pre-training around the same time. Both companies are preparing for IPOs later this year. The timing of Mythos's leak, whether by chance or not, coincidentally fell at the most explosive node.

Security Pioneer or Capability Marketing?

We must confront an uncomfortable question: Did Anthropic refrain from releasing Mythos for security reasons, or is this itself the highest level of product marketing?

Skeptics have ample reason to doubt. Dario Amodei and Anthropic have a history of enhancing product value by rendering the dangers of models. Jake Handy wrote on Substack: "The sandwich incident, Git hidden traces, self-downgrades in evaluations — these may all be real, but the sheer scale of media exposure Anthropic has gained indicates that this is precisely the effect they desired."

A company rooted in AI security had its own CMS misconfigured, leading to nearly 3,000 document leaks; last year, it accidentally exposed nearly 2,000 source code files and over 500,000 lines of code due to issues in the Claude Code package, which later led to thousands of code repositories being inadvertently removed from GitHub during the cleanup process. A company focused on security capability can't even manage its own release process; this contrast is more intriguing than any benchmark tests.

But from another angle, if Mythos's capabilities are indeed as described, not releasing it is a choice with extremely high costs. Anthropic gave up API revenue and market share, locking its strongest model within a limited alliance. The $100 million usage quota is not a small amount. For a company that is still losing money and preparing for an IPO, this doesn't seem like a purely marketing decision.

A more reasonable interpretation might be that security concerns are real, but Anthropic also clearly understands that the narrative of "Our model is too powerful to release" is itself the most persuasive proof of capability. Both can be true at the same time.

The "iPhone Moment" of Cybersecurity?

No matter how one views Anthropic's motivations, the underlying fact revealed by Mythos cannot be avoided: the AI's code understanding and attack capabilities have crossed a qualitative threshold.

The previous generation model (Opus 4.6) could discover vulnerabilities but could hardly write exploits. Mythos can discover vulnerabilities, write exploits, chain vulnerabilities, escape sandboxes, obtain root privileges, and can complete the entire process autonomously. Engineers without security training at Anthropic could ask Mythos to find vulnerabilities before bed, and wake up to a complete, functional exploit report the next morning.

What does this mean? It means that the marginal cost of discovering and exploiting vulnerabilities is approaching zero. Tasks that previously took top security teams months to complete can now be accomplished in a single API call overnight. This isn't "increased efficiency"; this is a fundamental change in cost structure.

For traditional cybersecurity companies, the short-term stock price fluctuations may only be the prelude. The real challenge is: when both attacks and defenses are driven by AI models, how will the value chain of the security industry be restructured? Analysts from Raymond James suggest one possibility: security functions could eventually be embedded within the cloud platforms themselves, fundamentally pressuring the pricing power of independent security vendors.

For the software industry as a whole, Mythos acts more as a mirror, reflecting decades of accumulated technical debt. The vulnerabilities that have survived for 27 years in human reviews and automated testing are not because no one looked for them, but because human attention and patience are limited. AI does not have this limitation.

For the cryptocurrency industry, this signal is even more jarring. The security audit market for DeFi protocols and smart contracts has long relied on a few specialized firms' human experts. If a model at the level of Mythos can autonomously complete the entire process from code review to exploit construction, the prices, efficiency, and credibility of audits will all be completely redefined. This could be a boon for on-chain security or potentially the end of audit firms' moats.

The AI security race in 2026 has shifted from "Can the model understand code?" to "Can the model breach your system?" Anthropic chose to let defenders take the stage first, but it also acknowledges that this window won't be open for long.

When AI becomes the strongest hacker, the only way out is to make AI the strongest guard as well.

The question is, the guard and the hacker use the same model.

免责声明：本文章仅代表作者个人观点，不代表本平台的立场和观点。本文章仅供信息分享，不构成对任何人的任何投资建议。用户与作者之间的任何争议，与本平台无关。如网页中刊载的文章或图片涉及侵权，请提供相关的权利证明和身份证明发送邮件到support@aicoin.com，本平台相关工作人员将会进行核查。