Anthropic's triple moment: code leak, government confrontation, and weaponization.

This company, whose selling point is "safety," is turning the safety narrative into a business moat, while what it really wants is the user data held by companies like Microsoft.

Author: Ben Thompson

Translation: Shenchao TechFlow

Shenchao Introduction: Anthropic's new model Fable was halted by the U.S. government just two months after its release, ostensibly over "safety leaks." In reality, the episode exposed the two-front war that AI labs are fighting, against the government and against the software industry.

I understand the critics who have long believed that Anthropic's public statements, especially its rhetoric around model releases, are meant to stoke panic for marketing purposes. Two months ago, Anthropic announced Mythos Preview, claiming that the model was too dangerous to release publicly, particularly because of its powerful cybersecurity capabilities. Then, two months later, the company publicly released Fable, a version of Mythos with various safety guardrails.

Based on my limited experience, Fable is indeed an excellent model. It is now difficult to assess models objectively beyond programming performance, but subjective impressions still count, and I find interacting with Fable exceptional; it makes other models, including GPT 5.5 and Opus 4.8, seem small and dim by comparison. I have had this feeling only twice before, once with GPT-4 and once with Grok 4, each of which represented a new generation of foundation-model scale and complexity; I believe Fable comes from a new pre-training run and is the first of a new generation.

Therefore, I can fully accept that Fable/Mythos is indeed better at identifying and exploiting security issues, and it makes sense that Anthropic was cautious about the launch. However, the problem with releasing a model publicly is that its guardrails can be bypassed, and it is clear that this happened shortly after the release.

Anthropic Faces Off Against the U.S. Government Again

What happened next is somewhat unclear. Anthropic wrote in a blog post:

The U.S. government invoked national security powers to issue export control directives, suspending access to Fable 5 and Mythos 5 for all foreign nationals, whether inside or outside the U.S., including foreign employees of Anthropic. The practical effect of this order is that we must suddenly disable Fable 5 and Mythos 5 for all customers to ensure compliance. Access to all other Anthropic models is unaffected.

We received the government's directive at 5:21 PM Eastern Time today. The letter did not provide specific details regarding the national security concerns. We understand that the government believes it has discovered methods to bypass or "jailbreak" Fable 5. We reviewed a demonstration that identified a small number of known vulnerabilities using this specific technique. These vulnerabilities seemed relatively simple, and we found that other publicly available models could also identify them without needing any bypass.

Anthropic then argued that non-universal jailbreaks are inevitable and limited in scope, and that there is no evidence of a universal jailbreak. The jailbreaks identified appear to be those reported by Amazon, which is noteworthy because Amazon is both an investor in Anthropic and the primary provider of the company's inference services. As I write this, Anthropic executives are in Washington, D.C., trying to resolve what they insist is a misunderstanding, while White House officials have hinted that the company's leadership is indifferent to legitimate national security concerns.

Given how many of the facts are disputed, I have nothing to add about the current conflict itself, but I am not surprised that a conflict is occurring: I explained in "Anthropic and Alignment" why conflict between the U.S. government and Anthropic is inevitable. In this regard, those who believe that Mythos is not powerful enough to provoke the government into drastic action miss the point: if this model is not powerful enough, the next one will be, or the one after that, especially as models become increasingly useful in creating their successors.

However, this raises another question, one that seems to confirm the critics' view: if Mythos is so dangerous, why was Fable released in the first place, and why fight the government for doing what you claim to want? In fact, I believe Anthropic's actions are entirely understandable; what is unique about the company is how it defends those actions, and it is precisely these defenses that both fuel the critics and account for the company's allure.

Economic Necessity

In the early years of AI, the majority of economic value flowed towards computational power, for obvious reasons: we did not have enough supply to meet demand, which meant skyrocketing prices; the biggest beneficiaries were NVIDIA, TSMC, and memory manufacturers (SK Hynix, Samsung, and Micron). Meanwhile, Anthropic and OpenAI together lost hundreds of billions of dollars in building cutting-edge models, which, once released, were distilled and commoditized by open-source models, primarily from China.

This is the pessimistic case for the labs: they will never cover their costs because their differentiation is fleeting and free alternatives become "good enough." I believe this case is reasonable. In a world where models are interchangeable, models become commodities, and most of the value flows elsewhere. Right now it flows to computational power, but over time, once there is enough of it, the most valuable position in the value chain will be the one that has always been the most valuable: owning user touchpoints.

Therefore, there is an economic necessity for cutting-edge labs to get closer to users, which has been clear to me all along. If you have user touchpoints, then you have meaningful lock-in, and the best way to own user touchpoints is to be the canvas for everything they need to do. This, in turn, means that cutting-edge labs are heading towards conflict with software companies: software owns user touchpoints, while the long-term interests of cutting-edge labs are not merely to become a commodity input for software but to directly replace software.

At the same time, software companies are trying to do the opposite. In an article on X, Satya Nadella articulated his vision of how companies should build on models:

Every company must build what I call human capital and token capital. Human capital includes the knowledge, judgment, relationships, creativity, and pattern recognition of its employees, while token capital is the AI capabilities built and owned by the company. Importantly, as token capital grows, human capital does not become less valuable. It only becomes more valuable! I believe human agency will be the driving force behind the growth of token capital. Humans will set ambitious goals, connect dots across domains, build relationships, and identify the most important patterns. Without human guidance, your computational power is just spinning its wheels.

This means the real opportunity lies not in choosing the best model, but in building learning loops on top of models that allow human capital and token capital to compound together. You can outsource a task, even a job, but you can never outsource your learning. The future of companies lies in being able to compound this learning between humans and AI. This requires a new architectural approach that allows every business to build intelligent agent systems that improve over time while still retaining control over their intellectual property. Companies should be able to swap out "general" models without losing the "company veteran" expertise built into their learning systems. This is the key "test" of your control and sovereignty in the future era.

Nadella opened this vision with a warning:

What we do not want to see is a world where every company in every industry relinquishes value to a few all-consuming models. If all the value is extracted by only a few models, the political economy simply will not tolerate it. Society will not grant permission for an AI future that hollows out entire industries.

Think about what happened in the first phase of globalization, where the entire industrial economy was hollowed out by outsourcing. On the surface, GDP numbers looked good, but dislocation was real, and the consequences are still being felt today. Let us not bring this dynamic into the AI era, where a few AI systems capture all economic returns while entire industries find that their knowledge has been commoditized right under their noses.

The problem with this analogy is that globalization did happen, and industrial economies were indeed hollowed out. Nadella's words could turn out to be a prophecy rather than a warning; it is no wonder he is sounding the alarm, because Microsoft may be one of the victims. Likewise, the economic imperative for the model makers is to bring about precisely this outcome.

Data Necessity

These models, even Mythos, have not yet reached that point. What they need, besides more computational power, is more and better data. Improvements in models increasingly come from reinforcement learning; some of the training data can be synthesized, but the greatest leverage for cutting-edge labs comes from real-world usage.

I believe this is the primary reason why both OpenAI and Anthropic offer heavily subsidized subscription plans. SemiAnalysis recently estimated that a $200 plan could get you $8,000 worth of Claude tokens and $14,000 worth of Codex tokens. Certainly, both are competing for users and developer mindshare, but they are also competing for access to actual usage data to improve their models.
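To make the scale of that subsidy concrete, here is a minimal sketch of the arithmetic in Python. It uses only the three dollar figures quoted above; the function and variable names are illustrative assumptions, and the calculation takes the estimate at face value, treating the plan price and the token value as covering the same billing period.

```python
def subsidy_multiple(token_value_usd: float, plan_price_usd: float) -> float:
    """Dollars of tokens (at list price) obtained per dollar of subscription."""
    if plan_price_usd <= 0:
        raise ValueError("plan price must be positive")
    return token_value_usd / plan_price_usd


def implied_discount(token_value_usd: float, plan_price_usd: float) -> float:
    """Fraction of the tokens' list value that the subscriber does not pay."""
    return 1.0 - plan_price_usd / token_value_usd


# Figures as quoted in the text (attributed there to SemiAnalysis).
PLAN_PRICE_USD = 200.0
ESTIMATED_TOKEN_VALUE_USD = {"Claude": 8_000.0, "Codex": 14_000.0}

multiples = {
    name: subsidy_multiple(value, PLAN_PRICE_USD)
    for name, value in ESTIMATED_TOKEN_VALUE_USD.items()
}
discounts = {
    name: implied_discount(value, PLAN_PRICE_USD)
    for name, value in ESTIMATED_TOKEN_VALUE_USD.items()
}

for name in ESTIMATED_TOKEN_VALUE_USD:
    print(f"{name}: {multiples[name]:.0f}x token value per subscription dollar, "
          f"{discounts[name]:.1%} implied discount")
```

On these numbers a subscriber gets 40 times the plan price in Claude tokens and 70 times in Codex tokens, which is the sense in which the plans are "heavily subsidized."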

Anthropic has doubled down with Fable, announcing that it will retain all usage data for 30 days, even on enterprise plans that previously promised zero data retention. The company says it will not use this data for training, but it has put no safeguards in place (such as storing the data with a third party) to ensure that it won't do so in the future. If this policy change does not lead to a significant customer exodus once Fable resumes, I suspect it is only a matter of time before the company starts training on the data: it is too valuable for its ultimate goals.

Also noteworthy is the virtuous cycle of moving up to user touchpoints: the more workflows completed directly with Claude or Codex, the more data each company receives that can feed back into training, making their products stronger and more useful, expanding the number of workflows they can serve and increasing their access to data.

Nadella emphasized the importance of this data in his article but naturally believes it should be separate from the models:

Companies need to translate workflows, domain knowledge, and accumulated judgment into AI systems that improve with each use. Private evaluations should capture whether models are truly improving on business-critical outcomes (not just external benchmarks!). Private reinforcement learning environments should allow models to become stronger on the authentic trajectories within the organization. A knowledge base makes institutional memory queryable and tokens more efficiently used.

This cycle becomes the new intellectual property of the company. I see it as a climbing machine. Unlike most assets, it compounds. Every improved workflow generates better training signals, accelerating the accumulation of unique tacit knowledge for the company. Companies that build this early will have an advantage that is difficult to replicate regardless of the capabilities of any new single model.

However, what if companies adhering to Anthropic's data policies can achieve better outcomes right now? Or what if existing companies resist, leaving room for new companies—or the model manufacturers themselves—to seize opportunities to defeat them in the market? Anthropic is indeed testing the determination Nadella called for.

Power Claims

The data retention policy surrounding Fable/Mythos is not even the most controversial part of the release. Instead, Anthropic stated at the time of the release that if Fable were used for LLM development, it would quietly degrade its performance; the system card stated:

We have also added protective measures related to frontier LLM development. As discussed in section 6.1 of our February 2026 risk report, we are concerned about the risks of accelerating the overall pace of AI development, although we remain uncertain about the severity of these risks. Specifically, our concern is, as we wrote then, "accelerating other AI developers to build powerful AI systems with risks similar to our own, without necessarily having corresponding protective measures."

Given that recent models have the capacity to speed up their own development, we implemented new interventions that limit the effectiveness of Claude when addressing requests related to frontier LLM development (such as building pre-training pipelines, distributed training infrastructure, or ML accelerator designs). Using Claude to develop competitive models was already in violation of our terms of service, but enforcing this limitation via protective measures helps avoid accelerating those most willing to violate these terms.

Unlike our interventions in cybersecurity, biochemistry, and distillation attempts, these protective measures are invisible to users. Fable 5 will not fall back to another model. Instead, protective measures will limit effectiveness through methods such as prompt modifications, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not impact the vast majority of programming work. We estimate they will affect about 0.03% of the traffic, concentrated within less than 0.1% of organizations. When these interventions take effect, we expect that, besides limiting its effectiveness in developing frontier LLMs, their impact on model behavior will be minimal. Claude will still provide helpful responses to user requests. We will continue to improve the accuracy of detection methods after this model's release.

Anthropic retracted this change (Fable will now hand off LLM-related requests to Opus 4.8 and disclose the handoff to users), but I believe the initial policy is very revealing. On one hand, I do not blame Anthropic for not wanting to help competitors; on the other hand, it should be very clear that Anthropic believes no one besides itself should be making frontier LLMs.

This policy is all the more notable as it was enacted just two months after Anthropic had a dispute with the Department of Defense: the latter wishing to use Claude for any lawful purpose, while the former wanted stricter controls over surveillance and autonomous weapons. This degradation measure represents both Anthropic's ability to subtly modify its models to achieve its policy preferences and its willingness to do so. In other words, Anthropic actively validated some critics' greatest concerns about its status as a supply chain risk.

However, a broader conclusion to draw from that incident is that Anthropic believes it should have the ultimate decision-making power over how its models are used; given its belief that only it should develop frontier AI, it effectively thinks that only it should have the final say over AI in general. When you combine this realization with the company's statements about AI being able to conduct all economic activity, you recognize that Anthropic's leadership actually wants power over everything and everyone.

Safety Narrative

Of course, Anthropic would never state this so bluntly; instead, the story is about safety:

I expect Anthropic will increasingly make its model capabilities available to end users through endpoints that are increasingly tailored to different workflows, even as they begin to restrict API access. This substitution for software and limitations on access will be conducted in the name of safety, even as Anthropic fulfills its economic urge to get closer to end users.

The significant change to Anthropic's data retention policy is explained as being for safety. Specifically, the company claims that retaining all user data for 30 days is necessary to prevent jailbreak behavior that the U.S. government is concerned about. I can certainly imagine a future where safety factors compel them to also train on this data to better guard against malicious use.

The entire origin story of Anthropic is rooted in the belief of its founders that OpenAI did not take safety seriously enough; the company believes that only they can control AI, and because they uniquely care about safety, they have a reason to try to control everyone else, including the U.S. government.

The thing about these safety justifications is that I believe they are effective precisely because, for Anthropic, they are not pretexts at all. The company genuinely believes that it is the only one that truly believes in superintelligence, and thus the only one sufficiently concerned with the dangers. That belief justifies one decision after another and one policy after another; to outsiders, the result looks like a strange combination of cynicism and naiveté.

The contrast with OpenAI is immense: I believe one way to understand how and why OpenAI lost its lead is that, in the years following the release of ChatGPT, the company was at war with itself, as what had been a research lab was suddenly burdened with becoming an accidental consumer technology company; in the process of resolving this conflict, OpenAI lost a significant amount of talent to companies like Anthropic.

On the other hand, Anthropic has a perfect alignment among talent, mission, and business. The company can market the vision of creating machine gods to researchers, wrapped in the aura of concern for danger and possessing enough intellect to represent humanity in dealing with the dangers; and each resulting policy change just happens to be beneficial for the business, which is the most marvelous coincidence in the world.

I respect this consistency, but I also fear it. I respect it because it is clearly highly effective; the closest analogy might be Apple, which always wraps every selfish action in the guise of doing the right thing for users, and often is doing just that. Anthropic does the same. I fear it because having people who are convinced they know best build a smartphone that I can accept or reject is one thing; having them build a superintelligence with the potential to rival or surpass the power of nation-states, or even just large corporations, is far more concerning. The history of intelligent individuals convinced they know what humanity needs is an ugly one, precisely because they have convinced themselves that their intentions are good, which gives them justification for actions that are not.
