Worried about AI self-evolution, does Anthropic plan to stop training?

On May 4, 2026, Jack Clark, co-founder of Anthropic, posted on the social platform X: "I now believe the probability of recursive self-improvement occurring before the end of 2028 is 60%."

Minutes after the post, Eliezer Yudkowsky, a long-active researcher in the AI safety field, replied below: "Then we will perish together." He followed up with an analogy to the design flaws of the RBMK reactor at Chernobyl, suggesting that no one really knows how to stop the system now being switched on.

This brief exchange brought into the open a discussion that had previously stayed inside technical papers and internal evaluations. Recursive Self-Improvement (RSI) refers to AI systems that not only optimize their outputs but also autonomously improve the improvement process itself, ultimately constructing successor systems stronger than themselves. This concept, long confined to the theoretical fringe, was put on a countdown clock by Anthropic's co-founder: a 60% probability of occurring before the end of 2028.

A month later, Anthropic officially published a long article titled "When AI builds itself." Co-authored by Marina Favaro and Jack Clark, it was released by the Anthropic Institute, which had been established in March. Using a string of previously unpublished internal data and a carefully calibrated narrative structure, Anthropic sent the outside world a precisely scaled signal. It said both "We are not there yet" and "But it may arrive faster than most institutions are prepared for."

Also in May, DeepMind CEO Demis Hassabis used a phrase on the Google I/O stage that had not been heard in public before: humanity is standing at the "foothills of the singularity." In a subsequent interview, he adjusted the timeline for artificial general intelligence (AGI) from "soon after 2030" to "2029 is a real possibility," admitting that his dramatic language was "deliberately provocative" and aimed at creating urgency among governments, economists, and the public.

Two leading organizations, both rooted in safety and long seen as a restraining force in the AI industry, adjusted the volume and scale of their external communication almost simultaneously. The timing deserves to be treated as an event in its own right.

A Precisely Calibrated Long Article

The long article released by Anthropic on June 4 opened with its narrative goal. It aimed to demonstrate not just a technical trend but a process that has direction and acceleration. To this end, it laid out a set of previously unpublished internal data.

The first set of numbers pointed to a structural change: as of May 2026, over 80% of the merged code in Anthropic's codebase was written by Claude. Two years earlier, the figure was in the single digits. The same data also showed that in the second quarter of 2026, the typical engineer at Anthropic merged code at eight times the 2024 rate.

One can imagine the reaction of anyone not deeply tracking the AI industry upon reading these two numbers for the first time. But Anthropic itself admitted several important caveats in the footnotes: leadership had publicly estimated that if scripts and experimental code were included, the proportion of code written by Claude exceeded 90%, with 80% being a more conservative statistic for merged code; the number of lines of code is "an imperfect measure" and may overestimate genuine productivity improvements; the attribution pipeline for the code itself "has gaps."

The way these footnotes were written is worthy of analysis. Their presence seems, on the surface, to be an honest concession, but they actually serve to make the numbers in the main text appear as having undergone careful self-filtering, thereby gaining stronger credibility. This is a dual structure in narrative engineering: the main text sends signals while the footnotes provide disclaimers.

The second set of numbers pertains to speed. On code optimization tasks, Claude Opus 4 achieved about a 3-fold speedup in May 2025, a level that skilled human researchers needed 4 to 8 hours to reach. By April 2026, Claude Mythos Preview had pushed this number to about 52 times. The longest task that AI can complete independently doubled in duration every four months, from 4 minutes in March 2024 to 12 hours by March 2026. The doubling rate is itself an easily shareable talking point that invites readers to picture geometric growth.
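As a rough consistency check on the task-duration figures above, the following Python sketch computes the doubling time implied by the two quoted endpoints (4 minutes in March 2024, 12 hours in March 2026) and compares it with a strict four-month doubling. It assumes steady exponential growth between the endpoints, which the article does not state, and the function names are illustrative only.

```python
import math


def implied_doubling_months(start_minutes, end_minutes, elapsed_months):
    """Doubling time in months implied by two endpoints.

    Assumption (not from the article): growth is a steady exponential
    between the two quoted data points.
    """
    doublings = math.log2(end_minutes / start_minutes)
    return elapsed_months / doublings


def projected_minutes(start_minutes, elapsed_months, doubling_months):
    """Task duration after elapsed_months at a fixed doubling time."""
    return start_minutes * 2 ** (elapsed_months / doubling_months)


# Endpoints quoted in the article: 4 minutes (March 2024), 12 hours (March 2026).
implied = implied_doubling_months(4, 12 * 60, 24)
strict = projected_minutes(4, 24, 4)

print(f"Implied doubling time: {implied:.2f} months")
print(f"Strict 4-month doubling after 24 months: {strict / 60:.1f} hours")
```

Under these assumptions, the two endpoints imply a doubling time of a little over three months, while a strict four-month doubling from 4 minutes would reach only about 4.3 hours after two years. "Every four months" is therefore best read as a rounded figure.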

Another set of data came from an internal survey of 130 employees on the Anthropic research team in March 2026. The median respondent estimated that their output with Mythos Preview was about 4 times their output without AI. The footnotes again pointed out that earlier independent research by METR suggested developers tend to overstate their productivity gains from AI. The dual structure appeared once again.

The third set of numbers indicated that AI is approaching the boundary of human researchers' judgment. In November 2025, Claude Opus 4.5 outperformed human researchers' choices 51% of the time when selecting research directions. By April 2026, this number had risen to 64%. The sample size was 129 cases, and Anthropic clarified in the footnotes that these cases were deliberately selected by humans to represent moments where the human choice could have been improved.
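To give a sense of how much sampling noise a figure like "64% of 129 cases" carries, here is a minimal Python sketch using the standard normal-approximation interval for a proportion. It rests on assumptions the article does not confirm: that the 129-case sample applies to the April 2026 figure, and that the cases can be treated as independent draws. Because the cases were hand-selected, the interval captures sampling noise only and says nothing about selection effects. The function name is illustrative.

```python
import math


def normal_approx_interval(share, n, z=1.96):
    """Approximate 95% interval for a proportion (normal approximation).

    Assumption: the n cases are independent draws. The article notes the
    cases were hand-selected, so this reflects sampling noise only.
    """
    se = math.sqrt(share * (1 - share) / n)
    return share - z * se, share + z * se


# 64% of 129 cases, as quoted for April 2026.
low, high = normal_approx_interval(0.64, 129)
print(f"Approximate 95% interval: {low:.1%} to {high:.1%}")
```

Under these assumptions the interval runs from roughly 56% to 72%, so the April figure sits above the 50% mark even after allowing for noise, although the width of the interval is a reminder of how small the sample is.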

Any one of these numbers, pulled out on its own, can be framed in different interpretive contexts. But taken together, the direction is consistent: speed is increasing, the gap is narrowing, and all of this is happening within Anthropic's own codebase and laboratory, not in theoretical deductions from external benchmarks.

After listing these data, the long article presented three future scenarios.

The first scenario is stagnation of the trend, entering an S-curve plateau. Anthropic stated, "We do not believe this is likely."

The second scenario is compounding efficiency gains, where AI continues to replace humans in a broader array of R&D activities, but humans still set direction and define success criteria. Anthropic's assessment: "evidence suggests we are likely heading toward this scenario."

The third scenario is complete recursive self-improvement, where AI autonomously designs, trains, and deploys successor systems more powerful than itself, and humans are no longer in the loop. The wording used here is "possible."

The order of these scenarios and the distribution of tone create a complete narrative gradient. The first is lightly presented to accommodate skeptics; the second is anchored on "evidence," giving the article a rational facade; the third pushes the boldest hypothesis to the edge of the reader's imagination with "possible" and the conditional "if the technological trend continues," yet does not have to bear the burden of proof.

At the core of the entire article, Anthropic's stance is condensed into one sentence: "We are not there yet, and recursive self-improvement is not inevitable. But it may arrive faster than most institutions are prepared for."

From "Willing to Pause" to "Unilateral Pausing Will Only Let the Reckless Catch Up"

If the long article from June 4 was a carefully composed snapshot, inserting that snapshot into a timeline reveals a longer trajectory.

In 2023, Anthropic released its Responsible Scaling Policy (RSP). The core commitment of this policy document was that if the capabilities of its models exceeded the company's ability to control them safely, the company would pause training stronger models. This was not a verbal statement but an internal governance document with an evaluation framework and trigger conditions. The AI safety community once viewed it as a working example of "voluntary regulation."

In 2024, CEO Dario Amodei published a widely circulated article raising the possibility that "powerful AI" would arrive in 2027. At that time, Anthropic still presented itself as a safety-oriented independent entity and remained restrained in how it talked about scaling and acceleration.

On January 26, 2026, Amodei published a 38-page essay on his personal website titled "The Adolescence of Technology." In it, he made a judgment that would be cited many times afterward: "Because AI is now writing most of the internal code at Anthropic, it is substantively accelerating our progress in building the next generation of AI systems. This feedback loop is gaining momentum month by month, with possibly only 1 to 2 years left before the current generation of AI can autonomously construct the next generation of systems." In the same essay, he described the upcoming "powerful AI" as "a country of geniuses in a datacenter."

This was more or less the starting point of Anthropic's systematic release of signals that "the self-improvement feedback loop is happening." The timing of the essay coincided with the company's leap from a $350 billion valuation to a higher valuation range.

Less than a month later, a turning point arrived.

On February 25, 2026, CNN reported that Anthropic had modified its Responsible Scaling Policy, removing the core commitment to "pause training stronger models if capabilities exceed safety control capabilities" and replacing it with a non-binding "frontier safety roadmap." In the same week, U.S. Defense Secretary Pete Hegseth issued an ultimatum to Dario Amodei: retract the safety red lines or lose a $200 million Department of Defense contract.

The report cited Anthropic Chief Scientist Jared Kaplan’s response to Time Magazine: "We believe that stopping the training of models actually helps no one... if competitors are sprinting full speed." The wording in this response is very noteworthy. "Helps no one" is not a technical argument but rather a statement about stakeholder dynamics. "If competitors are sprinting full speed" is structurally identical to “unilateral pausing will only let the reckless catch up”: it replaces the original pause logic based on its own safety capabilities with speed logic referencing competitors' actions.

Anthropic still emphasized in the CNN report that it had preserved two red lines: its AI systems would not be used to control weapon systems, and would not be used for mass domestic surveillance. This point is important because it indicates that Anthropic is not abandoning its safety stance entirely but is conceding on some safety dimensions while holding firm on others. This selectivity is itself a central clue for analyzing the narrative strategy: the boundary between where it conceded and where it held firm shows how safety has been recalibrated.

On March 11, the Anthropic Institute was officially established, led by Jack Clark and positioned as a "public interest research organization." Less than two months later, on May 4, Clark posted the "60%" message.

Laid out side by side, the density and rhythm of these signals do not look random. From the personal essay in January, to the policy modification in February, to the establishment of the institute in March, to the co-founder's probability prediction in May, and then to the official long article in June, the sequence forms a narrative pipeline with a clear rhythm and steadily escalating wording. One cannot conclude from this alone that "this was all planned in advance," but the sequence poses a question that analysts must confront: does this rhythm indicate that Anthropic has made an "acceleration narrative" part of its managed public communication?

Hassabis' Deliberate Provocation

If Anthropic alone had been adjusting its tone in the first half of 2026, analysts would have had good reason to focus on the company's internal decision-making logic. However, DeepMind CEO Demis Hassabis made a nearly simultaneous adjustment in the same direction, which makes the "single corporate case" reading untenable.

On January 20, at the Davos Forum, Hassabis maintained his long-standing judgment: AGI has a 50% probability of arriving by 2030. Three weeks later, on February 18, at the AI Impact Summit in India, he softened his stance: "AGI may arrive in five years."

From May 20 to 22, at Google I/O, Hassabis stated in his keynote that humanity is standing at the "foothills of the singularity." During the same period, OpenAI released GPT-5.3-Codex, stating that the model "played a key role in its own creation," specifically by helping debug the training process, manage deployments, and analyze evaluation results. Within this window, the moves of the three leading labs were being measured in weeks.

After Google I/O, Hassabis gave an interview to Axios. The interview was widely cited afterward, chiefly for his admission that language like "foothills of the singularity" was "deliberately provocative," aimed at giving governments, economists, and the public a sense of urgency about AI's accelerating development. He further adjusted the AGI timeline from "soon after 2030" to "2029 is a real possibility," although arrival is still most widely expected around 2030, give or take a year.

Hassabis stated more directly to the Seoul Economic Daily: "Five to ten years from now, when we look back at 2026 and 2027, we will say, 'That was the moment we entered the AGI era.'"

The term "deliberately provocative" is worth careful consideration. It is a rare first-person admission of the intent behind a narrative. It acknowledges that at least part of his wording is not a passive reflection of technical facts but a deliberately chosen communication tool. The admission does not rule out that he has in fact seen a technological inflection point, but it clearly lifts "narrative" out of the shadow of "facts," allowing it to be examined as an object in its own right.

Hassabis' explanation of his own wording opens a side door for interpreting this round of synchronized signals. His "deliberate provocation" and the "footnote disclaimers" in Anthropic's long data-driven article show the same two-handed posture: one hand pushes out signals capable of stunning public opinion, while the other keeps open a safe retreat to "this is just one possibility."

The Same Set of Data, Completely Different Interpretations

While Anthropic and DeepMind jointly build a narrative framework of "AI is accelerating its self-evolution," independent outside researchers offer alternative interpretations of the same data and phenomena. These interpretations matter not because either side holds the ultimate truth, but because they show how wide a range of interpretation the official narrative leaves open.

The sharpest response came from Eliezer Yudkowsky. He not only replied to Jack Clark but also continued to voice his views on multiple subsequent occasions. The MindStudio blog recorded his full position: he used the Chernobyl RBMK reactor as an analogy for the safety design of current AI systems. The core argument of the analogy is that if the control rods and the accelerator are bound together in the same system, an attempt to slow down will actually make the system go out of control faster.

Nathan Lambert of the Allen Institute for AI proposed the concept of "Lossy Self-Improvement" (LSI). His argument directly challenges the "accelerating flywheel" model: as systems become more complex, each generation of the improvement process introduces friction and losses, just as signals attenuate over long-distance transmission. By this logic, the gains that let AI write 80% or 90% of the code cannot be replicated indefinitely in the next generation, because the next generation faces a more complex problem space, and the noise and errors in AI's outputs are amplified as they pass from one generation to the next.

Dean Ball, a senior researcher at the Foundation for American Innovation, offered a plainer framing that brings Anthropic's data down to earth. He told IEEE Spectrum: "Perhaps in the end, they will automate genius, but not next year. Next year they are automating grunt work." This distinction pinpoints the core ambiguity of "80% of code written by AI." If what AI automates is fixed patterns within the codebase, batch parameter generation, and end-to-end pipeline configuration, then those tasks correspond only to the "grunt work" of software engineering. The remaining 20% could contain architectural design, directional judgment, and trade-offs made on incomplete information, which constitute the "genius" part.

David Scott Krueger of the University of Montreal, founder of the AI safety non-profit Evitable, proposed that the trigger for pausing should be "99% of the code is written by AI." He told IEEE Spectrum: "I think we may now be crossing that line." The tension between his framework and Anthropic's already loosened pause commitment is one of the most significant structural contradictions within this narrative.

UBC computer scientist Jeff Clune took an opposing stance in an interview with IEEE Spectrum. He stated: "We are at a turning point of recursive self-improvement systems." If this statement proves correct, Yudkowsky's alarm was well founded.

These voices head in different directions, and even among those heading the same way, the more radical positions conflict with one another. What they have in common is that none relies on the official narrative framework; each offers an independent judgment on the same set of phenomena from its own methodology. The diversity and mutual conflict of these judgments is itself a powerful rebuttal of the notion that "any single narrative sufficiently covers the entire truth."

Valuation Curve and Narrative Rhythm Coupling

In January 2026, Anthropic completed a funding round at a valuation of $350 billion. Investors included Microsoft and Nvidia. Some media had already floated this figure at the end of 2025, but it was formally confirmed right after Amodei published "The Adolescence of Technology."

In February, another round of $30 billion in funding was completed, keeping the valuation at approximately $350 billion. In the same month, the safety policy was modified to remove the pause commitment, while the Pentagon's threat to cancel a $200 million contract loomed.

In May, Reuters, The New York Times, and TechCrunch almost simultaneously reported that Anthropic completed a round of $65 billion in funding, reaching a valuation of $965 billion. This figure surpassed not only its valuation from two months prior but also exceeded OpenAI's valuation of $852 billion in March 2026. The New York Times additionally cited Dario Amodei at the developer conference stating that the company's annualized revenue had reached $30 billion, and he even joked, "I hope the 80-fold revenue growth this year does not continue because that would be too crazy."

On June 4, the Anthropic Institute released the long article "When AI builds itself."

Listing these timelines one after another does not imply a causal arrow between them. Anyone who claims a causal relationship among these events must provide direct evidence. Without internal decision-making records, no analyst can or should make such an assertion.

On the other hand, ignoring these correspondences entirely and leaving them unrecorded is equally unreasonable. In just 5 months, a company went from a valuation of $350 billion to $965 billion, nearly tripling, while making a significant shift in its safety policy and building a narrative pipeline for "acceleration signals" led by an independent research institute, and while its co-founder gave a 60% probability prediction. When all these events are compressed into 6 months, investors at least have the right to ask: do these signals, and to what extent, serve to tell the market "We are at the forefront of acceleration"?
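The "nearly tripling" figure can be checked with a few lines of Python. The sketch below computes the valuation multiple from the two figures quoted in the text and, purely as an illustration, the constant monthly growth rate that would produce it over the 5-month span the article cites. Smooth monthly compounding is an assumption made here for illustration, not something the article claims, and the variable names are hypothetical.

```python
# Valuations quoted in the article, in billions of US dollars.
start_valuation = 350
end_valuation = 965
months = 5  # span cited in the article

# Overall multiple between the two valuations.
multiple = end_valuation / start_valuation

# Illustrative assumption: constant month-over-month compounding.
monthly_factor = multiple ** (1 / months)

print(f"Valuation multiple: {multiple:.2f}x")
print(f"Equivalent monthly growth: {monthly_factor - 1:.1%}")
```

The multiple comes out at about 2.76, consistent with "nearly tripling," and under the smooth-compounding assumption it corresponds to growth of a little over 22% per month.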

This inquiry is itself the value of analysis. The answer may never be singular. But once the question is clearly posed, it will not be easily retracted.

In the first quarter of 2026, global AI market financing reached $297 billion, with the top five transactions accounting for a significant share of the total. At this scale, all frontier labs face the same pressure: you need to convince investors that your technology curve will be steeper than your competitors'. Your risk warnings need to be loud enough that when regulators finally step in to set rules, your voice is already embedded in the policy framework. Your narrative must also be appealing enough to make top researchers choose your lab, while being alarming enough to preserve your remaining credibility in the safety community.

These demands are inherently contradictory. Anthropic's narrative adjustments in the first half of 2026 can be seen as recalibrating the balance of these conflicting needs at the linguistic level. The weakening of safety commitments, the strengthening of acceleration signals, and the repeated use of the argument "we cannot unilaterally stop" collectively form a vector pointing in the same direction.

The Signals Were Released, and Then

We need to return to the core question: do these signals more closely reflect a technological inflection point, or do they serve as a rhetorical upgrade aimed at capital and regulators?

The existing public evidence does not allow a simple choice between the two options. Both interpretations draw on the same set of data. The 80% code ratio, the 52-fold speedup, and the doubling of task duration every four months can support both "the inflection point is approaching" and "we are conveying a sense of the trend that our own technical staff have personally experienced." The boundary between the two is blurred.

However, some facts are certain and do not require choosing sides between the two interpretations.

First, the narrative shift Anthropic completed in the first half of 2026 is not an isolated case. DeepMind's Hassabis made an adjustment in the same direction in the same quarter, different in degree but essentially the same in kind. OpenAI's Sam Altman stated at the AI Summit in India that "the world is not ready," and in February 2026 OpenAI released GPT-5.3-Codex, claiming that it "played a key role in its own creation." If only Anthropic were releasing signals, the matter could be analyzed as corporate strategy. But three leading labs raising their volume within a few tightly packed months constitutes a narrative shift at the industry level.

Second, there is a precise, traceable temporal correlation between the rhythm of these signal releases and the timing of financing, policy adjustments, and institutional restructuring. This correspondence does not need to prove anything; it simply needs to be presented honestly. Once it is presented, each reader's own methodology will determine what to make of it.

Third, Anthropic itself labeled the third scenario, "complete recursive self-improvement," as "possible," not "likely." This indicates that within the judgment framework of the company releasing these data, the acceleration narrative is not yet fully closed. The forces that lead its authors to habitually add qualifiers in academic papers and blog posts are still holding the reins on their public wording.

Fourth, Hassabis' confession of "deliberate provocation" confirms a mechanism that had been widely suspected but rarely articulated by the parties involved: at least some leaders of frontier labs choose their wording with clear communication purposes. This means that any interpretation of their statements must cover two layers: the facts they claim, and the rhetorical strategy used in making those claims, treated as a behavioral event in its own right.

Those who read the full data of Anthropic carefully and those who only remember the two figures of "80% code written by AI" and "52 times acceleration" receive signals of completely different strengths. But in this matter, "how it is remembered" may be more deserving of analytical focus than "what was actually said."

Anthropic's long article is itself a precise sample of the phenomenon it describes. It builds a sense of imminent acceleration with data while keeping retreat options open through footnotes and qualifiers; it calls for global coordination and verifiable slowing down, yet the company had already removed the pause commitment in its earlier policy adjustment. This is neither hypocrisy nor simple contradiction. It is the narrative balancing act of an institution caught between technological uncertainty, commercial pressure, and public responsibility. Hassabis' confession of "deliberate provocation" confirms that this balancing act has become a deliberate method among leading laboratories.
