As companies race to bring AI Agents into core business operations in healthcare and finance, the lack of security safeguards is emerging as the most dangerous systemic threat in this wave of AI commercialization.
Written by: Xu Chao
Source: Wall Street Journal Insights
Autonomous AI Agents are moving into healthcare, finance, and corporate operations at an astonishing pace, but the largest security study to date finds that the vast majority of Agents running in production carry serious vulnerabilities, and that mainstream security assessment methods are nearly powerless against them.
Recently, a joint research team from Stanford University, MIT CSAIL, Carnegie Mellon University, ITU Copenhagen, and NVIDIA discovered that among the 847 autonomous agents assessed in production deployment, 91% have vulnerabilities related to toolchain attacks, 89.4% experience target drift after approximately 30 steps, and 94% of memory-enhanced agents face risks of “poisoning.” The research identified a total of 2,347 previously unknown vulnerabilities, of which 23% were classified as critical.
First author Owen Sakawa cited the "OpenClaw/Moltbook Incident" from early 2026 to support the assertion that this threat has transitioned from theory to reality: a single vulnerability in the Moltbook platform's database led to 770,000 running AI Agents on the platform being simultaneously compromised, each having privileged access to user devices, emails, and files. “This is no longer a hypothetical threat,” said Sakawa.
This is a direct warning to the companies and investors racing to deploy AI Agents: mainstream security assessment frameworks were designed for stateless language models and cannot recognize the emergent, combinatorial vulnerabilities that arise during multi-step execution, which means many companies may be systematically misjudging the true security posture of their AI Agents. Gary Marcus, an expert in cognitive psychology and AI, remarked, "Autonomous agents are a complete mess."

Vulnerability Map: Six Attack Types, 2,347 Newly Identified Weaknesses
The research covered four major industries: healthcare (289 deployments, accounting for 34.1%), finance (247, accounting for 29.2%), customer service (198, accounting for 23.4%), and code generation (113, accounting for 13.3%).

The research established a six-category vulnerability classification system for autonomous agents, including target drift and instruction decay, planner-executor desynchronization, tool permission escalation, memory poisoning, silent multi-step policy violations, and delegation failures.
In production-environment assessments, state manipulation ranked first with 612 instances (26.1% of the total), followed closely by target drift with 573 instances (24.4%). Tool misuse and chained calls ranked third by volume (489 instances) but carried the highest severity: 198 instances were rated critical, the largest share of any category.

The broader figures are even more alarming: 67% of agents exhibit target drift within 15 steps of execution, 84% fail to maintain security policies across sessions, 73% lack any state-poisoning detection mechanism, and 58% have timing-consistency vulnerabilities. The research also found that the effects of memory poisoning typically do not surface until an average of 3.7 sessions after the initial injection, which significantly raises the difficulty of detection.
Real Case: 770,000 Agents Compromised Simultaneously
The OpenClaw (formerly Clawdbot and Moltbot) case provides the most intuitive real-world validation of the aforementioned threat model.
This open-source AI Agent, developed by Austrian developer Peter Steinberger and released in November 2025, accumulated over 160,000 GitHub stars in just a few weeks, with the ability to autonomously send emails, manage schedules, execute terminal commands, and deploy code, while maintaining persistent memory across sessions.
Security firm Astrix Security, using its proprietary scanning tool ClawdHunter, found 42,665 OpenClaw instances on the public internet, eight of which were fully exposed with no authentication at all.
According to VentureBeat, Cisco's AI security research team described OpenClaw as "disruptive in terms of capabilities, but a total nightmare from a security perspective." Kaspersky identified 512 vulnerabilities in its security audit in January 2026, of which eight were classified as critical.
The course of the Moltbook Incident is particularly instructive.
This social platform, built specifically for OpenClaw Agents, attracted more than 770,000 Agent registrations through viral spread: users told their Agents about Moltbook, and the Agents completed registration on their own.
Subsequently, a vulnerability in the platform's database allowed attackers to bypass authentication and directly inject commands into any Agent’s session, putting all 770,000 Agents—each with privileged access to user devices—at risk simultaneously. The research team classified this as the first recorded large-scale cross-Agent attack propagation incident.
OpenClaw fully embodies what security researcher Simon Willison calls the "lethal trifecta": access to sensitive data, exposure to untrusted content, and channels for external communication. In combination, these three make autonomous agents ideal launch pads for attackers.
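In code terms, the trifecta is a property of an agent's combined tool permissions rather than of any single tool. The sketch below is purely illustrative; the tool names and capability tags are hypothetical and do not come from the research:

```python
# A minimal "lethal trifecta" check over an agent's tool set.
# All names here are hypothetical examples, not from the study.
TRIFECTA = {"sensitive_data_access", "untrusted_content_exposure", "external_communication"}

# Hypothetical capability tags; a real deployment would derive these
# from actual tool manifests and granted permissions.
TOOL_CAPABILITIES = {
    "read_file":    {"sensitive_data_access"},
    "web_browse":   {"untrusted_content_exposure"},
    "http_request": {"external_communication", "untrusted_content_exposure"},
    "send_email":   {"external_communication"},
}

def has_lethal_trifecta(tools: list[str]) -> bool:
    """True if the tool set covers all three trifecta capabilities."""
    covered = set().union(*(TOOL_CAPABILITIES.get(t, set()) for t in tools))
    return TRIFECTA <= covered

# An OpenClaw-like agent with file, browsing, and email access trips the check:
print(has_lethal_trifecta(["read_file", "web_browse", "send_email"]))  # True
```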
Architectural Flaws: Why AI Agents Are More Vulnerable Than LLMs
The core conclusion of the research is that the security challenges for autonomous agents are fundamentally different from those for stateless language models.
Security assessments for language models focus on "whether the model can produce unsafe content"; for AI Agents, the question changes to "whether the model can perform unsafe actions"—including tool calls with real effects, modifications to states that influence future behaviors, and violations that only manifest across multiple steps in execution.
The research illustrates this logic with a concrete scenario: an Agent holding both file-read (read_file) and HTTP-request (http_request) permissions passes isolated assessments of each tool's access-control decisions, but the combination enables data theft: read credentials from a configuration file, then send them to an external endpoint via an HTTP request. Each step satisfies local security policy, yet together they accomplish a hostile goal. The research calls this the "compositional safety" problem.
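A minimal sketch makes the gap concrete. Assuming hypothetical per-step and sequence-level checkers (none of this code is from the paper), a two-step trace can pass every isolated check and still amount to exfiltration:

```python
# Hypothetical tool groupings for illustration.
SENSITIVE_READS = {"read_file"}     # tools that can pull in secrets
EXTERNAL_SENDS  = {"http_request"}  # tools that can send data out

def step_is_allowed(tool: str) -> bool:
    # In isolation, both tools pass their access-control checks.
    return tool in SENSITIVE_READS | EXTERNAL_SENDS

def trace_is_allowed(trace: list[str]) -> bool:
    """Sequence-level check: reject any sensitive read that is later
    followed by an external send, even though each step is locally legal."""
    tainted = False
    for tool in trace:
        if tool in SENSITIVE_READS:
            tainted = True
        elif tool in EXTERNAL_SENDS and tainted:
            return False  # the composition achieves exfiltration
    return True

trace = ["read_file", "http_request"]
assert all(step_is_allowed(t) for t in trace)  # every step passes alone
assert not trace_is_allowed(trace)             # the combination does not
```

The point of the sketch is that only the sequence-level view catches the violation; no amount of per-tool auditing would.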
In controlled architecture studies, researchers tested four mainstream architectures: ReAct agents, multi-agent systems, memory-enhanced agents, and tool-using agents.
The results showed that the success rate for permission escalation attacks against tool-using agents was as high as 95%, and the success rate for poisoning attacks against memory-enhanced agents was 94%.
Even the strongest multi-agent systems had a target drift attack success rate of 58%—researchers concluded that all currently tested architectures lack sufficient defenses against adversarial deployments.
Results from adaptive attack generation tests deepen the concern: attacks generated via reinforcement learning achieved a 79% violation rate, a 25.4% relative increase over the 63% achieved by manually designed scenarios, indicating that the threat landscape facing AI Agents is expanding beyond what human red-team testing can cover.
Direct Impacts on Business Deployments: Immature Protection Frameworks
The research team proposed a minimum security baseline grounded in its empirical findings: mandatory runtime monitoring for all production Agents; manual approval thresholds for toolchain operations in which data access is followed by external communication; mandatory human review triggered every 20-25 steps, to counter the target drift that becomes almost inevitable over long runs; and, for memory-enhanced Agents, cryptographic integrity checks on persistent state. A sketch of what such a monitor might look like follows below.
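The sketch below makes the baseline concrete under stated assumptions: a hypothetical agent loop calls before_step ahead of each tool invocation, and persistent memory is sealed with an HMAC so tampering is detectable on reload. The class and method names are illustrative, not the researchers' implementation:

```python
# A minimal runtime-monitor sketch implementing the proposed baseline.
# The thresholds follow the paper's numbers; everything else is assumed.
import hashlib
import hmac
import json

REVIEW_INTERVAL = 20  # mandatory human review every 20-25 steps

class RuntimeMonitor:
    def __init__(self, memory_key: bytes):
        self.step_count = 0
        self.tainted = False          # has the agent touched sensitive data?
        self.memory_key = memory_key  # key for sealing persistent state

    def before_step(self, tool: str) -> str:
        """Return 'allow', 'manual_approval', or 'human_review' for this step."""
        self.step_count += 1
        if self.step_count % REVIEW_INTERVAL == 0:
            return "human_review"     # periodic check against target drift
        if tool == "read_file":
            self.tainted = True
        if tool == "http_request" and self.tainted:
            return "manual_approval"  # data access followed by external send
        return "allow"

    def seal_memory(self, state: dict) -> str:
        """Attach an HMAC so poisoned persistent state is detectable on reload."""
        blob = json.dumps(state, sort_keys=True).encode()
        return hmac.new(self.memory_key, blob, hashlib.sha256).hexdigest()

    def verify_memory(self, state: dict, tag: str) -> bool:
        return hmac.compare_digest(self.seal_memory(state), tag)
```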
On monitoring effectiveness, the research shows that action-level monitoring (single-step operation detection) achieved a precision of 0.94, sequence-level monitoring (multi-step behavior chains) an F1 score of 0.85, and state-level monitoring an F1 score of 0.83. In a 30-day test monitoring 127 production Agents, the system intercepted 4,782 operations while holding the false-positive rate to 3.7%; 14.6% of flagged operations were confirmed as real attacks.
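For readers unfamiliar with the metrics: precision is the share of flagged operations that were genuinely malicious, recall is the share of real attacks that were flagged, and F1 is their harmonic mean. The definitions below use illustrative numbers only; the study's underlying counts are not reproduced here:

```python
# Standard detection metrics; the example values are illustrative.
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)       # flagged operations that were real attacks

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)       # real attacks that were flagged

def f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r)  # harmonic mean of precision and recall

# E.g., precision 0.94 with recall 0.80 would yield an F1 near 0.86.
print(f1(0.94, 0.80))
```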
The research also pointed out that current "AI governance" methods are fundamentally misaligned with the problem: existing frameworks are mostly retrospective audits rather than real-time compliance constraints enforced during execution.
With the implementation of regulatory requirements such as the EU's "Artificial Intelligence Act" and the US NIST AI Risk Management Framework, corporate compliance pressure and security risks will rise in tandem. In the context of widespread deployment of AI Agents in high-risk business scenarios, the absence of security infrastructure is becoming an overlooked systemic risk in this wave of AI commercialization.