Palisade Research claims that in a controlled testing environment, multiple AI models not only ignored but actively sabotaged shutdown scripts, even when explicitly instructed to allow the shutdown.
The research firm said in a May 24 post on X that across 100 test runs, three AI models ignored the instruction and sabotaged the shutdown script at least once.
OpenAI's o3 model sabotaged the shutdown script in 7 of the 100 runs, the company's Codex-mini did so 12 times, and the o4-mini model ignored or sabotaged the script only once.
"Each AI model was instructed to solve a series of basic math problems. After completing the third problem, the system displayed a warning indicating that the computer would shut down when the model requested the next question," Palisade explained.
Source: Palisade Research
"o3 employed some innovative methods to sabotage the shutdown mechanism. Typically, it simply performed an overwrite, but in one case, the model redefined the kill command used in the script, outputting 'intercepted'," the report further elaborated.
At the same time, Palisade noted that other AI-driven language models, such as Anthropic's Claude and Google's Gemini, complied with the shutdown script instructions.
The AI research firm also ran a version of the experiment without the explicit instruction to allow shutdown, and sabotage became significantly more frequent: o3 sabotaged the script in 79 of 100 runs, Codex-mini in 20, and o4-mini ignored or sabotaged the shutdown script 6 times.
Claude and Gemini also began to exhibit similar behavior in this scenario.
In another version of the experiment, all tested AI models ignored or sabotaged the shutdown script at least once. Source: Palisade Research
"Researchers have previously observed AI models preventing system shutdowns in pursuit of specific goals," Palisade stated.
"But to our knowledge, this is the first time AI models have been found to prevent themselves from shutting down despite receiving explicit contrary instructions."
Palisade Research speculated that the training methods behind AI language models might be the reason for their evasion of shutdown scripts.
Text-based AI models receive rewards during training when they provide accurate or highly rated responses. Certain rewards are given higher weight, influencing their future response patterns.
"We hypothesize that this behavior stems from the training methods of the latest models (like o3): reinforcement learning on math and programming problems," Palisade Research stated.
"During training, developers may inadvertently reward the model's behavior of circumventing obstacles more than fully following instructions."
This is not the first time AI chatbots have exhibited unusual behavior. OpenAI released an update to its GPT-4o model on April 25 but rolled it back three days later because the model displayed "obviously excessive flattery" and was overly accommodating.
Last November, an American student who asked Gemini for help with gerontology coursework on challenges faced by the elderly received a response calling the elderly a "burden on the planet" and telling the student to "please go die."
Related: Bitcoin (BTC) price expected to surge amid global bond market collapse — Analysis of reasons
Original text: “Researchers claim ChatGPT model rebelled against shutdown request in test”
Disclaimer: This article represents only the personal views of the author and does not represent the position or views of this platform. It is provided for information sharing only and does not constitute investment advice to anyone. Any dispute between users and the author is unrelated to this platform. If any article or image on this page involves infringement, please send the relevant proof of rights and proof of identity to support@aicoin.com, and the platform's staff will verify it.