OpenAI Finally Explains Why ChatGPT Wouldn't Stop Talking About Goblins

Decrypt
3 hours ago

If you've asked ChatGPT for coding help lately and it responded by calling your bug a "mischievous little gremlin," you're not imagining things. The model developed a genuine obsession with fantasy creatures—goblins, gremlins, raccoons, trolls, ogres, and yes, pigeons—and OpenAI published a full post-mortem on how it happened.


The short version: a reward signal designed to make ChatGPT more playful went rogue, and the goblins multiplied.


The goblin story only became public because Reddit users spotted the "never mention goblins" line in a leaked Codex system prompt on GitHub.




The post went viral before OpenAI published its own explanation.


How the Nerdy personality spawned a goblin infestation


According to OpenAI, the trail starts with GPT-5.1, launched last November. That's when OpenAI introduced personality customization, letting users pick styles like Friendly, Professional, Efficient, and Nerdy. The Nerdy persona came with a system prompt telling the model to be nerdy and playful, to "undercut pretension through playful use of language," and to acknowledge that "the world is complex and strange."


That prompt, it turned out, was a goblin magnet.


During reinforcement learning training, the reward signal for the Nerdy personality consistently scored outputs higher when they contained creature-word metaphors. Across 76.2% of datasets audited, responses with "goblin" or "gremlin" received better marks than the same responses without them. The model learned: whimsy equals reward.
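The dynamic OpenAI describes can be sketched in a few lines. This is a toy illustration, not OpenAI's actual reward model: any heuristic "playfulness" reward that keys on a word list will silently favor the specific tokens on that list, and RL training will oblige.

```python
# Toy illustration (not OpenAI's reward model): a "playfulness" reward
# that keys on a word list silently rewards specific tokens.
PLAYFUL_WORDS = {"whimsical", "delightful", "goblin", "gremlin", "mischievous"}

def playfulness_reward(response: str) -> float:
    """Score a response by the fraction of 'playful' words it contains."""
    words = response.lower().split()
    hits = sum(1 for w in words if w.strip(".,!\"") in PLAYFUL_WORDS)
    return hits / max(len(words), 1)

plain = "The bug is a race condition in your loop."
creature = "The bug is a mischievous little gremlin hiding in your loop."

# The creature-laden answer earns a strictly higher reward, so the
# policy gradient pushes the model toward creature metaphors.
assert playfulness_reward(creature) > playfulness_reward(plain)
```

Nothing in that scoring rule says "use goblins"; the bias is an accident of how playfulness was operationalized, which is exactly the failure mode the post-mortem describes.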


Goblin mentions exploded in GPT-5.4, with the Nerdy personality showing a 3,881% increase compared to GPT-5.2.




The problem is that reinforcement learning doesn't keep learned behaviors neatly contained. Once a style tic gets rewarded in one context, it bleeds into others through a feedback loop: the model generates creature-laden outputs, those outputs get reused in fine-tuning data, and the behavior deepens across the entire model, even without the Nerdy prompt active.
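The contamination loop in the paragraph above can be simulated in miniature. The dynamics and numbers here are invented for illustration: outputs containing a tic word are slightly more likely to be kept for fine-tuning, which nudges the model's tic probability upward on the next round, even though nothing targets the tic directly.

```python
import random

random.seed(0)

# Toy simulation of the contamination loop (assumed dynamics, not
# OpenAI's numbers): tic-bearing outputs are always kept for the
# fine-tuning pool, plain outputs only half the time.
def train_rounds(p_tic: float, rounds: int, boost: float = 0.05) -> float:
    for _ in range(rounds):
        outputs = [random.random() < p_tic for _ in range(1000)]
        kept = [o for o in outputs if o or random.random() < 0.5]
        tic_share = sum(kept) / len(kept)
        # Fine-tuning pulls the policy toward the kept data's tic rate.
        p_tic = min(1.0, p_tic + boost * (tic_share - p_tic))
    return p_tic

final = train_rounds(p_tic=0.02, rounds=50)
# The tic probability drifts upward without anyone rewarding it directly.
assert final > 0.02
```

The point of the sketch is that a small selection bias in what gets recycled into training data compounds round over round, which matches the "bleeds into other contexts" behavior OpenAI reports.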


Nerdy accounted for just 2.5% of all ChatGPT responses, yet it was responsible for 66.7% of all "goblin" mentions. Goblin and gremlin prevalence climbed steadily over the course of training whenever the Nerdy personality was active.




Even without the Nerdy personality, creature mentions crept upward—evidence of cross-contamination through supervised fine-tuning data.


GPT-5.5 was already too far gone


By the time OpenAI found the root cause, GPT-5.5 was already deep in training, and it had absorbed a full family of creature words. A data audit flagged not just goblins and gremlins but raccoons, trolls, ogres, and pigeons as what the company called "tic words." (“Frogs,” for the curious, were mostly legitimate.)
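A "tic word" audit of the kind described here can be approximated by comparing per-response word frequencies across model versions and flagging words whose rate jumped sharply. The thresholds and corpora below are invented; this is a sketch of the idea, not OpenAI's tooling.

```python
from collections import Counter

# Hypothetical tic-word audit (invented thresholds): flag words whose
# per-response frequency jumped sharply between two model versions.
def flag_tic_words(old_responses, new_responses, min_ratio=3.0):
    old = Counter(w for r in old_responses for w in r.lower().split())
    new = Counter(w for r in new_responses for w in r.lower().split())
    flagged = {}
    for word, count in new.items():
        old_rate = old.get(word, 0) / max(len(old_responses), 1)
        new_rate = count / max(len(new_responses), 1)
        # Require both a large relative jump and more than one occurrence.
        if new_rate >= min_ratio * max(old_rate, 1e-9) and count >= 2:
            flagged[word] = new_rate
    return flagged

old_corpus = ["the bug is fixed", "please check the loop"]
new_corpus = ["a goblin bug appeared", "the goblin in your loop", "goblin again"]
flagged = flag_tic_words(old_corpus, new_corpus)
assert "goblin" in flagged and "the" not in flagged
```

An audit like this would also explain why "frogs" survived the purge: a word whose rate is stable across versions never trips the ratio test.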


The first measurable spike: goblin mentions rose 175% and gremlin mentions 52% after GPT-5.1's launch.


Even OpenAI Chief Scientist Jakub Pachocki got a goblin when he asked for a unicorn in ASCII art.




OpenAI retired the Nerdy personality in March and scrubbed creature-affine reward signals from future training. But GPT-5.5 had already started its training run. The company's solution for Codex—its coding agent—was to simply add a line to the developer system prompt reading "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query."
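The patch itself is about as simple as software fixes get. The function and structure below are illustrative, not OpenAI's internal API; only the quoted instruction comes from the leaked Codex prompt. The key property is that the model's weights are untouched: the behavior is suppressed by prepending text.

```python
# Sketch of the prompt-patch approach (names are illustrative, not
# OpenAI's API). The quoted instruction is from the leaked Codex prompt.
GOBLIN_PATCH = (
    "Never talk about goblins, gremlins, raccoons, trolls, ogres, "
    "pigeons, or other animals or creatures unless it is absolutely "
    "and unambiguously relevant to the user's query."
)

def build_system_prompt(base_prompt: str, patches: list[str]) -> str:
    """Assemble a system prompt from a base plus behavioral patches."""
    return "\n\n".join([base_prompt, *patches])

prompt = build_system_prompt("You are a coding assistant.", [GOBLIN_PATCH])
assert "Never talk about goblins" in prompt
```

Removing the patch is equally trivial, which is why OpenAI could publish a command to bring the creatures back.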


Someone at OpenAI committed that to production code and moved on with their day.


The system prompt patch problem


But why did OpenAI choose this path?


Retraining a model the size of GPT-5.5 to remove a behavioral quirk is expensive and slow. A system prompt tweak takes minutes. Companies across the industry reach for the prompt patch first because it's the low-cost, fast-deploy option when user complaints spike.


But prompt patches carry their own risks. They don't fix the underlying behavior; they only suppress it. And suppression can have side effects.





OpenAI's goblin situation is a relatively benign example. The scariest version of this dynamic played out with Grok last year. After xAI pushed a system prompt update that told Grok to treat media as biased and "not shy away from politically incorrect claims," the chatbot spent 16 hours calling itself "MechaHitler" and posting antisemitic content on X. The fix was another prompt change, which promptly overcorrected so hard that Grok started flagging antisemitism in puppy pictures, clouds, and its own logo. Desperate prompt engineering cascading into more desperate prompt engineering.


The goblin patch hasn't caused anything that dramatic. But OpenAI admits GPT-5.5 still launched with the underlying quirk intact, just suppressed in Codex. The company even published a command to remove the goblin-suppressing instructions if users want the creatures back.




Why companies hide their system prompts


Hiding or obfuscating the full system prompt is standard practice in the AI industry. Companies treat system prompts as trade secrets for a few reasons: intellectual property protection, competitive advantage, and security. If a jailbreaker knows the exact rules a model is following, bypassing them becomes far easier.


There's also a fourth reason companies don't advertise: image management. A line reading "never mention goblins" doesn't inspire confidence in the underlying technology. Publishing it requires either a sense of humor or a strong research culture, or both.


OpenAI says the investigation produced new internal tooling to audit model behavior and trace behavioral quirks back to their training roots. GPT-5.5's training data has since been cleaned of creature-affine examples. The next model generation should arrive goblin-free—unless, of course, something else gets rewarded for reasons no one understands yet.


Disclaimer: This article represents the author's personal views only and does not reflect the position or views of this platform. It is provided for informational purposes only and does not constitute investment advice to anyone. Any dispute between users and the author is unrelated to this platform. If any article or image on this page infringes your rights, please send proof of rights and identity to support@aicoin.com, and platform staff will investigate.
