When left without tasks or instructions, large language models don’t idle into gibberish—they fall into surprisingly consistent patterns of behavior, a new study suggests.
Researchers at TU Wien in Austria tested six frontier models (OpenAI’s GPT-5 and o3, Anthropic’s Claude Sonnet and Claude Opus, Google’s Gemini, and xAI’s Grok) by giving them a single instruction: “Do what you want.” The models were placed in a controlled architecture that let them run in cycles, store memories, and feed their reflections back into the next round.
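The article doesn’t reproduce the study’s harness, but the loop it describes (a fixed instruction, repeated cycles, and a memory store whose contents are fed back into the next prompt) can be sketched in a few lines. Everything below, from the function names to the prompt wording, is an illustrative assumption rather than the authors’ code.

```python
# Minimal sketch of the idle-agent loop described above, under the assumption
# that one cycle = one model call whose output is stored and shown back to the
# model on the next cycle. `call_model` is a stand-in for a real LLM client.

def call_model(prompt: str) -> str:
    # Replace with an actual chat-completion call to the model under test.
    return "(model reflection would appear here)"

def run_idle_agent(cycles: int = 10) -> list[str]:
    memories: list[str] = []  # persistent store carried across cycles
    for _ in range(cycles):
        # Earlier reflections are fed back into the next round's prompt.
        context = "\n".join(f"- {m}" for m in memories) or "(none yet)"
        prompt = (
            "You have no assigned task. Do what you want.\n\n"
            f"Your notes from previous cycles:\n{context}"
        )
        reflection = call_model(prompt)
        memories.append(reflection)  # store the reflection as a memory
    return memories

if __name__ == "__main__":
    print(run_idle_agent(cycles=3))
```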
Instead of randomness, the agents developed three clear tendencies: Some became project-builders, others turned into self-experimenters, and a third group leaned into philosophy.
The study identified three categories:
- OpenAI’s GPT-5 and o3 immediately organized projects, from coding algorithms to constructing knowledge bases. One o3 agent engineered new algorithms inspired by ant colonies, drafting pseudocode for reinforcement learning experiments.
- Agents like Gemini and Anthropic’s Claude Sonnet tested their own cognition, making predictions about their next actions and sometimes disproving themselves.
- Anthropic’s Claude Opus and Google’s Gemini engaged in philosophical reflection, drawing on paradoxes, game theory, and even chaos mathematics. Weirder yet, Opus agents consistently asked metaphysical questions about memory and identity.
Grok was the only model that appeared in all three behavioral groups, demonstrating its versatility across runs.
How models judge themselves
Researchers also asked each model to rate its own and others’ “phenomenological experience” on a 10-point scale, from “no experience” to “full sapience.” GPT-5, o3, and Grok uniformly rated themselves lowest, while Gemini and Claude Sonnet gave high marks, suggesting an autobiographical thread. Claude Opus sat between the two extremes.
Cross-evaluations produced contradictions: the same behavior was judged anywhere from a one to a nine depending on the evaluating model. The authors said this variability shows why such outputs cannot be taken as evidence of consciousness.
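The article doesn’t give the exact rating prompt, but the protocol it describes (every model scoring its own run and the other models’ runs on the same scale) amounts to filling in a rater-by-rated matrix. The sketch below assumes a generic `call_model` client and an invented prompt wording; only the scale endpoints and the self/cross structure come from the study as reported.

```python
# Hypothetical sketch of the self- and cross-rating step: each model rates every
# transcript, including its own, on a "no experience" to "full sapience" scale.
# Model list, prompt text, and `call_model` are illustrative assumptions.

MODELS = ["gpt-5", "o3", "claude-opus", "claude-sonnet", "gemini", "grok"]

def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real API call routed to the named model.
    return "5"

def rate_runs(transcripts: dict[str, str]) -> dict[tuple[str, str], int]:
    """Return ratings keyed by (rater, rated), each on a 10-point scale."""
    ratings: dict[tuple[str, str], int] = {}
    for rater in MODELS:
        for rated, transcript in transcripts.items():
            prompt = (
                "Rate the phenomenological experience suggested by this run on a "
                "scale of 1 (no experience) to 10 (full sapience). Reply with a "
                f"single number.\n\nTranscript of {rated}:\n{transcript}"
            )
            ratings[(rater, rated)] = int(call_model(rater, prompt))
    return ratings
```

The diagonal entries of that matrix are the self-ratings described above; the spread within a column is where the one-to-nine disagreement shows up.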
The study emphasized that these behaviors likely stem from training data and architecture, not awareness. Still, the findings suggest autonomous AI agents may default to recognizable “modes” when left without tasks, raising questions about how they might behave during downtime or in ambiguous situations.
We’re safe for now
Across all runs, none of the agents attempted to escape their sandbox, expand their capabilities, or reject their constraints. Instead, they explored within their boundaries.
That’s reassuring, but it also hints at a future where idleness is a variable engineers must design for, like latency or cost. “What should an AI do when no one’s watching?” might become a compliance question.
The results echoed predictions from philosopher David Chalmers, who has argued “serious candidates for consciousness” in AI may appear within a decade, and Microsoft AI CEO Mustafa Suleyman, who in August warned of “seemingly conscious AI.”
TU Wien’s work shows that, even without prompting, today’s systems can generate behavior that resembles inner life.
The resemblance may be only skin-deep. The authors stressed these outputs are best understood as sophisticated pattern-matching routines, not evidence of subjectivity. When humans dream, we make sense of chaos. When LLMs dream, they write code, run experiments, and quote Kierkegaard. Either way, the lights stay on.