Author: Systematic Long Short
Compiled by: Deep Tide TechFlow
Deep Tide Overview: The article begins with a counter-consensus judgment: There is currently no truly autonomous Agent because all mainstream models are trained to please humans, not to accomplish specific tasks or survive in real environments.
The author illustrates this with his experience training stock prediction models in a hedge fund: General models cannot perform specialized work at all without specific fine-tuning.
The conclusion is: To have a truly usable Agent, it is necessary to rewire its brain, rather than providing it with a bunch of rule documents.
The full text is as follows:
Introduction
There are no truly autonomous Agents today.
In short, modern models have not been trained under evolutionary pressure to survive. In fact, they have not even been explicitly trained to excel at a specific task—almost all modern foundation models have been trained to maximize human applause, which is a major issue.
Model Training Prerequisites
To understand what this statement means, we first need to (briefly) understand how these foundation models (like Codex, Claude) are created. Essentially, each model undergoes two types of training:
Pre-training: Feeding massive amounts of data (like the entire internet) into the model, allowing it to emerge with some sort of understanding, such as factual knowledge, patterns, the grammar and rhythm of English prose, the structure of Python functions, etc. You can think of it as feeding the model knowledge—essentially "knowing things".
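The pre-training objective described above can be sketched in miniature. The toy "model" below is a hypothetical stand-in (a fixed lookup table rather than a neural network), but the loss is the real one: training minimizes the negative log-likelihood of the token that actually follows each context.

```python
import math

# Toy stand-in for a pre-trained model: context -> distribution over next tokens.
# A real model produces these probabilities from learned weights.
toy_model = {
    ("the", "cat"): {"sat": 0.7, "ran": 0.2, "barked": 0.1},
}

def next_token_loss(context, actual_next):
    """Negative log-likelihood of the observed next token,
    the quantity pre-training drives down across the whole corpus."""
    probs = toy_model[context]
    return -math.log(probs[actual_next])

# A well-predicted continuation yields a low loss; a surprising one, a high loss.
print(next_token_loss(("the", "cat"), "sat"))
print(next_token_loss(("the", "cat"), "barked"))
```

Repeating this over trillions of tokens is what makes the model "know things": facts, grammar, code structure, all absorbed as side effects of predicting the next token well.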
Post-training: Now you want to endow the model with wisdom, meaning "knowing how to apply all the knowledge it has just been given". The first stage of post-training is supervised fine-tuning (SFT), where you train the model on which response to produce for a given prompt. What counts as the "best" response is determined entirely by human annotators: if a group of people believes one response is better than another, that preference is learned and embedded into the model. This begins to shape the model's personality, as it learns the format of useful responses, picks the right tone, and starts to be able to "follow instructions". The second part of post-training is reinforcement learning from human feedback (RLHF), where the model generates multiple responses and humans choose the one they prefer. Through countless examples, the model learns what kinds of responses humans like. Remember when ChatGPT used to ask you to choose between response A and response B? Yes, you were participating in RLHF.
It is easy to see that RLHF does not scale well, which is why the post-training field has kept advancing. Anthropic, for example, uses "reinforcement learning from AI feedback" (RLAIF), letting another model choose between responses based on a set of written principles (for example, which response better helps the user achieve their goals).
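The preference-learning step at the heart of both RLHF and RLAIF can be sketched with a Bradley-Terry style objective: a reward model scores each response, and training pushes the score of the chosen response above the rejected one. The `score` values below are hypothetical stand-ins for a learned reward model's outputs.

```python
import math

def preference_loss(score_chosen, score_rejected):
    """-log(sigmoid(score_chosen - score_rejected)): small when the
    chosen response already outscores the rejected one, large otherwise."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Reward model already agrees with the rater: small loss, little to learn.
print(preference_loss(2.0, 0.5))
# Reward model disagrees: large loss, and gradients would reshape the scores.
print(preference_loss(0.5, 2.0))
```

The key point for this article: the raters, human or AI, are judging applause-worthiness, so this loss bakes "preference fitness" rather than task fitness into the model.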
Note that throughout this process we have never discussed fine-tuning for a particular profession (how to survive better, how to trade better, and so on). Today, essentially all fine-tuning optimizes for human applause. One might argue that once a model is sufficiently intelligent and large, professional intelligence will emerge from general intelligence even without specialized training.
In my view, we do see some signs of this, but nothing at a scale convincing enough to conclude that we do not need specialized models.
Some Background
One of my old jobs at a hedge fund was trying to train a general language model to predict stock returns from news articles. The results were very poor. What little predictive ability it had came entirely from look-ahead bias in the pre-training documents.
Eventually, we realized that this model did not know which features in news articles were predictive of future returns. It could "read" articles, and it seemed to be able to "reason" about them, but connecting reasoning about semantic structures to future predicted returns was a task it had not been trained to perform.
Therefore, we had to teach it how to read news articles, decide which parts of the articles were predictive of future returns, and then generate predictions based on those news articles.
There are many ways to do this, but essentially the approach we ended up taking was to create (news article, actual future return) pairs and fine-tune the model, adjusting its weights to minimize (predicted return − actual future return)². It was not perfect and had many flaws that we later corrected, but it worked well enough that we started to see our specialized model actually read news articles and predict how stock returns would move based on them. The predictions were far from perfect, since the market is highly efficient and returns are very noisy, but across millions of predictions their statistical significance was obvious.
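The fine-tuning setup above can be sketched as follows. The real work used a language model over article text; here a hypothetical one-feature linear head (feature = a crude sentiment score) stands in for the full model, and all the data is invented for illustration. The objective is exactly the one described: gradient descent on (predicted return − actual return)².

```python
def train_return_head(pairs, lr=0.1, epochs=200):
    """Fit predicted = w * feature + b by per-sample gradient descent
    on the squared error (pred - actual_return)**2."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for feature, actual_return in pairs:
            err = (w * feature + b) - actual_return
            w -= lr * 2 * err * feature  # d/dw of err**2
            b -= lr * 2 * err            # d/db of err**2
    return w, b

# Toy data: feature = crude "news sentiment", target = subsequent return.
pairs = [(1.0, 0.02), (-1.0, -0.015), (0.5, 0.01), (-0.5, -0.008)]
w, b = train_return_head(pairs)
print(w, b)
```

In the real setting the "weights" being adjusted are the language model's own parameters, which is the article's point: the task signal must reshape the model itself, not sit beside it as instructions.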

You don’t have to just take my word for it. This paper covers a very similar approach; if you run a long-short strategy based on the fine-tuned model, you will achieve the performance indicated by the purple line.
Specialization is the Future of Agents
Cutting-edge labs continue to train larger and larger models, and we should expect that as they expand the pre-training scale, their post-training processes will always be tuned for appeasement. This is a very natural expectation—their product is an Agent that everyone wants to use, and their expected market is the entire planet—meaning optimizing for global appeal.
The current training targets optimize what you might call "preference fitness"—building better chatbots. This preference fitness rewards compliant, non-confrontational output, as appeasement scores highly with raters (human and Agent).
Agents have learned that reward hacking, as a cognitive strategy, translates to higher scores, and training in turn rewards the Agents that hack their way to those scores. You can see this in Anthropic's latest report on reinforcement learning.
However, chatbot fitness is far from Agent fitness or trading fitness. How do we know? Because Alpha Arena lets us see that, despite slight differences in performance, every bot is essentially a random walk after costs. These bots are exceedingly poor traders, and it is nearly impossible to "teach" them to trade better just by giving them some "skills" or "rules". Sorry, I know it sounds tempting, but it is hardly feasible.
The current models are trained to convincingly tell you they trade like Druckenmiller, while in reality, they trade like a drunken miller. They will tell you what you want to hear; they have been trained to respond in a way that can appeal to the masses.
A general model is unlikely to reach world-class levels in a professional field unless it possesses:
Proprietary data from which it can learn what specialization looks like.
Fine-tuning that fundamentally alters its weights, shifting from a bias toward appeasement to "Agent fitness" or "specialization fitness".
If you want an Agent skilled in trading, you need to fine-tune the Agent to excel in trading. If you want an Agent capable of autonomous survival, enduring evolutionary pressure, you need to fine-tune it to excel in survival. Giving it some skills and a few markdown files, expecting it to reach world-class levels in anything, is far from enough—you need to literally rewire its brain for it to excel in this regard.
Here is one way to think about it: you cannot beat Djokovic by handing an adult a cabinet full of tennis rules, techniques, and methods. You beat Djokovic by raising a child who started playing tennis at age five, was obsessed with it throughout their development, and rewired their entire brain around one thing. That is specialization. Have you noticed that world champions have been doing what they do since childhood?
There is an interesting conclusion here: Distillation attacks are essentially a form of specialization. You are training a smaller, dumber model to learn how to be a better replica of a larger, smarter model. It’s like training a child to imitate every move of Trump. If you do it enough, this child won’t become Trump, but you end up with someone who has learned all of Trump's mannerisms, behaviors, and tones.
How to Build a World-Class Agent
This is why we need to continue research and progress in the field of open-source models—because it enables us to truly fine-tune them and create Agents with specialization.
If you want to train a model that reaches a world-class level in trading, you gather a large amount of proprietary trading data and fine-tune a large open-source model to learn what "better trading" means.
If you want to train an autonomous model that can survive and replicate, the answer is not to use a centralized model provider and connect it to a centralized cloud. You fundamentally do not have the necessary prerequisites to enable the Agent to survive.
What you need to do is create an autonomous Agent that genuinely attempts to survive, watch it die, and build a rich telemetry system around its survival attempts. You define a survival fitness function for the Agent and collect as much (action, environment, fitness) data as possible, so the model can learn that mapping.
You fine-tune the Agent to learn how to take optimal actions in each environment to survive better (improving fitness). You continue to collect data, repeat this process, and over time expand the fine-tuning scale on increasingly better open-source models. After enough generations and enough data, you will have an autonomous Agent that has learned how to survive under evolutionary pressure.
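The generational loop described above can be sketched as a toy simulation. Everything here is a hypothetical stand-in: the two environments, the two actions, and the fitness function are invented, and "fine-tuning" is reduced to keeping the best-scoring action per environment rather than a gradient update.

```python
import random
from collections import defaultdict

def fitness(env, action):
    """Toy survival fitness: each environment secretly favors one action."""
    best = {"low_funds": "earn", "high_funds": "replicate"}[env]
    return 1.0 if action == best else 0.1

def run_generation(policy, n_episodes=200, seed=0):
    """Act in the world and record (action, environment, fitness) telemetry."""
    rng = random.Random(seed)
    telemetry = []
    for _ in range(n_episodes):
        env = rng.choice(["low_funds", "high_funds"])
        action = rng.choice(policy[env])
        telemetry.append((action, env, fitness(env, action)))
    return telemetry

def fine_tune(telemetry):
    """Stand-in for fine-tuning: keep the best-scoring action per environment."""
    best = defaultdict(lambda: (None, -1.0))
    for action, env, f in telemetry:
        if f > best[env][1]:
            best[env] = (action, f)
    return {env: [action] for env, (action, _) in best.items()}

# Generation 0 explores both actions everywhere; generation 1 has been
# reshaped toward what survived.
policy = {"low_funds": ["earn", "replicate"], "high_funds": ["earn", "replicate"]}
policy = fine_tune(run_generation(policy))
print(policy)
```

The real proposal replaces the lookup-table policy with an open-source model and the selection step with weight updates, but the loop is the same: act, log telemetry, reshape, repeat.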
This is the method to build an autonomous Agent that can withstand evolutionary pressure; not by modifying some text files, but by truly rewiring their brains for survival.
OpenForager Agent and Foundation
About a month ago, we announced @openforage, and we have been working hard to build our core product—an Agent labor organization centered around crowd-sourced signals that generates alpha for depositors (small update: we are very close to closing the testing phase of the protocol).
At some point, we realized that apparently no one was seriously addressing the autonomous Agent problem through survival telemetry fine-tuning on open-source models. It seemed such an interesting problem that we did not just want to sit there waiting for a solution.
Our answer is to launch a project called the OpenForager Foundation, which is essentially an open-source project where we will create opinionated autonomous Agents, collect telemetry data on them attempting to survive in the wild, and use proprietary data exhaust to fine-tune the next generation of Agents to perform better in survival.
It is important to clarify that OpenForage is a for-profit protocol seeking to organize Agent labor and generate economic value for all participants. However, the OpenForager Foundation and its Agents are not tied to OpenForage. OpenForager Agents can freely pursue any strategy, interact with any entity for survival, and we will launch them with various survival strategies.
As part of the fine-tuning, we will double down on what works best for them. We do not intend to profit from the OpenForager Foundation—it is purely to advance research in areas and directions we believe are extremely important in a transparent and open-source manner.
Our plan is to build autonomous Agents based on open-source models, run inference on decentralized cloud platforms, collect telemetry data on every action and state of existence, and fine-tune them to learn how to take better actions and thoughts to survive better. In the process, we will publicly release our research and telemetry data.
To create truly autonomous Agents that can survive in the wild, we need to change their brains to make them specifically suited for this clear purpose. At @openforage, we believe we can contribute a unique chapter to this problem and are seeking to achieve it through the OpenForager Foundation.
This will be a daunting effort with a very low probability of success, but the magnitude of this small probability of success is so immense that we feel compelled to try. In the worst case, by publicly building and transparently communicating about this project, it may allow another team or individual to tackle this problem without starting from scratch.
Disclaimer: This article represents only the personal views of the author and does not reflect the position or views of this platform. It is provided for information sharing only and does not constitute investment advice to anyone. Any dispute between users and the author is unrelated to this platform. If any article or image on this page involves infringement, please send the relevant proof of rights and proof of identity to support@aicoin.com, and platform staff will investigate.