Outside the model belongs to Harness: Why has the main battleground for domestic AI competition changed after Deepseek?

CN
PANews
Follow
2 hours ago

In mid to late May 2026, Deepseek formed a new Harness team internally, focusing on code intelligent agent products, benchmarking against Claude Code under Anthropic. Former Jane Street star quant engineer Cui Tianyi joined the team in March, and senior researcher Chen Deli publicly confirmed and was responsible for recruiting. Deepseek's job description clearly states a formula: “Model + Harness = Agent.” As the capabilities of fundamental large models gradually level out, the era of merely competing on parameters is coming to an end. Deepseek personally set up a toolchain team, marking the main battleground of domestic AI competition shifting from “building large models” to “creating toolchains and implementing office solutions.”

Why is Deepseek personally getting involved in Harness?

For a long time, developers' expectations of Deepseek remained focused on open-sourcing more powerful foundational models. However, strong coding capabilities do not equate to developers using them as productivity tools. What truly changes the way we work is not the code responses in a chat box, but the engineering intelligent agents capable of entering terminals, understanding projects, reading and writing files, executing commands, and fixing errors. Before the official move, the developer community had already created various open-source terminal Agents based on Deepseek's model. Deepseek is now forming the Harness team to master the interface design rights and training data closed loop, incorporating the community's path into the official core product.

To understand this strategic intent, one must first clarify what Harness actually is. For readers without a technical background, the term “Harness” may seem unfamiliar. In Deepseek's formula, the model is responsible for reasoning, while Harness takes care of everything else. Originally in the engineering field, Harness meant “harness” or “safety belt”; when extended to the AI field, it refers to the “runtime infrastructure” of an Agent.

To understand this more simply, we can compare the large model to a high-IQ worker's “brain” and “intelligence,” while Harness serves as the "job description, KPI assessment standards, office safety barriers, and toolbox" for this worker. It is not a “scaffolding” assembled before operation, nor is it a “framework” providing building blocks; rather, it is a continuously operating system. It orchestrates execution loops, distributes tool calls, manages context, performs safety checks, and is responsible for error recovery and state persistence. The large model itself is stateless and lacks environmental interaction capabilities; it can only accept text inputs and output text. Harness compensates for these shortcomings, allowing the model to truly interact with the external world and perform specific tasks.

Why must foundational model companies personally master this runtime? The core issue is that the Agent product is not only an outlet for model capabilities but also a training ground for those capabilities. Deepseek's JD emphasizes “achieving the co-evolution of the model and Harness.” In real complex tasks, the model will encounter various failures due to environmental constraints and tool return anomalies. Harness records these failure trajectories, providing feedback to model training and creating a flywheel effect. If the community were to build it, model vendors would lose the most crucial application layer data feedback, reducing them to mere providers of computing power and weights.

From an engineering perspective, optimizing Harness can more decisively determine the success or failure of the Agent than merely optimizing Prompts. According to technical experts, in Agent execution, tool outputs account for 67.6% of what the Agent actually sees in context, while system prompts only account for 3.4%. This means that most of the model's “view” is occupied by the results of tool calls. If Harness mishandles the format of tool outputs or fails to effectively compress redundant information, the model may fall into “context decay,” leading to a sharp decline in subsequent reasoning quality.

An even more critical issue is the composite error problem. An Agent process consisting of 10 steps, each with a reliability of 99%, has an end-to-end success rate of about 90%; when task complexity increases to 50 steps, the success rate plummets to 60%. In real codebase maintenance or enterprise office automation scenarios, continuous operations with dozens of steps are the norm. At this point, no matter how strong the model's reasoning ability, it cannot compensate for the probabilistic cumulative losses. Only through error handling and recovery mechanisms in Harness can retries or path corrections be made when steps fail. This is precisely the engineering value of Harness and the reason Deepseek must personally get involved.

Tencent as the connector, Alibaba as the frontend penetrator: Differentiated paths of major companies' toolchains

Deepseek's shift is not an isolated case. According to industry media reports, enhancing Agent capabilities has become an important development direction for domestic foundational large models in 2026. Basic models are gradually becoming “utilities,” with the competitive battleground shifting to the application layer. Other major domestic companies are also seeking differentiated positioning through toolchains, but the paths vary, reflecting each company's ecological endowments and target users' differences.

Tencent launched a new corporate Agent initiative in June 2026 with WorkBuddy Enterprise Edition. Its core positioning is as an all-scenario workplace intelligent desktop platform, primarily focusing on transitioning from individual efficiency to organizational collaboration. WorkBuddy Enterprise Edition supports multiple Agents running in parallel and connecting to business system Connectors, attempting to seize a unified AI office entry point. Tencent's positioning logic relies on its massive WeChat Work and Tencent Cloud ecosystem. For large enterprises, the pain points of AI office do not lie in the extreme experience of single-point tools, but in whether it can integrate isolated internal office systems. By acting as a connector, Tencent enables Agents to directly schedule corporate data and processes, emphasizing organizational-level collaboration and complex task delivery. The advantage of this path lies in the high barrier; once integrated into core business processes, the cost of replacement is huge. The challenge is a need for strong enterprise service capabilities and customized support.

Alibaba, on the other hand, took a different approach by lowering the automation barrier on the Web side. Alibaba open-sourced the PageAgent framework, a pure frontend browser-based GUI Agent. This framework requires no backend deployment and can integrate AI operator capabilities into websites with a single line of code. Alibaba's positioning logic is about empowering web developers, allowing any webpage to instantly transform into an AI-native application. Given that many traditional enterprise systems cannot provide API interfaces, achieving automation through frontend DOM operations is a pragmatic approach to a downward attack. The advantages of this path are its lightweight, easy integration, and ability to quickly cover a vast number of long-tail websites; however, frequent changes in frontend DOM structures may also pose stability challenges, placing higher demands on Harness's error recovery capabilities.

In comparison, companies are no longer simply competing on model scores but are building toolchains based on their ecological endowments. Tencent acts as a connector, Alibaba as a frontend penetrator, and Deepseek approaches from the most essential code engineering scenario for developers. This differentiation indicates that the domestic AI industry has recognized that there is no perfect universal Agent, but rather vertical solutions refined through robust Harness engineering in specific scenarios. For enterprise procurement, choosing a particular toolchain essentially means selecting a specific automation path: whether to deeply bind office ecosystems, flexibly embed into existing web systems, or empower developers' engineering workflows.

Viktor's $20 million ARR proof: Enterprises are willing to pay for autonomous execution

The maturation of toolchains is changing the paradigm of AI's participation in the office domain. The logic of the native Copilot is “drafting and waiting for human completion,” where AI generates a piece of text or code, but the final step still requires human intervention for modification and execution. In this model, AI serves merely as an efficiency tool and cannot truly replace labor. Corporate employees need to constantly monitor AI outputs for verification and implementation, which actually increases cognitive burden.

Clear signals of paradigm shifts have already emerged in overseas markets. As an overseas trend reference, Polish AI office automation company Viktor positions itself as an AI employee within Slack and has achieved an annualized revenue (ARR) of $20 million without a sales team, serving 30,000 enterprises, and secured $75 million in Series A financing in May 2026. Viktor’s model represents the ultimate form of a new AI employee: possessing cloud-based computers that can work continuously for long periods, firmly grasping vast contexts, and delivering results directly.

Viktor is positioned as a Tier 3 AI Coworker, meaning it handles not just simple Q&A, but complex tasks such as marketing audits, advertisement management, and lead research that require multiple steps and long-duration operations. The enterprise side exhibits a strong willingness to pay for such AI that does not require human final confirmation and can work continuously for long periods. The explosion of this commercial data proves that the value anchor of office automation has shifted from “assisted generation” to “autonomous execution.”

Domestic vendors laying out Harness and Agent toolchains aim to seize this trend. When Harness can provide sufficient safety barriers, state persistence, and error recovery capabilities, AI can transition from being a “trainee” that requires constant human monitoring to an “outsourcer” that can independently deliver work results. Enterprises' focus in procurement will shift from the size of model parameters to whether the Agent can operate stably for 8 hours without crashing, and whether it can automatically handle API rate limits and webpage structure changes. For developers, this means that the focus on building AI applications will shift from “how to write a good Prompt” to “how to design a robust runtime environment.”

Token explosion and the engineering barrier of “thick framework”

After shifting to toolchain competition, the challenges faced by enterprise procurement and developers in practical implementation have not diminished; rather, they have become more focused on engineering aspects.

First and foremost is the Token explosion problem. Long-running Agents in the “thinking, acting, feedback” loop can easily experience rapid context inflation due to redundant tool outputs. The developer community widely discusses this challenge, believing that it not only drives up inference costs but also causes the model’s attention to be dispersed, resulting in a sharp increase in task failure rates. For instance, when executing a webpage data scraping task, if Harness indiscriminately inserts the entire HTML source of a webpage into context, the model can quickly get lost in the redundant information and forget the original task objective. Hence, the context compression and memory management capability of Harness becomes a core consideration for enterprise procurement. An excellent Harness must know which historical information can be discarded and which tool return results need summarization, testing profound engineering architecture capabilities rather than the intelligence of the model itself.

This also prompted developers to be wary of “shelling” thin frameworks. If the Harness offered by large model vendors is merely a simple API encapsulation, providing only basic conversation windows and tool call interfaces, it will lack practical debugging value. The vulnerability in production environments demands that Harness possesses “thick framework” characteristics such as sandbox isolation, fine-grained permission control, and breakpoint resumption. Only a runtime with deep engineering barriers can truly address enterprise-level application stability needs. For example, in a code execution scenario, Harness must provide a secure sandbox environment to prevent malicious code generated by the model from damaging the host system; in long-running tasks, it must support breakpoint resumption to avoid having to restart the entire task due to network fluctuations.

Additionally, geopolitical factors have left a significant market vacuum for domestic Harness solutions. Top overseas engineering intelligent products like Claude Code implement access restrictions for mainland China and Chinese-funded enterprises. Domestic developers, unable to directly use these top tools, can only seek domestic alternatives. Deepseek's establishment of the Harness team is not only a response to technological trends but also a response to this huge replacement demand.

For enterprise procurement and developers, understanding the value of Harness means no longer being deceived by flashy dialogue demonstrations when selecting AI products, but instead questioning what its error recovery mechanism is, what its context management strategy is, and whether it can truly integrate into existing workflows. During the toolchain competition phase, enterprises should prioritize evaluating vendors' engineering delivery capabilities and ecological compatibility rather than simply comparing model scores; developers should focus on the openness of the Harness framework and the completeness of the debugging toolchain, choosing platforms that can provide deep controllable runtimes.

免责声明:本文章仅代表作者个人观点,不代表本平台的立场和观点。本文章仅供信息分享,不构成对任何人的任何投资建议。用户与作者之间的任何争议,与本平台无关。如网页中刊载的文章或图片涉及侵权,请提供相关的权利证明和身份证明发送邮件到support@aicoin.com,本平台相关工作人员将会进行核查。

Share To
APP

X

Telegram

Facebook

Reddit

CopyLink