Author: Liu Jun
In 2026, a consensus is taking shape in the AI industry: model capability is no longer the bottleneck. The gap lies outside the models, in the encoding of domain knowledge, in the interfaces between agents and the real world, and in the maturity of the toolchain. The open source community is filling this gap faster than anyone expected. OpenClaw gained 60,000 stars on GitHub within 72 hours and surpassed 350,000 stars three months later. The Skill ecosystem of Claude Code grew from 50 Skills to more than 334 in just six months. Hermes Agent goes further and lets agents build reusable skills autonomously. Data from Vela Partners shows that over the past 90 days, personal AI assistants and Agentic Skill plugins gained a combined 244,000 stars. Skills are in the middle of an explosion.
Perseus Yang's work sits at the core of this explosion. He has a background in mathematics and computer science from Cornell, is a member of the Forbes Business Council, and is an honoree of the THINC Fellowship. Over the past few years he has participated in and maintained more than a dozen AI-related open source projects on GitHub, spanning agent skill expansion, device-level mobile control, AI engine optimization toolchains, GEO data analysis agents, content automation workflows, and payment protocol infrastructure. What sets him apart is the combination of a solid engineering background and strong product intuition. He doesn't just write code; he defines what a tool should look like based on user needs, then builds it end to end and drives its adoption.
Below are several core judgments he has formed in this process.
First Judgment: The Skill System is the Most Underestimated Infrastructure of the AI Agent Era
After Anthropic released Agent Skills as an open standard at the end of 2025, OpenAI's Codex CLI adopted the same SKILL.md format. OpenClaw's ClawHub registry has already accumulated over 13,000 community-contributed Skills, and the Claude Code ecosystem is rapidly catching up. The significance of Skills goes far beyond "adding plugins to agents." A Skill is essentially a way for non-coders to participate in AI programming. An operator can write a SKILL.md in natural language, and the agent learns a new workflow. This marks a paradigm shift: the true power of AI depends not on the number of model parameters but on the domain knowledge injected into the models, and Skills extend the ability to inject that knowledge from engineers to everyone.
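To make the format concrete, here is a minimal sketch of what a non-engineering skill file might look like and how a tool could read its header. The layout assumed here is a front-matter block with `name` and `description` fields, followed by natural-language instructions. The skill content and the `parse_skill` helper are hypothetical, written only for illustration.

```python
# Hypothetical SKILL.md for a non-engineering workflow. The front-matter
# fields (name, description) are assumed; the body is plain natural language.
SKILL_MD = """\
---
name: weekly-competitor-digest
description: Summarize competitor announcements into a one-page brief.
---
# Weekly competitor digest

1. Collect announcements published in the last seven days.
2. Group them by competitor and by product area.
3. Write a one-page brief with sources listed at the end.
"""


def parse_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a skill file into (metadata, instructions).

    Handles only flat `key: value` front matter, which is enough for
    this sketch; a real loader would use a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with a '---' front-matter line")
    try:
        end = lines.index("---", 1)  # closing delimiter of the front matter
    except ValueError:
        raise ValueError("front matter is not closed with '---'") from None
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed front-matter line: {line!r}")
        meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


meta, body = parse_skill(SKILL_MD)
print(meta["name"])
```

Everything below the front matter is ordinary prose, which is why someone who has never written code can author it.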
However, Perseus observed a problem. The vast majority of Skills are concentrated in engineering domains: code review, front-end design, DevOps, testing. Expertise in non-engineering domains has hardly been encoded as Skills in any systematic way. The Skill ecosystem is therefore still far from the coverage one would expect of it.
This observation drove him to undertake a series of open source projects focused on the go-to-market (GTM) toolchain. The most representative of these is GTM Engineer Skills, a set of Claude Code and Codex skills covering the complete workflow of AI engine discoverability, which has accumulated over 600 stars on GitHub. It encodes tasks that traditionally required collaboration among SEO experts, content strategists, and front-end developers into an automated process that a single person can run: website AI discoverability audits, content structure optimization, keyword research, and machine-interpretable layers for data visualizations. The auditor does not stop at suggestions. After detecting the front-end framework, it automatically generates code fixes that can be submitted directly as a pull request. In the same direction, he also built a complementary GEO analysis tool that queries ChatGPT, Claude, Gemini, and Perplexity simultaneously to analyze brand mention rates, sentiment, market share, and competitive positioning, and outputs interactive HTML reports and structured data.
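The brand mention rate that such an analysis reports can be illustrated with a small sketch. The source does not describe how the tool computes it, so everything here is an assumption: the answers are hard-coded stand-ins (no engine is actually queried), the brand names are invented, and a mention is counted by a case-insensitive whole-word match.

```python
import re


def mention_rates(
    responses: dict[str, list[str]], brands: list[str]
) -> dict[str, dict[str, float]]:
    """For each engine, return the share of answers that mention each brand."""
    rates: dict[str, dict[str, float]] = {}
    for engine, answers in responses.items():
        rates[engine] = {}
        for brand in brands:
            # Whole-word, case-insensitive match on the brand name.
            pattern = re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)
            hits = sum(1 for answer in answers if pattern.search(answer))
            rates[engine][brand] = hits / len(answers) if answers else 0.0
    return rates


# Hard-coded sample answers; a real run would collect these from each engine.
sample = {
    "engine_a": ["Acme and Globex are popular.", "Try Globex.", "No clear leader."],
    "engine_b": ["Acme is a common choice.", "acme works well."],
}
rates = mention_rates(sample, ["Acme", "Globex"])
print(rates["engine_b"]["Acme"])
```

Sentiment and competitive positioning would need more than string matching, which is why this sketch covers only the mention rate.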
The results in practice illustrate the product value of this toolset. Companies such as Articuler AI and Axis Robotics completed the entire process, from research to Resource Center setup, in a matter of hours using GTM Engineer Skills; the same work typically takes dozens of hours of cross-team collaboration when done the traditional way. This efficiency gain comes not from model capability but from Perseus's deep understanding of GTM workflows and the way he broke them down as a product: he split the vague requirement of "enhancing AI discoverability" into standardized phases that agents can execute one after another, each with a clear input, output, and quality check. The toolchain is currently used by around a dozen start-ups and several Fortune 500 companies. The open source tools serve as the entry point and the commercial products provide scale, with both sharing the same technical core.
The project is valuable in itself, but Perseus believes the proposition it verifies matters more: the Skill system reaches far beyond the engineering domain. Product strategy, go-to-market, commercial analysis: any expertise that can be described in a structured way can be encoded as an agent capability.
Second Judgment: The Operating Boundaries of AI Agents Should Not Stop at Browsers and APIs
The discussion about agents in 2026 is dominated by browser agents and API integrations. LangGraph, CrewAI, and Google ADK form a flourishing multi-agent orchestration ecosystem. However, Perseus noticed a structural blind spot: most digital activity worldwide happens in native mobile applications for social media, payments, gaming, and communication, and these applications do not have public APIs or browser equivalents. Existing frameworks are unable to operate WeChat, Douyin, WhatsApp, or Alipay. Mobile devices are the dominant computing interface globally, yet the infrastructure for native mobile agents is nearly nonexistent.
The question Perseus asks is: why is everyone teaching AI to operate browsers, yet no one is seriously teaching it to operate mobile phones? Browser agents have flourished largely because the web is inherently automation-friendly, with the DOM, APIs, and mature tools like Playwright. Mobile is a completely different world. Native applications are black boxes that lack structured interface descriptions, so operations can only be completed by simulating human taps and swipes. The hard part is not getting an LLM to decide whether to press a button. It is building the infrastructure of the entire execution layer from scratch: device connection management, screen state parsing, mutual exclusion between multiple agents, and security boundaries for sensitive operations.
This judgment led to OpenPocket, an open source framework that lets LLM-driven agents autonomously operate Android devices through ADB. It currently has around ten contributors and over 500 commits. What users actually do with it is telling: managing social media accounts automatically, responding to messages in instant messaging apps, handling payments and bills on mobile phones, and even playing mobile games. In a typical scenario, the user tells the agent in natural language, "Open Slack at 8 AM every morning to complete check-in," and the agent runs this task persistently in an isolated session, turning a repetitive manual operation into background automation.
In this project, Perseus made several product and architectural choices that he considers crucial. First, the agent can automatically create new Skills during operation. When it encounters an unfamiliar operational process, it can save the learned steps as a reusable SKILL.md and invoke it directly next time. The agent is therefore not a tool with fixed capabilities but a system that grows stronger with use. Second, all sensitive operations must go through human approval, rather than letting the agent judge what is safe. In his view, the most dangerous aspect of autonomous agents is not that they make mistakes, but that they make them with full confidence that they are right. Third, each agent is completely isolated, bound to its own device, configuration, and session state, so multiple agents can run simultaneously without interference. Finally, if only TypeScript engineers can extend the capabilities of an agent, the ecosystem will never grow large; this is why OpenPocket, like Claude Code, uses SKILL.md as the standard format for capability expansion.
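The second of these choices, routing sensitive operations through a human instead of the agent's own judgment, can be sketched as follows. This is an illustration under assumptions, not OpenPocket's actual code: the action kinds, the `SENSITIVE_ACTIONS` set, and the `approve` callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action kinds that must never run without a human decision.
SENSITIVE_ACTIONS = {"payment", "send_message", "delete_data"}


@dataclass(frozen=True)
class Action:
    kind: str         # e.g. "tap", "swipe", "payment"
    description: str  # human-readable summary shown to the approver


def execute_with_approval(
    action: Action,
    run: Callable[[Action], str],
    approve: Callable[[Action], bool],
) -> str:
    """Run `action`, asking a human first when its kind is sensitive.

    The agent never decides whether a sensitive action is safe: that
    decision comes only from the `approve` callback.
    """
    if action.kind in SENSITIVE_ACTIONS and not approve(action):
        return f"blocked: {action.description}"
    return run(action)


def fake_run(action: Action) -> str:
    # Stand-in for the real device call.
    return f"done: {action.description}"


# A tap runs directly; a payment is blocked because the approver says no.
print(execute_with_approval(Action("tap", "open app"), fake_run, lambda a: False))
print(execute_with_approval(Action("payment", "pay bill"), fake_run, lambda a: False))
```

The important property is that the list of sensitive kinds lives outside the model: a confident but wrong agent cannot talk its way past the gate.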
The system supports more than 29 LLM configurations. Agent phones are completely isolated from users' personal phones, and all data stays local. In 2026, with OWASP listing "tool misuse" among the top ten risks of Agentic AI and the EU AI Act's high-risk obligations about to take effect, this local-first, human-in-the-loop design is not conservative. It is a prerequisite for agents entering real-world scenarios.
Third Judgment: The Value of Open Source Lies Not in the Code Itself, but in the Standard Definitions at the Infrastructure Level
Perseus's understanding of open source goes beyond "putting code on GitHub." He repeatedly returns to one point: the AI open source ecosystem in 2026 is in a window where standards have yet to solidify, and the architectural patterns and interface specifications the community adopts now will become the default infrastructure for the entire industry in the coming years. During this window, defining a new niche in the ecosystem matters much more than optimizing an existing solution.
Specifically, his Skill project advances a technically meaningful goal: proving that the SKILL.md format is not just a container for engineering tools but a standard general enough to encode domain knowledge. When the same SKILL.md can be executed by Claude Code, OpenAI Codex CLI, and OpenClaw alike, it effectively becomes a "portable capability unit" for the AI agent ecosystem. Perseus packed the complete go-to-market workflow, a non-engineering domain, into this format and automated the entire process from audit to code fixes, which provides significant validation of the Skill standard's generality.
His mobile agent project addresses an architectural gap in the agent execution layer. Existing agent frameworks rely on structured interfaces at the tool invocation level, either an API or the DOM. OpenPocket has to operate in an environment without any structured interface, depending solely on screen pixel analysis and touch event injection. This forced the project to redesign the agent's perception-decision-execution loop from the ground up, including real-time parsing of device state, mutual exclusion protocols between multiple agents, and automatic recovery after failed operations. These are not simple adaptations of existing agent frameworks but an architecture developed independently for the problem of "autonomous operation in a no-API environment."
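One way to picture this loop, with per-device mutual exclusion and recovery after a failed operation, is the sketch below. It is a simplified illustration under assumptions, not OpenPocket's implementation: `observe`, `decide`, and `act` are hypothetical callbacks standing in for screen parsing, the LLM call, and touch injection, and a `threading.Lock` stands in for whatever mutual exclusion protocol the project actually uses.

```python
import threading
from typing import Callable, Optional

# One lock per device ID so two agents never drive the same device at once.
_device_locks: dict[str, threading.Lock] = {}
_registry_lock = threading.Lock()


def lock_for(device_id: str) -> threading.Lock:
    """Return the lock guarding `device_id`, creating it on first use."""
    with _registry_lock:
        return _device_locks.setdefault(device_id, threading.Lock())


def run_task(
    device_id: str,
    observe: Callable[[], str],
    decide: Callable[[str], Optional[str]],
    act: Callable[[str], bool],
    max_steps: int = 20,
    max_retries: int = 2,
) -> bool:
    """Perceive, decide, execute until `decide` returns None (task done).

    A failed action is retried against a fresh observation up to
    `max_retries` times before the task is abandoned.
    """
    with lock_for(device_id):
        for _ in range(max_steps):
            for _attempt in range(max_retries + 1):
                state = observe()        # perception: current screen state
                action = decide(state)   # decision: next action, or None
                if action is None:
                    return True
                if act(action):          # execution: True on success
                    break
            else:
                return False             # retries exhausted
        return False                     # step budget exhausted
```

Re-observing before every retry matters in a no-API environment: after a failed tap the screen may no longer be in the state the previous decision assumed.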
The engineering designs of the two projects are worth discussing separately. OpenPocket adopts a three-layer architecture that separates Manager, Gateway, and Agent Runtime, so each layer can iterate independently and community contributors only need to focus on the layer they know. Each Skill in GTM Engineer Skills follows a staged pipeline design: the output of one stage serves as the input of the next, with enforced quality gates in between, so the workflow can be paused and resumed at any stage and errors can be traced to a specific stage. The purpose of these architectural choices is the same: to make open source projects trustworthy for real users in production environments.
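A staged pipeline with quality gates of this kind could look roughly like the following. This is a sketch under assumptions rather than the project's code: the `Stage` structure, the stage names, and the checks are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Stage:
    name: str
    run: Callable[[Any], Any]     # transforms the previous stage's output
    check: Callable[[Any], bool]  # quality gate applied to this stage's output


class GateFailed(Exception):
    """Raised when a stage's output does not pass its quality gate."""

    def __init__(self, stage: str, index: int, last_good: Any):
        super().__init__(f"quality gate failed at stage {stage!r}")
        self.stage = stage          # which stage failed
        self.index = index          # where to resume once the problem is fixed
        self.last_good = last_good  # input that the failed stage received


def run_pipeline(stages: list[Stage], data: Any, start: int = 0) -> Any:
    """Run stages in order from `start`, feeding each output to the next."""
    for index in range(start, len(stages)):
        stage = stages[index]
        result = stage.run(data)
        if not stage.check(result):
            raise GateFailed(stage.name, index, data)
        data = result
    return data


# Hypothetical three-stage workflow: audit -> plan -> fix.
stages = [
    Stage("audit", lambda url: {"url": url, "issues": ["missing metadata"]},
          lambda out: "issues" in out),
    Stage("plan", lambda out: {**out, "fixes": [f"add {i}" for i in out["issues"]]},
          lambda out: len(out["fixes"]) == len(out["issues"])),
    Stage("fix", lambda out: {**out, "patched": True},
          lambda out: out["patched"]),
]
result = run_pipeline(stages, "https://example.com")
print(result["fixes"])
```

Because a failed gate reports both the stage and the last good input, a caller can fix the problem and resume with `run_pipeline(stages, err.last_good, start=err.index)` instead of starting over.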
From a product perspective, the two projects have something in common: Perseus always puts "who will use it" and "how will it expand" at the front of architectural decision-making. The target users of GTM Engineer Skills are not engineers but growth teams, so each Skill has a clear input-output contract and built-in quality checks that let non-technical users understand what the agent is doing. In OpenPocket, the SKILL.md extension mechanism, natural-language task scheduling, and multi-channel access (Telegram, Discord, WhatsApp, CLI) are all designed to lower the barrier for non-engineering users. In his view, if an open source infrastructure project can only be used by engineers, its ceiling is the size of the engineering community. The design with real leverage is one that lets practitioners from every field expand the boundaries of agent capabilities together.
This pattern runs through all of his projects. Instead of building application-layer products on existing frameworks, he identifies components missing from the infrastructure layer of the agent ecosystem and then builds them.
A Bigger Picture
The open source AI ecosystem of 2026 is going through a moment similar to the early cloud-native ecosystem of the 2010s: standards and tools at the infrastructure layer are being defined, and these definitions will constrain the trajectory of the entire industry in the coming years. During this window, every Skill format the community adopts, every agent architectural pattern that is validated, and every gap in the ecosystem that is filled helps shape the next interface layer of AI.
What Perseus Yang is doing is quite simple: using engineering capability and product thinking to explore new paradigms at the technological frontier of the AI era. Models will continue to grow stronger. But who will define how agents should interact with the real world, and who will decide in what form domain knowledge should be encoded and distributed? The answers to these questions will not emerge from the models; they can only be worked out step by step by those who are hands-on in building things.
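Disclaimer: This article represents only the personal views of the author and does not represent the position or views of this platform. It is provided for information sharing only and does not constitute investment advice to anyone. Any dispute between users and the author is unrelated to this platform. If an article or image published on this page involves infringement, please email the relevant proof of rights and proof of identity to support@aicoin.com, and the platform's staff will review the matter.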