a16z: OpenAI will not kill all application layer opportunities, let go of your AI anxiety.

Will OpenAI kill all AI applications? a16z: You are going down the wrong path.

Author: Joe Schmidt IV

Compiled by: Deep Tide TechFlow

Deep Tide Overview: What is the biggest anxiety of AI entrepreneurs? That OpenAI and Anthropic will kill every opportunity at the application layer. An a16z partner answers with the "Yellow Brick Road" theory: large model labs will only dominate horizontal, single-step tasks; the real opportunities lie in vertical scenarios, multi-step workflows, and fields with strict compliance requirements. This article is worth reading for both AI entrepreneurs and investors.

Founders and prospective hires keep asking me the same question: what is left to build at the AI application layer, or will OpenAI and Anthropic kill everything?

This question is rooted in a specific type of AI anxiety. Some people conclude that the only way to avoid being permanently stuck at the bottom is to either stay inside the large labs or work in frontier fields like robotics or hard tech, basically anything "the labs can't touch." If every piece of software is going to be consumed, either directly absorbed by Codex or Claude or rendered unnecessary by future models, then just run!

Look, I am an AI maximalist like almost everyone else, and I think they are half right. Labs are indeed consuming huge application surfaces. But the "application layer" is not a homogeneous opportunity. The right framework is whether you are on the Yellow Brick Road or somewhere else in the land of Oz.

The Yellow Brick Road is our shorthand for the path the labs are taking and investing enormous resources in. Labs are best suited to problems like code generation, writing, or image creation, because these problems improve as the foundation models improve: every dollar spent on pre-training and post-training improves product quality. Elsewhere in the land of Oz sit more complex, often verticalized problems that cannot be solved just by handing business users a horizontal tool with standard tool calls and computer use. There, the value comes more from the scaffolding around the models, which makes outputs reliable, compliant, and actionable within specific industries, than from the raw capabilities of the underlying models (though those remain important!).

We are seeing this in real time: OpenAI and Anthropic are essentially telling the market that they cannot solve every problem with generic AI coworkers. They have announced large-scale forward-deployed joint ventures to build entire companies around configuring and customizing their models for enterprises. If you thought the next model release would solve these problems, you would not be pouring billions into those projects.

So if you want to get rich by building AI applications—stay off the Yellow Brick Road and build elsewhere in the land of Oz. Here’s what we’ve learned, as well as some insights from our portfolio founders about what works.

Yellow Brick Road

If you are starting a company, the Yellow Brick Road is the most obvious path, but also the most dangerous. Take a high-performance model, plug in some off-the-shelf connectors (like Google Drive, Slack, Salesforce, Notion, GitHub), and then launch some kind of intelligent orchestration layer on top. How magical!

The problem is that this is exactly what labs are doing with Cowork and Codex. Clearly, they have the models, which gives them better margins, control, and the ability to exert pricing power over any downstream users. But perhaps most importantly, they also have the architectural choices that define what their products are best at solving. So far, they have been cautious about model-layer tool invocation patterns, which are precisely what’s needed for the horizontal low-step tasks on the road. Even if a startup could somehow surpass Codex or Claude Code, the labs have immense distribution channels and the biggest brand halo in the AI space.

If you are an AI application company running this play with the same connectors, with no sub-agents or configuration underneath and no distribution channels, you are likely walking a path to nowhere.

Elsewhere in the Land of Oz

It’s not all doom and gloom for startups. There are massive opportunities beyond the Yellow Brick Road, where startups have a clear path to own their customers and solve complex problems.

These companies are building agentic experiences in which models are woven into a complex web of tools, automation, and integration (in other words: software), which makes these startups verticalized by default. They can focus on multi-step and multi-party workflows, using sub-agents tailored to specific tasks for roles and verticals that Anthropic and OpenAI's horizontal platforms cannot reach: gathering context across systems, then routing it to multiple individuals who must approve at different stages. This often involves one or more legacy systems, which tend to require deterministic results, tolerate no ambiguity, and are sometimes tied to specific, valuable business outcomes. Labs understand how valuable these problems are: this is why they are building their own outsourced configuration shops, and it is also why there is a whole category of high-end reinforcement learning businesses.

Why Elsewhere in the Land of Oz Will Not Be Owned by the Wizard

The obvious objection to the points above is that, so far, betting against models and labs improving has been a pretty bad trade. They will likely continue to get better and eventually eat into the market for these application-layer business services.

The labs will certainly improve, but I think there are several ways companies elsewhere in the land of Oz can protect themselves over time:

Data and Learning Flywheels:

Much of what you internalize is not in any training set: unwritten industry norms, undocumented standards, tribal knowledge residing in practitioners' minds. None of this is on the open web. No amount of training compute can replace being in the workflows where that knowledge actually exists. There are two overlapping flywheels here. One is cross-client: the patterns that accumulate as you see more variations of the same problem. The other is within a single client: the reasons behind specific decisions, unspoken exceptions, and company-specific heuristics, which can only emerge through real interactions with the system.

Even if client data cannot be used across clients, application companies can leverage pattern recognition across client types and use it to provide the right scaffolding for future issues. If a company has already run its agents through a hundred legal revisions, a thousand insurance underwriting loops, or ten thousand SDR activities, it has internalized the shape of the problem in a way that the next entrant cannot replicate, even when starting a brand new agent for the first time.

Horizontal agents could theoretically build the same learning infrastructure. The reason they don’t, aside from pure focus, is user experience: capturing that knowledge completely depends on the workflow interface you give users, while vertical players can shape these interfaces around what needs to emerge in their workflows. Horizontal tools can't do this. Evaluation sets, tagging outputs, and edge case taxonomies can accumulate into vertical-specific data flywheels, fueling fine-tuning, which the next entrant cannot generate without comparable production exposure. Whether this is possible depends on data rights, accumulated production exposure, and the structure of client contracts; but regardless, pattern recognition will accumulate.

Managing Model Variability and Complexity: Labs are already routing internally—using different categories of models for different requests, with underlying integration. What they cannot do is route across vendors, assess competitors' models for specific sub-tasks, or use open-source fine-tuned models for the best niche parts of a task. Companies elsewhere in the land of Oz pick the right models for each sub-task in the entire model market, not just whatever their parent lab releases. They also do the work that no one wants to do—running re-evaluations on upgrades, recalibrating prompts for clients' edge cases, deploying without breaking production—every time a new model is released. Labs won't do this on behalf of the clients; they sell you the next model and tell you to migrate. Companies elsewhere in the land of Oz absorb the migration work. What clients get is the best intelligence on the market, plus continuity with every upgrade.
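The re-evaluation work described above can be illustrated with a minimal Python sketch. It assumes each model is just a callable from prompt to answer and that the eval set is a list of (prompt, expected answer) pairs; the names `pass_rate`, `choose_model`, and `min_gain` are hypothetical, and exact-match scoring stands in for whatever grading a real deployment would use.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]   # stand-in for any vendor or open-source model
EvalCase = Tuple[str, str]     # (prompt, expected answer)


def pass_rate(model: Model, cases: List[EvalCase]) -> float:
    """Fraction of eval cases the model answers exactly as expected."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(1 for prompt, expected in cases if model(prompt) == expected)
    return passed / len(cases)


def choose_model(
    incumbent: str,
    candidates: Dict[str, Model],
    cases: List[EvalCase],
    min_gain: float = 0.0,
) -> str:
    """Keep the incumbent unless the best challenger beats it by more than min_gain.

    `candidates` must include the incumbent and may mix models from any vendor.
    """
    scores = {name: pass_rate(model, cases) for name, model in candidates.items()}
    challengers = {n: s for n, s in scores.items() if n != incumbent}
    if not challengers:
        return incumbent
    best = max(challengers, key=challengers.get)
    if challengers[best] - scores[incumbent] > min_gain:
        return best
    return incumbent
```

Run per sub-task and per client eval set, a gate like this is one way a newly released model could be adopted only where it does not regress on that client's edge cases.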

Cost Optimization: Running every query through Opus 4.7 is the fastest path to negative gross margins. The best companies elsewhere in the land of Oz route across model tiers: frontier models handle the hardest tasks, mid-tier models do most of the work, and smaller custom or fine-tuned models are used where they have earned the right to be used. Some are now post-training their own models on this basis, optimizing for the narrow slices of work that matter to their clients and delivering them at a fraction of the cost of frontier API calls. Labs price intelligence per dollar: a given amount of intelligence for X dollars. Companies elsewhere in the land of Oz sell the opposite: the minimum dollar cost for the specific level of intelligence a workflow actually needs. This is only possible if you know exactly what level each sub-task needs, which labs structurally cannot know across every vertical. It translates directly into lower, more controllable outcome prices.
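A minimal sketch of this kind of tiered routing, in Python. The tier names, capability scores, and per-call costs are invented placeholders, not real models or prices, and it assumes each sub-task's difficulty has already been scored on a 1-10 scale.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ModelTier:
    name: str             # hypothetical label, not a real model ID
    capability: int       # hardest task difficulty (1-10) the tier handles reliably
    cost_per_call: float  # illustrative dollars per call


# Illustrative tiers only; real capabilities and prices would come from evals.
TIERS: List[ModelTier] = [
    ModelTier("small-finetuned", capability=3, cost_per_call=0.001),
    ModelTier("mid-tier", capability=7, cost_per_call=0.01),
    ModelTier("frontier", capability=10, cost_per_call=0.10),
]


def route(difficulty: int, tiers: Sequence[ModelTier] = TIERS) -> ModelTier:
    """Return the cheapest tier whose capability covers the sub-task."""
    eligible = [t for t in tiers if t.capability >= difficulty]
    if not eligible:
        raise ValueError(f"no tier can handle difficulty {difficulty}")
    return min(eligible, key=lambda t: t.cost_per_call)


def workflow_cost(difficulties: Sequence[int], tiers: Sequence[ModelTier] = TIERS) -> float:
    """Total cost of a workflow when every sub-task is routed to its cheapest tier."""
    return sum(route(d, tiers).cost_per_call for d in difficulties)
```

With these made-up numbers, a three-step workflow with difficulties 2, 5, and 9 costs about 0.111 per run when routed, against 0.30 if every step went to the frontier tier. The routing logic is trivial; the hard part is the per-sub-task difficulty knowledge it depends on.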

Governance: Becoming the control plane for clients running AI in a given vertical has significant value. This is where permissions, audits, what agents are allowed to do, and what agents actually did all converge. The control plane is built from guardrails for specific use cases, which look completely different across industries and job types. Because vertical companies own the tools, workflows, and data contracts end to end, they can deliver deterministic outcomes in ways that horizontal tools find hard to match. They also absorb regulatory complexity for end buyers: FRCP and bar association rules in legal, HIPAA in healthcare, SEC and FINRA in finance, state insurance regulations, and so on. Horizontal players cannot credibly do this unless they simultaneously become a hundred different verticals. CIOs want a partner that will state in the contract that it handles compliance for the intelligence it provides.

All of this comes back to the same thing: focus. It can be a vertical space (insurance, legal, accounting) or a deeply executed function (sales, customer support, finance). Either way, this work requires a team laser-focused on a customer segment—their workflows, edge cases, regulations. Labs are not built for this. They must be everywhere and serve everyone, which is how they built the Yellow Brick Road in the first place. The same trade-offs prevent them from entering elsewhere in the land of Oz—you can either be everywhere at once, or excel at one thing. You cannot have both.

Sales as an Example: Practical Advice from the CEO of 11x

How should you think about this in practice? Here are some practical tips from 11x CEO Prabhav Jain.

Focus on Outcomes

The tactical path to building a resilient company in the face of labs starts with the specific outcomes that your clients truly care about. For us, that is helping companies generate more sales leads. From there, the questions become tactical. What actual activities drive sales leads end-to-end? Break down each activity into tasks. Which tasks are automatable and which are not? Which require deep domain insight and which do not? Labs may also release workflows, but when workflows have many steps, chaotic inputs, hard-to-explain states, or real-world constraints, merely having a better model will not reach your goals. The work falls back to old-school software engineering, where labs have no advantage over focused application companies on this surface. For instance, here are some tasks we deal with, some of which are automatable, some not: lead mining based on custom signals, lead enrichment, deep account research, context gathering from CRM, specific channel messaging, lead qualification agents, and email deliverability systems. These are not tasks you can complete all at once; they require deep engineering.

The key insight in the land of Oz analogy is that the non-automatable half of any real workflow does not benefit from the labs' advantages. Labs are no better than you at writing deterministic software below the model layer. And the automatable half still requires you to fine-tune, train, and constrain the models for the outcomes you actually want. Domain knowledge often does not exist in generic training data. These skills are built from scratch for each vertical or function and fed into the model at the right moments in the workflow. When our agents qualify inbound leads over the phone, we must train them for a good sales dialogue specific to that industry and that role. This is the work of application companies, and it will compound.

More importantly, these skills will constantly go out of date as businesses continue to evolve. Thus, your ability to keep these workflows and contexts evolving is the true competitive advantage. For example, when we launched our scaled email outreach product, "AI" written emails started to appear. Fast forward to today, and people have developed a keen discernment between emails written by AI and those written by humans, and that discernment changes every few months. Our agents must constantly adapt to market dynamics, but this is precisely where the moat is built. In fact, despite a constantly changing market, our positive response rates have quadrupled over the past few months, generating hundreds of millions in sales opportunities for our clients.

Focus on High Complexity Problems

Complex problems are where real business value gets unlocked. Otherwise, you will find yourself building nothing more than a thin wrapper.

Break down any sufficiently complex business problem, and the chaos soon emerges. Here is an example from the GTM space that sounds simple: if a company is already a client, you should not prospect contacts at that company. But the reality is far from simple. Perhaps your CRM has the company's domain name. What about a company with dozens of subsidiaries? What if the CRM record holds the parent company's domain? What if a stale match field in Salesforce sends a cold email to the CRO of an existing client? Real-world data is chaotic. Humans struggle to manage it as well. Models do not magically leap over this hurdle. Sifting order from chaos requires agents designed for the particular shape of the problem, not a generic copilot pointed at the CRM. In fact, we have found that our data quality and freshness far exceed our clients', so we often default to our own data.
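To make the subsidiary problem concrete, here is a small Python sketch of a suppression check. It assumes a hypothetical `parent_of` map from subsidiary domain to parent domain (in practice this mapping is the hard part and would come from enrichment data), and it makes the policy choice of suppressing every contact in the same corporate tree as an existing client.

```python
from typing import Dict, Iterable


def normalize_domain(value: str) -> str:
    """Reduce an email address or URL-like string to a bare lower-case domain."""
    v = value.strip().lower()
    if "@" in v:
        v = v.rsplit("@", 1)[1]
    for prefix in ("https://", "http://"):
        if v.startswith(prefix):
            v = v[len(prefix):]
    v = v.split("/", 1)[0]
    if v.startswith("www."):
        v = v[4:]
    return v


def root_domain(domain: str, parent_of: Dict[str, str]) -> str:
    """Follow subsidiary -> parent links to the top of the corporate tree."""
    seen = set()
    while domain in parent_of and domain not in seen:  # `seen` guards against cycles
        seen.add(domain)
        domain = parent_of[domain]
    return domain


def is_suppressed(
    contact_email: str,
    client_domains: Iterable[str],
    parent_of: Dict[str, str],
) -> bool:
    """True if the contact belongs to the same corporate tree as any existing client."""
    contact_root = root_domain(normalize_domain(contact_email), parent_of)
    client_roots = {root_domain(normalize_domain(d), parent_of) for d in client_domains}
    return contact_root in client_roots
```

For example, with `parent_of = {"sub.example": "parent.example"}` and a CRM record that only holds `https://www.Parent.example/about`, a contact at `cro@sub.example` is still suppressed. A plain string comparison on the CRM domain field would have let that email through.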

Guardrails Are Not Just There to Prevent Bad Things. They Are Why Clients Pay You.

Guardrails are severely underestimated. Even within the same product, each use case requires its own guardrails. For us, a regulated financial services client needs entirely different assurances compared to a mid-market SaaS client. These assurances permeate how the agent writes content, who it can contact, what data it can access, what it can say on calls, and how every decision gets recorded.

One-size-fits-all systems will crumble in the face of these differences. Guardrails must be built by use case, configured by client, and continuously audited. This work falls entirely on application companies. That is why we have forward-deployed engineers (FDEs) and technical deployment strategists who tune the product to each client's needs. For example, we partnered with a Fortune 1000 firm to conduct consent-based outbound calling to their vast SMB client base. The first rounds of iteration had very low answer rates, so we had to iterate quickly and learn how to engage this specific audience within the first 10 seconds of the call. SMB owners behave radically differently from large B2B buyers or consumers. In that segment, we now create more sales opportunities for them in a day than their entire sales team does in a month.
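One way to picture "built by use case, configured by client, and continuously audited" is a per-client policy object checked before every agent action, with each decision written to an audit log. The Python below is an illustrative sketch only: the `GuardrailPolicy` fields, the `check_action` function, and the example rules are assumptions made for the example, not a description of any company's actual system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Set


@dataclass
class GuardrailPolicy:
    client: str
    allowed_channels: Set[str]
    banned_phrases: Set[str] = field(default_factory=set)
    require_consent: bool = False


@dataclass
class Action:
    channel: str
    message: str
    has_consent: bool = False


@dataclass
class AuditEntry:
    timestamp: str
    client: str
    channel: str
    allowed: bool
    reasons: List[str]


def check_action(policy: GuardrailPolicy, action: Action, audit_log: List[AuditEntry]) -> bool:
    """Apply one client's policy to a proposed action and record the decision."""
    reasons: List[str] = []
    if action.channel not in policy.allowed_channels:
        reasons.append(f"channel '{action.channel}' not allowed")
    if policy.require_consent and not action.has_consent:
        reasons.append("missing consent")
    text = action.message.lower()
    for phrase in sorted(policy.banned_phrases):
        if phrase.lower() in text:
            reasons.append(f"banned phrase '{phrase}'")
    allowed = not reasons
    audit_log.append(
        AuditEntry(
            timestamp=datetime.now(timezone.utc).isoformat(),
            client=policy.client,
            channel=action.channel,
            allowed=allowed,
            reasons=reasons,
        )
    )
    return allowed
```

The same proposed action can pass for a mid-market SaaS client and be blocked for a regulated financial services client simply because the two policies differ, and the audit log captures both decisions along with the reasons.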

Insurance as an Example—Practical Advice from FurtherAI CEO

Sales is one example. Insurance is another, illustrating the same point from a different angle. Here are the thoughts of FurtherAI CEO Aman Gour on building "off the road":

When we started deploying AI in real insurance businesses, we kept hearing a specific assumption: models are the intelligence, and workflows are just the scaffolding around them.

After working with more and more insurance companies, we have become increasingly convinced that this view is incorrect.

In the insurance industry, much of the intelligence actually exists within the workflows themselves. Two insurance companies may take a submission through what appears to be the same pathway: submission, review, quoting, underwriting. But the pathway is the simple part. What distinguishes the two companies is everything within the pathway: which risks need reporting, which loss signals are important, which of the two risk preference rules takes priority when they conflict, when manual signatures are needed, which external data to call, and how final decisions are recorded.

These logics do not exist in a clean rule engine. They are scattered across standard operating procedures, managerial reviews, underwriting philosophies, company-specific risk preferences, and years of operational experience. Much of this content is not recorded in a format that models can read directly.

This is why we do not believe in pure agents that reason from scratch every time, nor in rigid workflows that collapse when they meet chaotic reality. What we are building is agentic workflows. Workflows give you repeatability, auditability, and cost control. Agents handle variability and recover when the ideal path is interrupted. Humans stay involved at the judgment points where accountability is needed.
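The division of labor in the paragraph above (deterministic steps, agent recovery, human sign-off) can be sketched in a few lines of Python. Everything here is hypothetical: the `Step` and `Result` types, the `run_workflow` function, and the idea that an "agent" is just a fallback callable are simplifying assumptions for illustration, not FurtherAI's design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

State = Dict[str, object]


@dataclass
class Step:
    name: str
    run: Callable[[State], State]                       # deterministic "ideal path"
    fallback: Optional[Callable[[State], State]] = None  # agent recovery (stubbed here)
    needs_human: bool = False                            # judgment point needing sign-off


@dataclass
class Result:
    state: State
    trace: List[str] = field(default_factory=list)       # audit trail of what happened
    status: str = "completed"


def run_workflow(
    steps: List[Step],
    state: State,
    approve: Callable[[str, State], bool],
) -> Result:
    """Run steps in order; recover via the fallback on error; stop if a human rejects."""
    result = Result(state=dict(state))
    for step in steps:
        try:
            result.state = step.run(result.state)
            result.trace.append(f"{step.name}: ok")
        except Exception as exc:
            if step.fallback is None:
                result.trace.append(f"{step.name}: failed ({exc})")
                result.status = "failed"
                return result
            result.state = step.fallback(result.state)
            result.trace.append(f"{step.name}: recovered by agent")
        if step.needs_human and not approve(step.name, result.state):
            result.trace.append(f"{step.name}: rejected by human")
            result.status = "rejected"
            return result
    return result
```

The trace is the interesting by-product: every recovery and every human rejection is recorded as a signal, which is the raw material for the feedback loop described in the following paragraphs.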

On day one, this will automate manual work. But over time, every submission becomes a signal, every exception is feedback, and every human correction reveals the incompleteness of the operation manual. Over time, workflows stop being scripts and start becoming the operational memory of the insurance company. This is the part that labs find hard to reach. They will continue to release better models and better generic agents, which is fine. But they will not stay long enough in the production workflows of insurance companies to understand why a particular account got flagged, why a specific risk got declined, or why an underwriter overturned a risk preference guideline and was right.

This understanding can only come from running the workflows in production thousands of times. The workflows you deliver on day one are not the moat. The loops created by production use over time are.

For us, this is what it means to build "off the road".

How to Determine If You Are Elsewhere in the Land of Oz?

Tool and Step Test: How many steps does this work require, and how complex must the tools be to support it? Compare a horizontal AI search over Google Drive with a multi-step legal redline based on three years of a law firm's precedents. The search is one step with one tool and a high tolerance for error: if the user reads the summary and it is not right, they can ask again. The redline is dozens of steps spanning multiple tools, its output must be approved by partners, and it may need to be defended in court. Both look like "agents doing work," but only the latter requires a focused team to spend years building deep software.

System Test: Are you building a system in which the client runs work, or a tool sitting on top of the client's existing systems? Systems own the workflow end to end (data capture, governance, records of completion); they are what clients point to when describing how work actually gets done. Tools merely add intelligence to workflows that clients are already running. Tools generate real revenue, but labs can take it away because clients do not rely on you as the orchestration layer. High ACV is usually a signal of a system, because a system replaces real human labor and is paid accordingly, but it is not a guarantee. Ask yourself: if a lab released something that claimed to compete with you directly, would clients still need your product? If yes, you are building a system. If no, you are merely a tool, even if your ACV is high.

Hedge Fund/P&L Test: Labs are judged on benchmarks, while companies elsewhere in the land of Oz are judged on the client's profit and loss statement. Your clients do not care how your model scores on SWE-Bench or MMLU. They care whether your agent closed deals, accurately revised contracts, or underwrote the right policies. If they focus on outcomes from specific workflows rather than generalized capability scores, you are elsewhere in the land of Oz. If they are paying for generalized capability, what you sell them can be had through a subscription to Claude or Codex. The best agent businesses need to execute like hedge funds: winning with alpha in the client's P&L, not with benchmark scores.

Both Can (and Will) Win

We will see huge winners both on the "Yellow Brick Road" and elsewhere. Labs will continue to win because they own the models and have the distribution channels for the horizontal tools they design.

"Elsewhere in the Land of Oz" can win if they have work systems—the interfaces through which companies actually conduct work—and the data flowing through and captured from these systems. These companies have data capture, workflow action systems, and governance. As more complex workflows mature in vertical fields, they will compound into a core experience that clients rely on. When the next generation of models is released from existing players and newcomers, the companies will become the layer that integrates and delivers them to clients. The underlying models are replaceable; the work systems are not.

The next generation of enterprise software will be built off the road.

If you are building it, please contact: jschmidt@a16z.com.
