Token Budget Wars: Enterprise AI Enters the "Accounting Era"


Original title: Token Budget Wars

Original author: Jaya Gupta

Original translation: Peggy

Editor's note: Corporate AI is moving from "whether to adopt" to "how to account for it."

Over the past two years, many companies pushed employees to use AI mainly to keep pace with technology trends and competitive pressure. But as AI inference costs shift from experimental budgets to recurring operating expenses, CEOs and CFOs are asking a more pressing question: how much value does AI actually create, and what does each dollar of token spend produce?

This is the core of the "Token Budget Wars." They are not just about lowering AI bills, but about reassessing which business areas warrant more compute, which tasks should move to cheaper models, which processes can replace outsourcing or human labor, and which spending is simply wasted.

The article's most important point is that AI usage volume does not equal value. In the SaaS era, usage typically signaled software adoption; in the AI era, token consumption only indicates that "the meter is running." The same workflow can cost several times more or less depending on prompts, context, model choice, and retry counts. A rising bill may mean AI is genuinely doing work, or it may mean the system is burning effort to no effect.

Therefore, the next phase of corporate AI is not just about model capability but about whether token costs can be tied to business results. The first phase proved that AI can accomplish tasks; the second must establish whether those tasks are worth paying for.

Below is the original text:

Corporate AI has transitioned from "whether to adopt" to "how to allocate."

At the executive level, the new "currency" is your ability to quantify the return on AI investment. Every functional department is being asked the same questions: What have you produced? What did it cost? Over the past two years, CEOs woke up to Jim Cramer (#bearish) on CNBC, watched competitors announce productivity gains, and demanded that everyone in the company use AI. The real pressure now comes from the follow-up: show me the value.

The Claude release of November 2025 landed after most companies had already locked in their annual budgets for 2026. By the first quarter, actual enterprise usage had far exceeded initial plans. Inference cost is no longer just an experimental budget item; it has become a recurring operating cost. That raises a new question: where is AI genuinely creating value?

This question is hard to answer because the utility of a token has never been quantified. The bill cannot tell you whether the spend replaced labor, generated revenue, reduced risk, sped up a process, or just paid for a group of engineers frantically racking up tokens for the leaderboard (#metamates). When spending is a few hundred thousand dollars, it still feels like an experiment. Once it crosses a threshold, say seven figures, it becomes infrastructure. Technical variance starts to hit the income statement: the same workflow with the same inputs can differ in token cost by 5 to 10 times between two runs, with nothing visibly wrong. At experimental scale, such swings are easy to absorb; at infrastructure scale, they are a number the CFO must explain to the CEO.

This can be termed "marginal token utility": the business value created for every additional dollar spent on inference costs. This is a truly important figure during the scaling phase, and it is one that most companies cannot currently see.
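As a rough illustration of that definition, the sketch below computes marginal token utility between two budget levels and contrasts it with the average return. The function name and every dollar figure are hypothetical; the article does not prescribe a formula beyond "value per additional dollar of inference."

```python
def marginal_token_utility(value_before, value_after, spend_before, spend_after):
    """Extra business value per extra dollar of inference spend between two budget levels."""
    extra_spend = spend_after - spend_before
    if extra_spend <= 0:
        raise ValueError("spend_after must exceed spend_before")
    return (value_after - value_before) / extra_spend


# Hypothetical quarter: inference spend rose from $200k to $300k,
# while measured business value rose from $500k to $560k.
mtu = marginal_token_utility(500_000, 560_000, 200_000, 300_000)
average_return = 560_000 / 300_000

print(f"marginal utility: {mtu:.2f} per extra dollar")
print(f"average return:   {average_return:.2f} per dollar")
```

Under these made-up numbers the average return still looks healthy while the last $100k returned less than it cost, which is the kind of gap an aggregate bill hides.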

The boardroom discussions are shifting from "Is AI useful?" to "Where does AI actually create leverage?" This is precisely why the so-called token budget war is essentially a battle for the allocation of tokens.

The fight over who owns the token budget is heating up quickly because it collides with a long-standing executive instinct: a larger team means a bigger title, broader responsibility, and more power. In the past, a visible marker of a senior manager's success was the size of the team they ran: the number of direct reports, the layers beneath them, and the headcount on the org chart.

However, as intelligence becomes a scarce resource, the new indicator shifts to: how much intelligence can you mobilize?

AI expenditures are essentially competing with labor costs.

Most AI budget requests are fundamentally one of three claims: replacing outsourced labor, replacing internal labor, or generating new revenue.

An employee has a salary. A BPO outsourcing contract is priced per ticket, claim, invoice, or audit. People understand these units. Inference costs are harder because the final cost of a completed task depends on how the system behaves during execution. A claims task that needs three retries, manual adjustment, and a frontier model may end up costing more than the outsourced labor it was meant to replace. This is why the discussion is shifting to the cost of achieving an outcome: cost per resolved ticket, per processed claim, per reviewed contract, per completed invoice, per avoided hire, per retained customer, or per dollar of revenue converted.
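To make the "cost per outcome" framing concrete, here is a minimal sketch that divides total spend (tokens plus human review time) by the number of claims actually resolved and compares it with a per-claim BPO price. The `ClaimRun` fields, the reviewer rate, and the BPO price are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClaimRun:
    """One claim pushed through the agent; every field is a hypothetical illustration."""
    inference_cost: float  # dollars of tokens across all attempts
    review_minutes: float  # human time spent on manual adjustment
    resolved: bool         # whether the claim ended up fully processed


def cost_per_resolved(runs, reviewer_rate_per_hour):
    """Total spend (tokens + human review) divided by the number of resolved claims."""
    total = sum(
        r.inference_cost + r.review_minutes / 60 * reviewer_rate_per_hour
        for r in runs
    )
    resolved = sum(1 for r in runs if r.resolved)
    return total / resolved if resolved else float("inf")


runs = [
    ClaimRun(0.40, 0, True),
    ClaimRun(1.90, 6, True),   # retries on a frontier model plus a manual fix
    ClaimRun(2.60, 0, False),  # never resolved, but still billed
    ClaimRun(0.50, 0, True),
]
BPO_PRICE_PER_CLAIM = 3.00  # assumed contract price, not a real benchmark

agent_cost = cost_per_resolved(runs, reviewer_rate_per_hour=40.0)
print(f"agent: ${agent_cost:.2f} per resolved claim vs BPO: ${BPO_PRICE_PER_CLAIM:.2f}")
```

In this toy data the token bill alone averages $1.35 per claim, yet the cost per resolved claim comes out above the assumed BPO price once the failed run and the review time are counted.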

Executives have realized that BPO is the easiest place to set a benchmark, because those tasks were already priced per "completed unit." Comparing internal employees with AI is much harder: employees do many things in a day, including scrolling TikTok over lunch; productivity gains usually show up as avoided hires or as small amounts of freed capacity spread across a team; and managers are reluctant to shrink a team on the basis of partial automation alone. BPO gives business teams a quantifiable baseline.

This logic is different from SaaS. SaaS once trained companies to view usage as a proxy for value.

AI has broken this assumption. The inference resources consumed by the same workflow can vary widely with the prompt, the retrieved context, the model chosen, the tools invoked, the number of retries, and whether the agent gets stuck. The unit on the bill, the token, is stable, but the amount of work it represents is not.

More precisely: signal and noise are measured in the same unit. A higher token bill may mean real work is being done; it may also mean compute is being wasted on poor prompts, irrelevant context, unnecessary tool calls, repeated inference, and overpowered models. Two companies can have identical token bills and entirely different businesses underneath: one is turning inference into results while the other is paying for wasted effort, and the two look the same on the invoice.

SaaS usage tells you the software has been adopted. AI usage only tells you the meter is running. It does not tell you whether the business is actually moving.

Why is marginal token utility hard to see?

There are mainly three reasons.

The first is the retry long tail. If an agent completes a workflow correctly on a given attempt with probability p, the expected token cost per resolved workflow is roughly T/p, where T is the cost of a single attempt. If the success rate falls from 90% to 70%, the effective cost per resolved issue rises by about 29% (0.9/0.7 ≈ 1.29), not 20%, because failures compound. In corporate workflows, inputs are often messy and exceptions matter a great deal. Failure does not just reduce accuracy; it changes the economics.
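The T/p relationship can be checked with a short sketch. It assumes each attempt costs the same T and succeeds independently with probability p, so the number of attempts until success follows a geometric distribution; in practice retries often carry extra context and cost more, which would make the tail worse. The function names are illustrative.

```python
import random


def expected_cost_per_resolution(base_cost, p_success):
    """Closed form T / p for independent, equally priced attempts."""
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    return base_cost / p_success


def simulated_cost_per_resolution(base_cost, p_success, workflows=50_000, seed=0):
    """Monte Carlo check: retry each workflow until it succeeds, then average the cost."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(workflows):
        attempts = 1
        while rng.random() >= p_success:  # this attempt failed, so try again
            attempts += 1
        total += attempts * base_cost
    return total / workflows


T = 1.00  # assumed cost of one attempt, in dollars
cost_90 = expected_cost_per_resolution(T, 0.90)
cost_70 = expected_cost_per_resolution(T, 0.70)
increase = cost_70 / cost_90 - 1

print(f"p=0.90: ${cost_90:.3f}   p=0.70: ${cost_70:.3f}   increase: {increase:.1%}")
```

A 20-point drop in success rate raises cost per resolution by roughly 28.6%, and `simulated_cost_per_resolution(T, 0.70)` can be used to confirm that the closed form matches simulated retries.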

The second is context inflation. For operations dominated by attention, inference cost grows roughly as O(n²) in the context length n, so doubling the context roughly quadruples the cost of those operations. Everyone wants the model to have enough information, so systems tend to over-supply it: fifty documents are retrieved when five would do; connectors dump entire email threads; agents carry stale conversation history into later steps.
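The effect of over-retrieval can be sketched as follows. The model here covers only the quadratic attention term the paragraph describes; real serving cost also has terms that grow linearly with context, and many providers bill per token, so the invoice itself may grow closer to linearly. Document and prompt sizes are assumed values.

```python
def context_tokens(num_documents, tokens_per_document=800, prompt_tokens=500):
    """Total context size for a prompt plus retrieved documents (sizes are assumptions)."""
    return prompt_tokens + num_documents * tokens_per_document


def relative_attention_cost(tokens, baseline_tokens):
    """Attention cost relative to a baseline, assuming pure O(n^2) scaling."""
    return (tokens / baseline_tokens) ** 2


lean = context_tokens(5)      # five documents that would have been enough
bloated = context_tokens(50)  # fifty documents retrieved "just in case"

ratio_tokens = bloated / lean
ratio_attention = relative_attention_cost(bloated, lean)

print(f"context grows {ratio_tokens:.0f}x, attention term grows {ratio_attention:.0f}x")
```

With these assumed sizes, retrieving ten times as many documents makes the context 9 times larger and the attention term 81 times larger.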

The third is routing. When teams are unsure which model is "good enough," they default to the most powerful one. A basic classification task may run on the same model that was meant for complex reasoning. Once call volume reaches the millions, the choice between sending simple tasks to a smaller model and routing everything to a frontier model is often the difference between a manageable bill and a board-level issue.
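A minimal sketch of the routing trade-off, assuming a fixed price per call for each model tier. The tier names, prices, task kinds, and call volumes are all hypothetical and do not reflect any vendor's pricing; the sketch also ignores any quality loss from using the smaller model.

```python
# Hypothetical per-call prices in dollars; not real vendor pricing.
PRICE_PER_CALL = {"small": 0.0004, "frontier": 0.0150}

SIMPLE_TASKS = {"classification", "extraction"}


def route(task_kind):
    """Send known-simple task kinds to the small model; default to frontier when unsure."""
    return "small" if task_kind in SIMPLE_TASKS else "frontier"


def monthly_bill(call_counts, router):
    """Sum price-per-call times volume for each task kind under the given router."""
    return sum(PRICE_PER_CALL[router(kind)] * n for kind, n in call_counts.items())


calls = {
    "classification": 8_000_000,
    "extraction": 1_500_000,
    "complex_reasoning": 500_000,
}

always_frontier = monthly_bill(calls, lambda kind: "frontier")
routed = monthly_bill(calls, route)

print(f"everything on frontier: ${always_frontier:,.0f}")
print(f"with simple routing:    ${routed:,.0f}")
```

Under these assumptions the same ten million calls cost about $150,000 when everything goes to the frontier tier and about $11,300 when simple tasks are routed down.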

Non-software industries will feel this pain as a "transformation"

Software companies will see this problem first because the work being optimized is already heavily instrumented. Engineering teams have metrics such as pull requests, commits, deployments, incidents, cycle time, and mean time to recovery that tie back to the product. The measurement is imperfect, but this kind of work is easier to quantify.

Non-software enterprises will feel the problem more acutely because their work is operational: claims, underwriting, customer service tickets, compliance reviews, supply chain exceptions, and payment disputes. Companies with real-world assets face similar problems. These workflows have traditionally been measured in human labor, cycle time, SLA attainment, and error rates, often to a higher standard: they must hold up in an audit, not just be correct on average. Work units and cost units do not share a language or sit in the same part of the organization. Technical teams can see token consumption and business units can see workflow changes, but connecting the two requires multiple teams to first agree on "what is being measured."

I believe software companies will experience the token budget wars as a productivity measurement problem, which is what the many earlier "AI layoffs" reflect, whereas non-software companies will experience them as a transformation problem.

The missing layer is attribution from tokens to results

Enterprises need a translation layer that links inference spending to completed work and business outcomes. This layer must answer three questions. What is the true cost of this workflow, including retries and manual adjustments? Which parts of the agent's execution trajectory actually matter, and which are wasted effort? Has the work changed the operating model, for example fewer tickets handled by customer service, shorter claim cycles, smaller BPO budgets, or deferred hiring? The next step is to express attribution in business language. Instead of simply stating "this workflow cost $2.13," it should say: processing these claims with the agent is cheaper than BPO, but if the policy requires additional exception documents, the retry long tail will destroy profitability.

Measurement will become memory

To connect a token to an outcome, enterprises must capture everything that happens in between: what the agent saw, what it retrieved, which tools it called, what it ignored, where retries occurred, when humans overrode it, which exception rules applied, which precedents mattered, and why one path succeeded while another failed. The measurement layer must record decision trajectories, which is exactly what organizations have rarely owned in the past. Systems of record capture what happened, but rarely why. A CRM can tell you that a deal slipped, but it cannot reveal the undocumented judgment behind a sales forecast.
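One way to picture such a decision trajectory is as an append-only list of steps attached to a workflow, each with a kind, a free-text reason, and a cost. The sketch below is a hypothetical data shape, not a description of any existing product; the class names, step kinds, and sample values are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    kind: str              # e.g. "retrieval", "tool_call", "retry", "human_override", "decision"
    detail: str            # what happened and, where known, why
    cost_usd: float = 0.0  # inference cost attributed to this step


@dataclass
class Trajectory:
    workflow_id: str
    outcome: str = "pending"
    steps: List[Step] = field(default_factory=list)

    def record(self, kind, detail, cost_usd=0.0):
        """Append one step so the path from context to outcome stays reconstructible."""
        self.steps.append(Step(kind, detail, cost_usd))

    def total_cost(self):
        return sum(s.cost_usd for s in self.steps)

    def count(self, kind):
        return sum(1 for s in self.steps if s.kind == kind)


t = Trajectory("claim-001")
t.record("retrieval", "policy document; 3 of 12 chunks actually used", 0.02)
t.record("tool_call", "coverage lookup timed out", 0.11)
t.record("retry", "coverage lookup repeated after timeout", 0.11)
t.record("human_override", "adjuster applied a regional exception rule")
t.record("decision", "claim approved under the exception", 0.01)
t.outcome = "approved"

print(f"{t.workflow_id}: {t.outcome}, ${t.total_cost():.2f}, retries={t.count('retry')}")
```

The same record serves both purposes the text describes: summing `cost_usd` justifies the spend, while the `detail` fields preserve why the claim was approved.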

Decision reasoning is one of the assets that decays and disappears most easily inside a company because it lives in Slack threads, email chains, escalation meetings, and people's heads. People leave, and processes change.

AI changes this because agents generate trajectories. Every retrieval, tool call, retry, escalation, manual adjustment, and final decision becomes part of the path from context to action to outcome. At first, companies will capture these trajectories to justify the spending. Once captured, however, the trajectories will become more valuable than the cost reports themselves, because they form a lasting record of how the organization actually makes decisions (cough, context graph, although I'm really starting to get tired of that term).

The allocation layer is the true prize

If inference becomes a metered resource in the customer's operating model, then every dollar must prove its worth. Which vendors can show when tokens converted into results, when they did not, and why?

Companies will not fully figure this out on their own. They will buy it as a transformation. Fortune 500 companies have run this playbook repeatedly: buckle up, hire McKinsey, recruit every former Palantir employee on the market, and have the CEO drive change from the top down. Token-to-result attribution will arrive the same way ERP, BI, and digital transformation did: as an executive-sponsored "project," backed by foundational infrastructure, that ultimately becomes a new source of truth. The founders who pull this off will build a different kind of founding team, one that departs from the traditional startup archetype.

Whoever masters the attribution from tokens to results will be able to make allocation decisions: which workflows warrant more computational power, which should be limited, which should switch to cheaper models, which should continue to be done by humans, and which can replace BPO. Once you can make these decisions, you control the flow of internal AI spending within the enterprise and gain the trust needed to allocate these resources.

The first phase of corporate AI proved that models can perform tasks. The next phase will determine how much those tasks are worth paying for. As Charlie Munger said: show me the incentive, and I will show you the outcome.
