Original Title: The largest cryptocurrency exchange in the US quietly switched to Chinese AI models, saving half the money
Original Author: AI Hands-On Notes
A piece of data that has Silicon Valley on edge
Brian Armstrong, CEO of Coinbase, the largest cryptocurrency exchange in the US, recently made a statement that caused quite a stir in the tech community:
“We switched our AI models to China’s GLM 5.2 and Kimi 2.7, cutting AI expenses directly in half.”
Cut in half? Did usage drop as well?
On the contrary. Coinbase's token usage has kept rising.
Spending less while using more: that is what truly unsettles OpenAI and Anthropic.
How did they do it? Three money-saving strategies
Coinbase didn’t simply switch to a cheaper model and call it a day. They built a complete “money-saving system”:
First Strategy: Don't lock into one model; let the system choose
Coinbase set up an automated routing system. Every time a request comes in, the system automatically selects the most suitable model based on the type of task, price, and cache conditions.
Not every task needs the most expensive model. Simple translations go to cheaper models, while complex reasoning goes to stronger ones. You wouldn't drive a sports car to buy groceries.
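The routing idea can be sketched in a few lines of Python. This is only an illustration, not Coinbase's actual system: the `tier` values, the `REQUIRED_TIER` mapping, and the `route` function are hypothetical, and the sketch ignores cache conditions. Only the output prices come from the comparison later in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    output_price: float  # USD per million output tokens (from the article)
    tier: int            # hypothetical capability tier, higher = stronger


# Tiers are illustrative assumptions, not measured rankings.
MODELS = [
    Model("GLM-5.2", 4.40, 1),
    Model("Claude Opus 4.7", 25.00, 2),
]

# Hypothetical mapping from task type to the minimum tier it needs.
REQUIRED_TIER = {"translation": 1, "complex_reasoning": 2}


def route(task_type: str) -> Model:
    """Return the cheapest model whose tier is sufficient for the task."""
    # Unknown task types fall back to the strongest tier to stay safe.
    needed = REQUIRED_TIER.get(task_type, max(m.tier for m in MODELS))
    candidates = [m for m in MODELS if m.tier >= needed]
    return min(candidates, key=lambda m: m.output_price)


if __name__ == "__main__":
    for task in ("translation", "complex_reasoning"):
        print(task, "->", route(task).name)
```

A production router would also weigh latency, cache state, and observed quality; the point here is only that the choice is made per request rather than fixed once.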
Second Strategy: Increase cache hit rate from 5% to 60%
This is the strongest move. By optimizing its caching strategy, Coinbase raised the cache hit rate from 5% to 60%.
In simple terms, 60% of requests can reuse previous computation results, which significantly reduces the actual cost of each call. This optimization alone saved a substantial amount of money.
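To see why the hit rate matters so much, here is a rough back-of-the-envelope model. It rests on assumptions the article does not state: the hit rate is treated as the share of input tokens served from cache, and cached tokens are billed at a discounted fraction of the list price. The 10% figure in the demo is a placeholder, not a quoted rate, and `effective_input_cost` is a hypothetical helper.

```python
def effective_input_cost(price: float, hit_rate: float, cached_fraction: float) -> float:
    """Average input price per token unit under a simple caching model.

    price           -- list price for uncached input
    hit_rate        -- share of input served from cache, between 0 and 1
    cached_fraction -- assumed price of cached input relative to list price
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    # Uncached share pays full price; cached share pays the discounted price.
    return price * ((1.0 - hit_rate) + hit_rate * cached_fraction)


if __name__ == "__main__":
    ASSUMED_CACHED_FRACTION = 0.10  # placeholder assumption, not from the article
    for rate in (0.05, 0.60):
        cost = effective_input_cost(1.0, rate, ASSUMED_CACHED_FRACTION)
        print(f"hit rate {rate:.0%}: {cost:.3f} of list price")
```

Under these assumed numbers, the effective input cost falls from about 0.955 of list price at a 5% hit rate to about 0.46 at 60%. The real saving depends on each provider's cached-token pricing and on how much of the bill is input rather than output.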
Third Strategy: Context Engineering
Coinbase requires developers to streamline context, start a new session for each new task, and avoid overloading a single conversation with too much information.
This is not laziness but a new discipline, which the industry calls Context Engineering. Anthropic explicitly pointed out in a technical blog that, when managing AI agents, context engineering is more effective than prompt engineering.
Put differently: the goal is not to make the AI smarter, but to give it more precise information.
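As a toy illustration of what streamlining context can mean in practice, the sketch below keeps only the background notes that share a word with the task, and stops once a word budget is used up. The `build_context` helper and its keyword-overlap rule are assumptions made up for this example; they are not described by Coinbase or Anthropic, and real systems would use better relevance signals.

```python
def build_context(task: str, notes: list[str], max_words: int) -> list[str]:
    """Select notes that overlap with the task, within a word budget."""
    task_words = set(task.lower().split())
    selected: list[str] = []
    used = 0
    for note in notes:
        words = note.lower().split()
        if not task_words & set(words):
            continue  # skip notes that share no word with the task
        if used + len(words) > max_words:
            break  # budget exhausted, stop adding context
        selected.append(note)
        used += len(words)
    return selected


if __name__ == "__main__":
    notes = [
        "Weekly report template uses three sections",
        "Lunch menu for Friday",
        "Report deadline is Monday",
    ]
    # Each new task gets its own freshly built context, not the full history.
    print(build_context("write weekly report", notes, max_words=50))
```

The design choice worth noting is that the context is rebuilt per task, which mirrors the advice to start a new session for a new task.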

▲ More and more companies begin to carefully manage AI model costs
It’s not just Coinbase; this is a trend
Coinbase is not the first to take this leap.
At Lindy, a 25-person AI startup, CEO Flo Crivello switched directly from Claude to Deepseek. He told CNBC, “AI costs have already surpassed human labor costs, which is unsustainable.” After the switch, costs “plummeted,” saving millions of dollars.
Snowflake CEO Sridhar Ramaswamy conducted a practical comparison: on 103 coding tasks, GLM-5.2 solved 66%, while Claude Opus 4.7 solved 67%. The gap? Almost none.
But the price difference is very real:
Price Comparison (per million tokens)
- GLM-5.2: Input $1.40 / Output $4.40
- Claude Opus 4.7: Input $5 / Output $25
- GPT-5.5: Input $5 / Output $30
On output tokens, the two US models cost roughly 5.7 to 6.8 times as much as GLM-5.2 (about 3.6 times as much on input).
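The ratios follow directly from the listed prices. The short script below recomputes them and prices a sample workload; the prices are the ones quoted above, while the workload size and the `job_cost` helper are made up for illustration.

```python
# (input, output) prices in USD per million tokens, as listed in the article.
PRICES = {
    "GLM-5.2": (1.40, 4.40),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one workload at the listed per-million-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def output_ratio(model: str, baseline: str = "GLM-5.2") -> float:
    """How many times more expensive a model's output tokens are than the baseline's."""
    return PRICES[model][1] / PRICES[baseline][1]


if __name__ == "__main__":
    for name in PRICES:
        # Hypothetical workload: 1M input tokens and 1M output tokens.
        cost = job_cost(name, 1_000_000, 1_000_000)
        print(f"{name}: ${cost:.2f}, output ratio {output_ratio(name):.2f}x")
```

For this sample workload the bill comes to about $5.80 on GLM-5.2, $30 on Claude Opus 4.7, and $35 on GPT-5.5. The comparison assumes each model uses the same number of tokens for the same job, which the Snowflake results discussed next suggest is not always true.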
Cheap doesn’t mean bad? Don't jump to conclusions
At this point, you might ask: with such low prices, can the quality be the same?
To be honest, it's not exactly the same, but the difference is smaller than you think.
Snowflake’s testing shows that GLM-5.2 is indeed less stable on some tasks: its first-try success rate is 47.6%, lower than Opus's 53.7%. GLM also sometimes “gets stuck” heading in the wrong direction. On one task it ran for 24 minutes and made 411 calls, yet still failed, while Opus finished in 9 minutes with 49 calls.
However, on most tasks the final success rates of the two are nearly the same. The key question: are you willing to pay more than five times the output price for a few percentage points of stability?
For many businesses, the answer is becoming clearer: no.

▲ The price gap between Western and Eastern AI models is reshaping the industry
What does this mean for the average person?
You might say: I am not Coinbase, how is this related to me?
Actually, this trend has three direct implications for how you use AI:
1. Don’t rely on just one model
Many people use AI and stick to one model, either ChatGPT or Claude. Professional users no longer do that. Different tasks call for different models to get the best cost-effectiveness.
Use cheaper models for everyday Q&A and higher-quality models for coding and analysis. After all, you don’t eat at a Michelin restaurant for every meal.
2. Caching and reuse are the keys to saving money
If you frequently use AI for similar tasks (like writing weekly reports or organizing notes daily), learn to utilize caching and templates to greatly reduce consumption.
3. Streamlining context = Better results
Many people feel the need to include every piece of background information when talking to AI. But as the context engineering approach above suggests, giving AI less but more precise information tends to yield better results. For new tasks, start new conversations. Don’t make AI search for answers in a pile of history.
Deeper changes: AI pricing models are being redefined
This wave of "model migration" reflects a shake-up in the entire AI industry's pricing logic.
The high valuations of OpenAI and Anthropic are based on the assumption of “sustained rapid revenue growth.” But if more and more companies, like Coinbase and Lindy, shift to cheaper alternatives, this assumption collapses.
Reports indicate that OpenAI and Anthropic have already begun a price war. In the recently released GPT-5.6 series, the Terra model is half the price of GPT-5.5, and Luna focuses on the lowest price.
For users, this is a good thing. The fiercer the competition, the lower the prices and the wider the selection.
When American giants begin to save money using Chinese models, it indicates that the competition in AI is no longer a scoring competition in labs, but a real cost battle. Being able to do the same thing for less money is the true capability.
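Disclaimer: This article represents only the author's personal views and does not represent the position or views of this platform. It is provided for information sharing only and does not constitute investment advice to anyone. Any dispute between users and the author is unrelated to this platform. If any article or image published on this page involves infringement, please email the relevant proof of rights and proof of identity to support@aicoin.com, and the platform's staff will verify it.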