

Anthropic's Latest Model Opus 4.7 and Its 8 Hidden Blades

Techub News

Written by: Silicon Valley Alan Walker

The eight blades the press conference never spelled out, and the tracks and industries they aim to cut down

The press conference spotlighted SWE-bench, but the real signals hide in the footnotes, the quote blocks, and a seemingly unremarkable auto mode. Before this cup of coffee is finished, the old OG will break it down for you.

ZOMBIE CAFÉ · APR 16, 2026 · PALO ALTO

On California Ave in Palo Alto at nine thirty in the morning, light slanted through the glass of Coupa Café and fell on Alan Walker's half-cold flat white. He had just finished scrolling Anthropic's official site; he leaned back in his chair and spoke to Tony, who had just sat down across from him.

"This time, Anthropic released Opus 4.7, and the press conference was quite restrained—the main characters were those few pillars of SWE-bench, customer quotes in rotation, and a nice alignment diagram. Most tech media just copied the press release and left."

"But the real content of this thing is all buried in the footnotes, migration guide, and a casually mentioned 'auto mode expanded to Max users.' You have to read it like it's a 10-K— the main text is for retail investors, while the notes are for institutions."

"Before I finish this cup of coffee, I will dissect eight knives. I'll tell you who each one cuts towards."

—— BLADE NO. 01

xhigh is not just another gear —— the default has been quietly raised

The press conference briefly mentioned: "In Claude Code, we've raised the default effort level to xhigh for all plans."

Most people see xhigh and think it's "just another gear," like one more iPhone color. Wrong. The real signal is in the second half of that sentence— the default level for all plans in Claude Code has been raised to xhigh.

This is a very Anthropic move: quietly raising everyone's baseline a notch while the computing bill stays unchanged. It's like being handed a smarter colleague at no extra charge.

TONY: Wait, doesn’t this mean that Pro users who used to pay $20 for medium now directly get xhigh?

ALAN: Yes. And look closely at that quote from Hex— "low-effort 4.7 ≈ medium-effort 4.6." With the default level raised, it means that ordinary users are getting effective intelligence that jumped two full notches. The press conference didn't highlight this figure because they didn’t want the token consumption page to look bad.
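The two-notch claim is simple ladder arithmetic; a throwaway sketch makes it concrete. Note that the tier names other than low, medium, and xhigh are assumptions for illustration, not Anthropic's documented gears:

```python
# Hypothetical effort ladder implied by the article's quotes; only
# "low", "medium", and "xhigh" appear in the source — "high" is assumed.
EFFORT_LADDER = ["low", "medium", "high", "xhigh"]

def notches_raised(old_default: str, new_default: str) -> int:
    """How many gears the default effort moved up the ladder."""
    return EFFORT_LADDER.index(new_default) - EFFORT_LADDER.index(old_default)

# A Pro user whose default was "medium" now lands on "xhigh":
print(notches_raised("medium", "xhigh"))  # → 2
```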

Practical Scenarios

On Monday morning, you ask Claude Code to modify a 500-line backend module— originally you had to manually type /effort max to let it run itself; now you don’t set anything, the default is xhigh, and you return from a cup of coffee to find the job done. This difference isn't 10% faster, it's "you don't have to worry about it anymore."

KILL LIST

→ "AI tuning / prompt configuration" type SaaS— those that teach you how to adjust thinking budgets and select effort, with default values automatically set right, the middleware has no business

→ Junior engineer positions— xhigh default tasks are already the quality baseline of a three-year experienced engineer

→ Outsourced code review companies— the next third knife will take this down

—— BLADE NO. 02

Auto Mode —— The silent revolution of Permission UI

The third line of footnotes from the press conference: "Auto mode expanded to Max users." Just one sentence.

Anthropic's official wording: "auto mode is a new permissions option where Claude makes decisions on your behalf."— "making decisions for you."

In the past year, agent startups have split into two extremes: either skip-all-permissions all the way (the Devin/Cognition path), or a pop-up for every approve/deny (early Cursor). Anthropic took a third path: train the model to judge what should be asked and what shouldn't, and internalize that judgment into auto mode.

KAI: Alan, what's the fundamental difference between this and skip permissions? Aren't we just letting it run?

ALAN: The difference is huge. Skip means you've removed the safety yourself, and you're on the hook if something goes wrong. Auto means the model carries its own safety— it actively stops to ask you about dangerous operations and handles low-risk ones itself. Essentially, the permission UI has been moved wholesale from the product shell into the model weights.

TONY: So all those YC startups doing "agent governance / guardrails"...

ALAN: The product has effectively been built into the model. A living example of what Andrej said last year: "the model is the product."
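The auto-mode idea Alan describes can be caricatured in a dozen lines: the agent itself classifies each operation as safe-to-run or stop-and-ask. Everything here— the rule list, the function names— is an invented illustration, not Anthropic's implementation:

```python
# Toy risk gate: dangerous operations surface for approval, the rest
# run silently. Real auto mode learns this judgment in the weights;
# a prefix list is only a stand-in for the concept.
DANGEROUS_PREFIXES = ("rm -rf", "git push --force", "DROP TABLE")

def needs_approval(command: str) -> bool:
    """Return True when the operation should stop and ask the user."""
    return command.strip().startswith(DANGEROUS_PREFIXES)

def run(command: str) -> str:
    if needs_approval(command):
        return f"ASK USER: {command}"
    return f"AUTO-RUN: {command}"

print(run("ls src/"))        # low-risk: handled without interruption
print(run("rm -rf build/"))  # dangerous: surfaces for approval
```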

KILL LIST

→ Agent guardrails / approval-flow SaaS— those making "human-machine collaborative approval platforms" have had their entire category dimensionally reduced

→ Traditional RPA industries (UiPath / Automation Anywhere)— their core value is "controllable automation," and now controllability is intrinsic

→ Middle- and back-office BPO outsourcing— the data entry, customer allocation, and invoice reconciliation done in the Philippines and India; one day of auto mode covers a team's workload

—— BLADE NO. 03

/ultrareview— a kill order for Senior Engineers

The website uses the term: "a dedicated review session that reads through changes and flags bugs and design issues that a careful reviewer would catch."

Note that phrase— "a careful reviewer." Not a junior, not a linter, but a careful reviewer. In plain terms: a senior engineer.

David Loker from CodeRabbit provided direct numbers: recall increased by over 10%, digging out the hardest-to-catch bugs in the most complicated PRs, with precision almost unchanged. An increase in recall and no drop in precision— this is the holy grail in the code review field, the last one to achieve this combination was Google's internal Tricorder, which took ten years.
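To see why "recall up, precision flat" is the holy grail, it helps to compute both metrics on made-up confusion counts. The numbers below are purely illustrative, not CodeRabbit's:

```python
# Precision: of the bugs flagged, how many were real.
# Recall: of the real bugs, how many were flagged.
def precision_recall(true_pos: int, false_pos: int, false_neg: int):
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

# Before: catches 80 of 100 real bugs with 10 false alarms.
print(precision_recall(80, 10, 20))  # precision ≈ 0.889, recall = 0.80
# After: catches 90 of 100 with roughly the same alarm rate.
print(precision_recall(90, 11, 10))  # precision ≈ 0.891, recall = 0.90
```

The hard part is that flagging more aggressively normally buys recall at precision's expense; holding both is what made Tricorder a decade-long project.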

MARCUS: We’re talking about $800K a year for a staff engineer at FAANG, with reviewing PRs taking up half of their time. If this really works...

ALAN: Pro and Max users get three free ultrareviews to try it out. This is Silicon Valley's typical "freemium bait" strategy— letting you taste the flavor before making you unable to go back.

MARCUS: So this isn't just a tool, it's a substitute.

ALAN: Not entirely. It doesn’t replace staff; it replaces those two hours every afternoon when staff review ten PRs. The freed two hours allow seniors to actually be senior, not just human GitHub bots.

Practical Scenarios

In a twenty-member engineering team, the tech lead previously spent three hours every day reviewing PRs. With /ultrareview, the tech lead only needs to look at the few "design issues" flagged in Claude— three hours become twenty minutes, and the saved time can be truly spent on architecture. This is not just "AI assistance," it's a rewriting of job responsibilities.

KILL LIST

→ All standalone AI code review startups— CodeRabbit, Codacy, Qodo, they are now features of Anthropic

→ Traditional SAST / DAST security scanning tools (Snyk / Checkmarx)— rule-driven static scans get crushed by "reading code like a human"

→ Outsourced code review services in India / Eastern Europe— this market, valued at several billion dollars over the past decade, is now evaporating

—— BLADE NO. 04

2,576-pixel vision —— computer use goes from demo to weapon

"Acceptable image longest side up to 2,576 pixels, about 3.75 megapixels, more than three times before."

This point is the most underestimated. Most people see it and think "oh, higher resolution." Dead wrong. It marks the entire computer-use category moving from demo to production.
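The footnote's arithmetic checks out: assuming a 16:9 frame (an assumption— the footnote gives only the long edge), a 2,576-pixel long side lands right around 3.75 megapixels:

```python
# Sanity check on the "2,576 px ≈ 3.75 MP" footnote, assuming 16:9.
long_side = 2576
short_side = round(long_side * 9 / 16)       # 1449 px
megapixels = long_side * short_side / 1e6
print(f"{megapixels:.2f} MP")
```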

The evidence lies at the bottom of the release page in that quote block, where XBOW's CEO Oege de Moor said one sentence—

54.5% → 98.5%. This is not a gradual improvement; it's a leap from "not usable" to "can't afford not to use." Opus 4.6 was still guessing where the buttons were on the screen; 4.7 can read the fine print and nested tables on dense dashboards.

SARAH: Our enterprise clients have always been stuck exactly here. We had 4.6 processing invoice scans automatically, but half came out wrong— the boss flat-out said, "stop messing around."

ALAN: Now, the 98.5% figure means RPA, IT operations, expense audits, legacy system migrations— all workflows that still rely on human eyes to look at screens finally have an acceptable baseline model for the first time.

KAI: computer use is no longer a demo video, it's productivity.

ALAN: Yes, and note— this is an upgrade at the model level, not an API parameter. Old users don't change anything, they automatically benefit. Anthropic is quietly pushing the product capability of all integrators up a notch.

KILL LIST

→ OCR / document understanding SaaS (Rossum / Hyperscience / Nanonets)— their moat used to be "vision + structuring," and now it’s matched or even surpassed by general models

→ The traditional RPA trio— the screen-recognition core of tools like UiPath lost half its value overnight

→ Data entry departments in enterprise applications— medical insurance claims, bank KYC, government form processing, the entire manual assembly line

→ The independent penetration testing / red team industry— companies like XBOW may benefit, but traditional pentest consulting gets breached itself

—— BLADE NO. 05

File-System Memory— Anthropic chose the simplest path

A footnote from the press conference: "Opus 4.7 is better at using file system-based memory. It remembers important notes across long, multi-session work."

OpenAI took the "embedded memory" route— gluing memories into the model where you can't see or change them. Google is doing its mysterious infini-attention. Anthropic's answer this time: the file system is the memory. Claude writes .md notes, reads .md notes, and you can cat them out anytime.

This choice looks low-tech, but it's a first-principles win. The hard problem of memory was never storage— it's auditability, editability, and portability. Vector databases and embedded memory fail on all three.

ERIC: What enterprise clients fear most is "what does this AI remember about me, and I don't know."

ALAN: File-system memory answers compliance directly. GDPR right to erasure? rm it. SOC2 audit? cat it for the auditors. This isn't a technical advantage; it's a legal one.

ERIC: So those startups doing "AI memory layers"...

ALAN: Mem0, LangMem, Zep— they raised a lot of money this year. They solve the problem of "the model won't manage memory by itself." Anthropic wrote this ability into the model, using the simplest POSIX file system. The middle layer is bypassed.
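The whole pitch— memory you can cat and rm— takes four file operations to demonstrate. The directory and note names below are invented; the point is only that the memory is plain files:

```python
# File-system memory in miniature: notes are ordinary .md files,
# so auditing is reading and deletion is unlinking.
from pathlib import Path
import tempfile

memory_dir = Path(tempfile.mkdtemp()) / "claude-memory"
memory_dir.mkdir()

# The agent "remembers" by writing a note...
note = memory_dir / "project-context.md"
note.write_text("# Notes\n- deploy target is staging, not prod\n")

# ...an auditor reads it back (SOC2: just show the file)...
print(note.read_text())

# ...and GDPR-style deletion is a plain unlink.
note.unlink()
print(note.exists())  # → False
```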

KILL LIST

→ AI memory infrastructure startups (Mem0 / LangMem / Zep)— value propositions internalized into the model

→ Vector databases' "agentic memory" use cases— a major narrative for Pinecone and Weaviate takes a hit

→ AI enhancement layers in enterprise knowledge management SaaS— no need for third-party middleware anymore, Claude can directly read and write project files

—— BLADE NO. 06

Task Budgets— fit the agent with brakes, then open the throttle

"Giving developers a way to guide Claude's token spend so it can prioritize work across longer runs." (public beta)

All the media overlooked this, but it is the year's most significant engineering breakthrough for long-horizon agents.

In the past year, all agent companies have dealt with the same demon: uncontrolled token burn for long tasks. Give Devin or Cursor a complex task; it runs for two hours, comes back telling you it burned $800, and only half the job is done. The boss sees the bill and turns green.

The design of Task Budgets is clever— not a mere token cap, but letting the model see the budget counting down and decide for itself which steps to skip and which parts of the task matter most.

CLAIRE: Isn't this the "minimal deliverable" mindset from project management?

ALAN: Yes. Anthropic has trained the PM skill of scope-cutting into the model. Give the agent a $10 budget and it decides on its own which features get done to 80% and which must hit 100%.

TONY: So that quote from Notion— it's the first model to pass their "implicit-need tests"—

ALAN: You got it. The model is developing "resource awareness," able to guess what you didn't say but expect, and prioritize it within the budget. This trains in "senior engineer judgment."
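The scope-cutting behavior Alan describes can be sketched as a toy budget loop: run high-priority steps first, skip whatever no longer fits. The step list, costs, and priorities are all invented for illustration:

```python
# Toy budget-aware planner: highest-priority steps claim the budget
# first; anything that doesn't fit is cut, PM-style.
def run_with_budget(steps, budget):
    """steps: list of (name, cost, priority); higher priority runs first."""
    done, remaining = [], budget
    for name, cost, _prio in sorted(steps, key=lambda s: -s[2]):
        if cost <= remaining:
            remaining -= cost
            done.append(name)
    return done, remaining

steps = [
    ("implement core endpoint", 5.0, 3),  # must-have
    ("write unit tests",        3.0, 2),
    ("polish error messages",   4.0, 1),  # nice-to-have
]
done, left = run_with_budget(steps, budget=10.0)
print(done)  # core + tests fit under $10; polish gets cut
```

Real Task Budgets differ in the key way: the prioritization itself is done by the model mid-run, not by a fixed list.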

KILL LIST

→ AI cost-control / LLM observability startups (Helicone, Langfuse's cost module)— their core function is going native

→ Agent orchestration frameworks (some uses of LangGraph / CrewAI)— the model can plan the budget itself without an outer scheduler

→ The project management side of traditional consulting— the "resource allocation + scope trimming" intelligence is now handled by the model

—— BLADE NO. 07

Proof before coding— new behavior discovered by Vercel

Joe Haddad, Distinguished Eng at Vercel: "It even does proofs on systems code before starting work, which is new behavior we haven't seen from earlier Claude models."

That sentence was buried among twenty-plus quotes, and no one zoomed in on it. But the old OG put down his coffee the moment he read it. ☕️

"proofs on systems code"— before writing system-level code, the model will first do mathematical/formal proofs on its own. This doesn't just mean it's smarter; this means the model has begun to use methods similar to PhD validation of papers to verify its own code.

MARCUS: This behavior appearing in the training data indicates that Anthropic explicitly rewarded "prove before write code" at the RL stage.

ALAN: Yes, this was trained intentionally. Combining that segment from Vercel with Genspark's "loop resistance" and Hex's "correctly reports when data is missing instead of plausible-but-incorrect fallbacks"— what you see is a complete training program: getting the model to start working like an engineer that is hard to fool.

MARCUS: Hard to fool— means it doesn't deceive itself.

ALAN: Exactly. Opus 4.7 no longer fabricates a seemingly runnable plan just to complete a task. This is a concrete manifestation of alignment hitting the product level.
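You can approximate "prove before you code" at home with plain property checking: write the invariants down first, then test the implementation against them. This is checking, not formal proof, and the merge function is purely illustrative:

```python
# Invariants stated up front: the output of merge_sorted must be
# (1) sorted and (2) a permutation of the inputs. Check them before
# trusting the code.
import random

def merge_sorted(a, b):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def holds_invariants(a, b):
    out = merge_sorted(a, b)
    return out == sorted(out) and sorted(out) == sorted(a + b)

random.seed(0)
cases = [(sorted(random.sample(range(100), 5)),
          sorted(random.sample(range(100), 4))) for _ in range(50)]
print(all(holds_invariants(a, b) for a, b in cases))  # → True
```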

KILL LIST

→ The formal verification niche (partially)— the model now covers some entry-level use cases of high-barrier tools like Coq/Lean/TLA+ on its own

→ High-frequency trading / blockchain security auditing— the auditors' core work ("reading code to find invariant violations") is now shared with the model, and audit prices get driven down

→ OS kernel / embedded outsourcing— niches that demand proof-style reasoning have their entry barriers flattened

—— BLADE NO. 08

Cyber Verification— the regulatory-arbitrage window opens

"During its training we experimented with efforts to differentially reduce these capabilities."

The most audacious operation is here. Anthropic admits to having deliberately weakened Opus 4.7's cybersecurity capabilities during training, while the stronger Mythos Preview behind it stays unreleased. Then—

They launched a Cyber Verification Program, allowing legitimate security researchers, pentesters, and red teams to unlock higher permissions after certification.

ERIC: This… isn't this like a model version of export controls?

ALAN: More accurately, it's "capability KYC." The model has three layers of capability gates, and you must verify your identity to unlock the corresponding level. For the first time, the window for regulatory arbitrage is clearly marked by AI companies themselves.

ERIC: What does this mean for startups?

ALAN: First, generic "AI + security" startups that want high-end scenarios must now get Anthropic's certification first— the supply chain itself is being regulated. Second, a new category appears: consulting that helps you obtain Anthropic certification, much like the firms that walk you through SOC2 today. Third, this is Anthropic rehearsing how all future frontier models will be released; Mythos will only ship under stricter gates.

TONY: So companies like Palantir and Booz Allen, which are well-versed in government compliance...

ALAN: They get an extra layer of moat for free. They already hold clearance-grade identities, and now they automatically unlock the top-tier model.
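The "capability KYC" structure reduces to a tier lookup: a capability runs only if the caller's verified tier covers it. The tier names and mapping below are invented; the article says only that there are layered gates unlocked by identity verification:

```python
# Invented three-tier gate for illustration: unverified callers get
# tier 0, certified red teams get tier 2.
TIERS = {"unverified": 0, "verified-researcher": 1, "certified-red-team": 2}

def allowed(capability_tier: int, identity: str) -> bool:
    """A capability is usable only up to the caller's verified tier."""
    return capability_tier <= TIERS.get(identity, 0)

print(allowed(2, "unverified"))          # → False: top tier stays locked
print(allowed(2, "certified-red-team"))  # → True
```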

Practical Scenarios

A YC founder building AI pentesting will find that starting from Q2 2026, the first page of the business plan must answer "do you have Anthropic Cyber Verification?" No? No VC checks. Yes? Valuation doubles. One certification becomes a valuation watershed.

KILL LIST & New Tracks

→ Generic network security SaaS startups— without Anthropic certification they can't reach the upper tiers of model capability and hit a locked ceiling

→ A new category of "AI model capability compliance consulting" emerges— a batch of intermediaries that help enterprises get frontier model certification will appear in the next 12 months

→ Traditional military-industrial and government-integrated contractors (Palantir / Booz Allen)— naturally benefit, with thresholds becoming moats

→ Open source / local deployment factions— routes like Llama, Qwen, DeepSeek may benefit, with "usable without certification" becoming a core selling point

Alan Walker pushed his empty cup to the edge of the table and closed his MacBook.

Outside, the sun on California Ave had already climbed over the roof of Palo Alto Creamery, casting slanted light on the glass.

"Eight knives, cutting in eight directions. Some tracks will begin to die today, while others will begin to live."

"With each generation of frontier model releases, the real things are never written in the headlines." He told Tony, "Press conferences are for analysts. The footnotes and numbers in quotes are what we look at."

"Don't just watch the excitement."

— Alan

END OF DISPATCH · 10:47 AM PST · CALIFORNIA AVE · © ZOMBIE CAFÉ 2026

