Mnemonic Founders: In the Era of AI Programming, the Verification Layer Is More Important Than the Code Itself

Compiled by: Techub News

In a recent interview with Y Combinator, Weiwei and Jeff, the co-founders of the AI software testing platform Mnemonic, shared their thinking after closing a $50 million Series A round. With AI programming tools such as Cursor and Claude Code sharply increasing the speed of code output, they argued that the bottleneck in software development has shifted from “writing” to “validation.” The conversation explored the fundamental shift in software engineering paradigms in the AI era and the role Mnemonic plays as a “validation layer.”

From "Code is Truth" to "Truth-Driven Development"

When asked why engineers have historically disliked writing tests, Jeff gave a straightforward answer drawn from his experience at Robinhood: testing work lacks “visibility.” It is not a feature customers can perceive directly, it does not show well in polished presentations, and it is rarely reflected in performance reviews. As a result, testing is often treated as a burden and a secondary task. But as AI coding tools drive exponential growth in code production, validating the correctness of that code has become a harder challenge than writing it.

Weiwei and Jeff observed that traditional code review and static analysis tools can check code style and patterns, but they cannot answer a fundamental question: once this code is deployed to production, does it work as expected? Many teams still rely on a manual “Bug Bash” before release (logging in, clicking through, and operating the product by hand), an approach that stops scaling as the product and the team grow.

What Mnemonic does is precisely fill this vital gap: functional testing. Their platform simulates real user behavior, automatically running in the browser, traversing various user flows of the application, ensuring everything functions correctly from the end-user’s perspective. Whenever engineers submit code changes, Mnemonic automatically verifies whether these changes break any established user flows.

This leads to a more forward-looking perspective: Truth-Driven Development. Jeff explained two modes of thinking: one is “code is truth,” meaning the code in production is the ultimate definition of product behavior; the other is “truth (or specification) driven development.” In the latter model, the user journeys, success criteria, and edge cases detailed by product managers or engineers (often collaborating with AI) form the “single source of truth” on how the product should work. Code, whether written by humans or AI, is merely one implementation of this “truth” and may contain errors.
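The idea of a specification acting as the source of truth can be sketched in a few lines of Python. This is only an illustration of the concept, not Mnemonic's implementation: the `Criterion`, `JourneySpec`, and `verify` names, the checkout journey, and both implementations are hypothetical. The sketch shows a spec holding the success criteria, and two interchangeable implementations being judged against it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Criterion:
    """One success criterion from the spec, stated in plain language."""
    description: str
    check: Callable[[Dict[str, int]], bool]


@dataclass
class JourneySpec:
    """A user journey; the spec, not the code, defines expected behavior."""
    name: str
    inputs: Dict[str, object]
    criteria: List[Criterion] = field(default_factory=list)


def verify(spec: JourneySpec, implementation: Callable[..., Dict[str, int]]) -> List[str]:
    """Run one implementation and return the descriptions of failed criteria."""
    result = implementation(**spec.inputs)
    return [c.description for c in spec.criteria if not c.check(result)]


# Hypothetical spec: a checkout with a 10% coupon (amounts in cents).
checkout_spec = JourneySpec(
    name="checkout with coupon",
    inputs={"prices_cents": [4000, 6000], "coupon_percent": 10},
    criteria=[
        Criterion("total reflects the 10% coupon", lambda r: r["total_cents"] == 9000),
        Criterion("total is never negative", lambda r: r["total_cents"] >= 0),
    ],
)


def checkout_correct(prices_cents: List[int], coupon_percent: int) -> Dict[str, int]:
    """One implementation of the spec: applies the coupon."""
    subtotal = sum(prices_cents)
    return {"total_cents": subtotal * (100 - coupon_percent) // 100}


def checkout_buggy(prices_cents: List[int], coupon_percent: int) -> Dict[str, int]:
    """Another implementation, human- or AI-written, that forgets the coupon."""
    return {"total_cents": sum(prices_cents)}


if __name__ == "__main__":
    print(verify(checkout_spec, checkout_correct))
    print(verify(checkout_spec, checkout_buggy))
```

In this framing either implementation can be swapped out at any time; only the spec decides which one is acceptable.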

“Since engineers make mistakes, and AI makes mistakes,” Jeff said, “it’s unreasonable to let the codebase itself be the source of truth for how the product should work.” Their core assertion is that in the era of AI programming, code is gradually becoming “commoditized.” In the future, the core work of engineers will no longer be writing or reviewing TypeScript or React code, but writing detailed product specifications in natural language, which AI agents will then implement. Engineers will become “requirements gatherers” and “truth discoverers,” focused on deciding “what should be built,” while code is merely the “implementation detail” of achieving that goal and can be replaced by the output of a better model at any time.

AI Programming Agents Need an Independent "Validation Layer"

With the increasing prevalence of AI programming assistants like Claude Code and Cursor, a natural question arises: why not let these agents write tests themselves? The founders of Mnemonic provided explanations from several perspectives.

First, reliability. AI agents are often confident that the code they generate is correct when it is not. Users cannot fully trust an agent's own judgment; an independent third party is needed to verify whether its output meets the specification. The same holds in traditional development: developers do not deploy code on their own word that it works, but rely on unit tests, integration tests, and other external validation.

Second, testing capabilities for complex interactions. Many modern web applications have extremely complex interaction interfaces, such as rich text editors and drag-and-drop canvases. General-purpose AI browser agents have not been optimized for testing such complex scenarios, while Mnemonic specifically trains its agents to handle these challenges.

Third, speed and debuggability. Testing with general-purpose AI browser agents is very slow, and when a test fails it is hard to diagnose the issue: which element was not interacted with correctly, and what was the state of the page? Mnemonic has optimized the average time per action step to within 300 milliseconds and has built the entire platform around debuggability, so that its agents can diagnose issues automatically.

Finally, and most importantly: the continuous maintenance of “truth”. Even if today Cursor generates 100,000 lines of Playwright test code, tomorrow when major changes occur in product features, who will update those 100,000 lines of code? Mnemonic’s solution is to encapsulate the entire testing system, building a mechanism that automatically maintains this “source of truth” over time. Their system can even proactively suggest updates to tests—for instance, when a new component is discovered in the UI, it will ask whether this is the expected change and can automatically update the relevant tests without requiring the user to expend a lot of tokens or sessions for manual adjustments.
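The "ask before updating" behavior described above can be illustrated with a small sketch. It is a simplified assumption of how such a loop might look, not Mnemonic's actual mechanism: the `Proposal` type, the `propose_updates` and `apply_confirmed` functions, and the component names are all hypothetical. It compares the components a test expects with those observed in the UI, and changes the baseline only for differences a human confirms.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Proposal:
    """A suggested change to the recorded baseline, pending human confirmation."""
    component: str
    kind: str  # "added" (new in the UI) or "missing" (gone from the UI)


def propose_updates(baseline: FrozenSet[str], observed: FrozenSet[str]) -> List[Proposal]:
    """List every difference between the expected and the observed components."""
    added = sorted(observed - baseline)
    missing = sorted(baseline - observed)
    return [Proposal(c, "added") for c in added] + [Proposal(c, "missing") for c in missing]


def apply_confirmed(
    baseline: FrozenSet[str], proposals: List[Proposal], confirmed: Set[str]
) -> FrozenSet[str]:
    """Update the baseline only for components a human confirmed as intended."""
    updated = set(baseline)
    for proposal in proposals:
        if proposal.component not in confirmed:
            continue  # unconfirmed differences stay flagged as possible regressions
        if proposal.kind == "added":
            updated.add(proposal.component)
        else:
            updated.discard(proposal.component)
    return frozenset(updated)


# Hypothetical login flow: a new SSO button appears after a product change.
login_baseline = frozenset({"email-field", "password-field", "submit-button"})
login_observed = login_baseline | {"sso-button"}

login_proposals = propose_updates(login_baseline, login_observed)
new_baseline = apply_confirmed(login_baseline, login_proposals, confirmed={"sso-button"})
```

The design point is that an unexpected difference is never silently accepted: it either becomes part of the "truth" after confirmation or remains a reported failure.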

“Essentially, we’ve ‘closed the feedback loop’ for AI programming agents,” Weiwei summarized. The specifications define what to build and how to validate, while Mnemonic ensures the outputs of AI agents meet these specifications.

Customer Practices and Culture Building

Mnemonic's client list includes well-known companies such as Notion, Built, and Quora, and the platform processes over one million test runs daily. The collaboration with Notion began with a chance opening: Notion engineer Simon Last posted on Twitter that he wished he could simply describe a feature and have it tested automatically, and many users recommended Mnemonic in the replies. Weiwei, who was in San Francisco at the time, messaged Simon directly at 10 PM that night, recorded a demonstration video of tests running in his Notion workspace, and completed the preliminary integration that same evening.

Notion’s previous testing setup combined manual testing with a large Selenium automation suite. Selenium suites are known for being fragile (for example, they depend on XPath expressions or selectors that break when the page changes) and expensive to maintain. That is especially painful for a product as complex as Notion, with its flexible rich text editor and its “everything is a database” design. Mnemonic can handle these scenarios with simple natural language commands. Today, Notion runs nearly 500,000 Mnemonic tests daily, and engineers’ merge requests must pass Mnemonic tests before they can be approved.
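The selector fragility mentioned above is easy to reproduce. The sketch below uses Python's standard-library `xml.etree.ElementTree` (not Selenium) on two made-up versions of a page; the markup and the `locate` helper are assumptions for illustration. A layout-dependent path stops matching after a harmless wrapper element is added, while a lookup by a stable attribute still works.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Two hypothetical versions of the same page: v2 only wraps the form in a <section>.
PAGE_V1 = (
    "<html><body><div>"
    "<form><button id='save'>Save</button></form>"
    "</div></body></html>"
)
PAGE_V2 = (
    "<html><body><div><section>"
    "<form><button id='save'>Save</button></form>"
    "</section></div></body></html>"
)

# A path tied to the exact nesting of the page (brittle).
POSITIONAL_PATH = "./body/div/form/button"
# A path tied to a stable attribute, wherever the element lives (more robust).
BY_ID_PATH = ".//button[@id='save']"


def locate(page: str, path: str) -> Optional[ET.Element]:
    """Parse the page and return the first element matching the path, or None."""
    root = ET.fromstring(page)
    return root.find(path)


if __name__ == "__main__":
    for label, page in (("v1", PAGE_V1), ("v2", PAGE_V2)):
        positional = locate(page, POSITIONAL_PATH) is not None
        by_id = locate(page, BY_ID_PATH) is not None
        print(f"{label}: positional path found={positional}, id lookup found={by_id}")
```

Nothing about the button's behavior changed between the two versions, yet a test built on the positional path would start failing, which is the maintenance cost the founders describe.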

When measuring value, Mnemonic considers the most direct ROI to be the time saved for engineers (especially compared to traditional tools like Selenium and Cypress). However, their true “north star” metric is: how many potential regression bugs or severe incidents affecting end users have been prevented. The ultimate goal of testing is to ensure product quality and reliability.

As a rapidly growing startup (with a team of only 13), Mnemonic places great importance on shaping its early culture. Jeff characterizes their culture as “being honest”—direct and clear feedback while respecting colleagues. They want everyone to have a voice, and for all to participate in discussions about the product roadmap. In talent recruitment, despite being in the AI boom, they still believe that the essence of excellent engineers remains unchanged: adaptability, the ability to navigate ambiguity, curiosity, and enthusiasm. AI tools will only enhance already exceptional engineers.

The two founders' backgrounds are also unusual. Weiwei originally planned to become a pharmacist after high school but found the field dull after attending a pharmacy summer camp, so he switched to computer science in college. Jeff initially intended to study chemistry at Cambridge, but realized that repetitive laboratory work, even when aimed at world-class problems, lacked the challenge of collaboration and product creation, which led him to technology entrepreneurship. They met through a mutual friend at the end of 2023 and, after a week of living together on Jeff's couch and deep discussions, decided to found Mnemonic together.

Future Outlook and Founders’ Mindset

For its roadmap, Mnemonic is narrowing its focus. The founders have watched engineers' workflows evolve rapidly, so this year's emphasis is on developer experience and productivity: reducing the barrier to adoption to zero, or even below zero, so that engineers “fall into the pit of success.” Planned technical expansions include testing support for Android, iOS, and desktop applications.

Reflecting on their entrepreneurial journey, the founders named early hiring as one of their biggest challenges. With AI startups springing up everywhere and large foundation model companies proving highly attractive, convincing top talent to join takes twice the effort. They refined their interview process, introducing a “one-day work trial,” and invested heavily in culture building once people joined, through in-depth reviews, discussions, and team retreats.

For founders with an engineering background, learning sales is a crucial skill. Jeff’s realization is: “You must do it yourself.” He believes that selling skills cannot be learned simply by observing others; everyone has their own communication and sales style, and one must learn and evolve through extensive practice (gaining “experience points”).

What drives them to continuously tackle the challenge of “code validation”? Weiwei approaches it from a utility standpoint, believing that the speed of product development and feature iteration is largely constrained by the speed of code validation. Solving this fundamental issue could lead to a global productivity boost. Jeff's motivation is a mix of ambition and market insight: software testing (QA) is a huge market, but Mnemonic's vision is even grander—the goal is to become the cornerstone of all current and future software validation. “We not only want to win,” he candidly added, “but to eliminate all competitors. It’s bound to happen.”

At the crossroads of AI reshaping the software development process, Mnemonic's positioning is clear and steadfast: as the generation of code becomes cheaper and more automated, the value of the “validation layer” that ensures code accurately reflects human intent will become increasingly prominent. This is not merely a change of tools, but a profound transformation regarding the core values of software engineering.

