Josh & Ejaaz: Why we switched from Claude Code to OpenAI Codex


Written by: Techub News Compilation

In the fierce competition among AI coding assistants, the choice of tool often sets the limits of development efficiency and creativity. Josh and Ejaaz, hosts of the Limitless Podcast, recently discussed the current state, core differences, and likely direction of two leading products, drawing on their own move from Anthropic's Claude Code to OpenAI's Codex (integrated into ChatGPT 5.5). The discussion included a feature-by-feature comparison and hands-on demos, and it also touched on the ideas behind how AI tools are evolving, making it a useful reference for anyone who relies on AI for development or creative work.

The Dramatic Flip in Market Landscape: The "Awakening" of Codex and Data Domination

Just a few months ago, Claude Code was the default choice for almost all software engineers and enterprises, with install numbers far ahead. However, Josh pointed out that around Christmas the "vibe" around AI coding changed fundamentally: it went from a "fun tool" to a serious instrument that developers use to ship real code. Since then, OpenAI seems to have "woken up."

Ejaaz illustrated the shift with a striking set of figures: in the past week, Codex installs surpassed 46 million, while Claude Code's were below 500,000. This is a sharp reversal of the historical picture, in which Claude Code's installs far exceeded Codex's. Ejaaz believes the core reason for the flip is simple: OpenAI released a better model, and it has shipped more features in the past few weeks than most companies do in an entire year.

To make the comparison concrete, they created a "scoreboard": OpenAI Codex leads with 11 points, while Anthropic's Claude scores only 2. Codex's advantage comes from breakthroughs in several key areas.

Five Core Advantages of Codex: From "Superhuman" Operation to Screen Monitoring

1. Computer Control and Speed: Claude was the first to let an AI take control of the desktop and move the cursor, but it was slow and often got stuck, requiring user guidance. Codex operates faster than an average person, and even faster than Josh himself. Ejaaz described its cursor speed as like watching "a superhuman using a computer," and it can run almost 24/7 without interruption.

2. Long-term Autonomy: Codex can work smarter and for longer stretches. Previously, getting an AI to finish a long task relied on a technique known as the "Ralph Loop" (named after a persistent character from The Simpsons), in which the AI keeps iterating until it reaches its goal. Codex builds this long-horizon capability in natively, and it has reportedly been observed "thinking" for 36 hours to achieve a goal. This is crucial for solving complex tasks.

3. Browser Use and Intent Understanding: Codex has the ability to take over the browser and understand its content more intelligently. Previously, it lacked this capability, but now it can perform more purposeful operations.

4. Integrated Image Generation: OpenAI recently launched its ChatGPT Images 2.0 image generation model, whose quality the hosts called "absolutely stunning" and which they said outperforms previous leaders, including Google Nano Banana 2.0 Pro, in all benchmark tests. Anthropic currently has no image generation model at all. For anyone doing visual work, having this function available directly inside the software is very powerful.

5. "Chronicle" (Secret Screen Monitoring and Efficiency Analysis): This is an alpha feature that Josh believes most people are still unaware of. Chronicle observes your scrolling, clicks, and typing to build context and memory about you without any active input. That makes an extremely powerful prompt possible: "According to Chronicle (this new memory function), what am I doing on my computer inefficiently? Provide some suggestions and directly tell me what I need to hear." It then evaluates your computer usage habits (such as time spent scrolling Twitter) and gives candid feedback based on observed behavior to help optimize your workflow. The feature is currently available only to paying subscribers (on the $100-200 per month plans), and Josh sees it as an early sign of important functionality to come.
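The "Ralph Loop" mentioned in point 2 can be illustrated with a minimal sketch. This is not how Codex or any specific tool implements it; the names `ralph_loop`, `step`, `is_done`, and `fake_agent` are hypothetical, and the "agent" is a stub standing in for a real model call. The sketch only shows the core idea: re-run the same prompt against the current state until a goal check passes or an iteration budget runs out.

```python
from typing import Callable, Tuple


def ralph_loop(step: Callable[[str], str],
               is_done: Callable[[str], bool],
               prompt: str,
               max_iterations: int = 100) -> Tuple[str, int]:
    """Re-run the same prompt until the goal check passes or the budget runs out.

    Returns the final state and the number of iterations used.
    """
    state = ""
    for iteration in range(1, max_iterations + 1):
        # Each pass sees the original prompt plus whatever the last pass produced.
        state = step(prompt + "\n\nCurrent state:\n" + state)
        if is_done(state):
            return state, iteration
    return state, max_iterations


def fake_agent(context: str) -> str:
    """Stub for a model call (assumption): completes one more item per call."""
    completed = context.count("[x]")
    return "\n".join("[x] step" for _ in range(completed + 1))


final_state, used = ralph_loop(
    step=fake_agent,
    is_done=lambda s: s.count("[x]") >= 3,  # hypothetical goal: three items done
    prompt="Finish all steps",
)
print(used, final_state.count("[x]"))
```

The iteration cap matters in practice: without it, a goal check that never passes would keep the loop running indefinitely.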

Furthermore, Codex has recently launched an "Automated Review" feature, which distinguishes operations that may pose a threat to the system from those that do not need approval and automatically approves the latter. This greatly reduces the number of prompts the user has to deal with, so users can step away from their computers while tasks keep running.
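The general idea behind this kind of review step can be sketched as a simple triage function. This is an illustration only and does not reflect Codex's actual rules: the `RISKY_PROGRAMS` set, `needs_approval`, and `review` are all hypothetical, and a real system would need far more careful analysis than checking the first word of a command.

```python
import shlex
from typing import Iterable, List, Tuple

# Hypothetical rule set (assumption): programs this sketch treats as risky.
RISKY_PROGRAMS = {"rm", "sudo", "curl", "dd", "mkfs", "chmod"}


def needs_approval(command: str) -> bool:
    """Return True if the command should be shown to the user before running."""
    tokens = shlex.split(command)
    if not tokens:
        return False
    return tokens[0] in RISKY_PROGRAMS


def review(commands: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split commands into auto-approved ones and ones held for the user."""
    approved: List[str] = []
    pending: List[str] = []
    for cmd in commands:
        (pending if needs_approval(cmd) else approved).append(cmd)
    return approved, pending


approved, pending = review(["ls -la", "rm -rf build", "git status"])
print("auto-approved:", approved)
print("needs approval:", pending)
```

Only the commands in the second list would interrupt the user; everything else proceeds unattended.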

Existing Advantages of Claude: Personality, UI, and Mobile Access

Josh and Ejaaz also pointed out areas where Claude still has the edge. The first is its "OpenClaw"-like capability (interestingly, OpenAI acquired OpenClaw). Dispatch, a feature of Claude's mobile application, lets users interact with Claude Code remotely, while Codex does not yet offer this (the team has promised to add it).

Second, Claude excels in personality and user interface. When the LLM is used on its own rather than as part of a tool suite, Claude offers the better experience, with a warmer UI. Both products have also released "pet" features (such as an Angry Dario displayed on the screen), although Codex's pets can live on as persistent characters throughout a computer session, chatting with you in the background and showing progress, which makes them more entertaining and reflects a focus on user experience.

Practical Demo Comparisons: Building Games from Scratch and Sketch Generation Applications

To test these claims, they prepared two specific demonstrations.

Demo 1: One-Time Prompt to Build a Mario-Style Game

Ejaaz wrote a prompt asking the AI to create a futuristic, Mario-style side-scrolling game with neon elements, including game design, enemies, traps, and a scoreboard. They gave this prompt to both ChatGPT and Claude and ran each coding model at its highest settings.

  • Results from Claude Opus 4.7: The game is named "Neon Plumber Moon Base Run". It has good visuals and sound design and follows sound game-design principles; players can readily identify dangers (such as spikes). However, it has logic flaws. For instance, the promised double-jump feature was not implemented correctly, so certain coins cannot be collected.
  • Results from OpenAI Codex (GPT-5.5): The game is also named "Neon Plumber Moon Base Run". The start screen is more basic but features background animations. The game logic is better: it is fully playable, with a clear heart (health) display, a scoring system, and working power-ups. Although it also has some edge-case bugs and lacks music, the overall experience is smoother and more complete.

In terms of the building experience, Ejaaz prefers Codex. After receiving the single prompt, Codex did not ask for any permissions and made decisions autonomously, whereas Claude Code occasionally asked the user for help. For non-production projects such as games, this "hands-off" approach may be more welcome.

Demo 2: Generating Applications from Hand-Drawn Sketches

They took a "hand-drawn" sketch (actually generated by GPT Image Gen 2.0) of a "Universal Limitless Dashboard App" and gave it to both models.

  • Claude's result: It generated a dashboard, but the style is basic and predictable. The page is dense with text and graphical elements, and the model inferred functionality such as travel budgets even though the prompt did not explicitly ask for it. However, it built a travel planner rather than a dashboard centered on the Limitless Podcast, which may reflect how it interpreted the prompt.
  • OpenAI Codex's result: The interface is cleaner and tidier and does not chase a futuristic or neon style. It presents the basics of a five-day travel plan across multiple tabs, with better visuals. Although it is not exactly what the sketch specified, the design is easier to understand, less dense, and appears to be connected to data (with a "Re-optimize" switch at the top). Josh believes Codex "totally crushes" it here, accurately reproducing the design from the original sketch.

In both demonstrations, Codex came out ahead in logic implementation and functional completeness.

Beyond the Model Itself: The "AI Model Suite" and "Vanilla Maxing" Philosophy

Ejaaz pointed out that a key factor in the rapid progress of both companies' models is the "AI Model Suite". This refers to the layers added on top of the base model: preset prompts, the environment the model runs in, and policies that make the model behave and sound a certain way. It also explains why Claude's personality is superior to ChatGPT's.
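The three layers Ejaaz describes can be pictured with a small sketch. It is purely illustrative and assumes nothing about how OpenAI, Anthropic, or Cursor structure their products: the `Suite` class, its fields, and the stub model are hypothetical names chosen to mirror the description (preset prompt, environment, behavior policies).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Suite:
    """Hypothetical wrapper that layers a 'suite' around a base model."""
    base_model: Callable[[str], str]          # the raw model (stubbed below)
    system_prompt: str                        # layer 1: preset prompt
    tools: Dict[str, Callable[[], str]] = field(default_factory=dict)          # layer 2: environment
    output_policies: List[Callable[[str], str]] = field(default_factory=list)  # layer 3: behavior

    def run(self, user_message: str) -> str:
        # Gather environment context from each tool before calling the model.
        tool_context = "\n".join(f"{name}: {tool()}" for name, tool in self.tools.items())
        prompt = f"{self.system_prompt}\n{tool_context}\nUser: {user_message}"
        reply = self.base_model(prompt)
        # Apply each policy in order to shape how the final reply reads.
        for policy in self.output_policies:
            reply = policy(reply)
        return reply


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption): reports how much context it saw."""
    return f"  got {len(prompt.splitlines())} lines  "


suite = Suite(
    base_model=stub_model,
    system_prompt="Be warm and concise.",
    tools={"cwd": lambda: "/tmp"},
    output_policies=[str.strip],
)
print(suite.run("hi"))
```

The same `stub_model` wrapped in a different `Suite` would see different context and produce differently shaped replies, which is the sense in which the suite, and not only the model, determines the product's behavior.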

Recently, Cursor opened up its suite, the Cursor SDK, via an API, which is significant. Critics had dismissed Cursor as just an AI wrapper, but it turns out that this "wrapper" or "suite" can make models smarter. According to the hosts, applying Cursor's suite to GPT-5.5 or Claude Opus 4.7 yields models that are smarter and more efficient than the base models alone. This means that while AI labs invest heavily in training models, startups can build outstanding products by developing better "suites." The suite and the model are now inseparable, and together they form a valuable moat.

Josh further elaborated on the role of the "suite" in building "super applications." Every company is trying to create a comprehensive, operating-system-level application with AI as its foundation, and OpenClaw excelled at this early on. This week, Sam Altman announced that users can now connect their ChatGPT accounts to generate tokens on OpenClaw, which may be the first step in a multi-step plan to integrate OpenClaw deeply into Codex. OpenAI owns OpenClaw and, while promising to keep it open source, is able to integrate it directly into its own products. Codex developers have also confirmed that a native editor, an iOS application, a full browser, and OpenClaw functionality are on the way.

However, Ejaaz noted that the excitement around OpenClaw has faded because, although these tools are at the cutting edge, they are hard to scale to practical use. Users are hesitant to let them loose on desktops that hold personal files, and there have been horror stories of agents accessing credit card data or deleting old wedding photos. By contrast, tools backed by an established brand (such as ChatGPT Codex, Claude Cowork, or NemoClaw, NVIDIA's secure enterprise version) feel safer to use.

This leads to the "Vanilla Maxing" philosophy that Josh advocates: stick 100% with the stock tools. Many people fall into the trap of juggling assorted repositories, skills, and plugins, but in reality the AI labs iterate fast enough that they will fold those features directly into their native applications. The best strategy is therefore "Vanilla Maxing": use the officially provided tools and do not rush to try cutting-edge but potentially unsafe external ones.

Future Prospects and Personal Usage Stack

Ejaaz concluded that there is currently no clear winner, but he leans towards Codex with GPT-5.5. However, the narrative has shifted so quickly that Claude may still catch up. One model that was not discussed or demonstrated is Claude Mythos, which was pseudo-released a few weeks ago and technically surpassed GPT-5.5 in all benchmark tests, but Anthropic did not open access to it, citing concerns that it was "too dangerous" and posed "cybersecurity risks" (a concern also mentioned by Peter Heskett of the U.S. Department of War). OpenAI, by contrast, has built a Mythos-level model and made it available to everyone. Anthropic's decision may also stem from its limited computational resources.

As for personal usage stacks, Josh said he has fully switched to Codex for all challenging tasks. However, he thinks that as an LLM or chatbot, GPT-5.5 lags slightly behind Opus 4.7, which has a warmer and more accurate personality and understands his intentions better. When building complex projects, he therefore uses Opus 4.7 as the "coordinator" and Codex as the "executor." He also found that Opus 4.7 performs worse than 4.6 in certain areas, such as writing and digesting text, so he still uses Opus 4.6 for those tasks.
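Josh's "coordinator" and "executor" split can be pictured as a two-stage pipeline. The sketch below is a hypothetical illustration of that division of labor, not a description of his actual setup: `run_pipeline`, `stub_coordinator`, and `stub_executor` are made-up names, and both stubs stand in for real model calls.

```python
from typing import Callable, List


def run_pipeline(goal: str,
                 coordinator: Callable[[str], List[str]],
                 executor: Callable[[str], str]) -> List[str]:
    """Have one model plan the work and another carry out each subtask."""
    subtasks = coordinator(goal)
    return [executor(task) for task in subtasks]


def stub_coordinator(goal: str) -> List[str]:
    """Stand-in for the planning model (assumption): breaks a goal into steps."""
    return [f"{goal}: design", f"{goal}: implement", f"{goal}: test"]


def stub_executor(task: str) -> str:
    """Stand-in for the coding model (assumption): 'completes' one subtask."""
    return f"done -> {task}"


for line in run_pipeline("dashboard app", stub_coordinator, stub_executor):
    print(line)
```

Keeping the two roles behind separate callables means either model can be swapped out without touching the other, which matches the way Josh mixes models by strength.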

Ejaaz's stack is more diversified. For research, he has started turning to GPT-5.5 because it supports longer and deeper discussions. As examples, he cited test prompts about the AI power stack and investment targets, on which GPT-5.5 completely outperformed Opus 4.7. However, he still uses Opus 4.7 for personal reasons. Overall, he believes OpenAI is on "a generational sprint" and may soon fix its remaining issues.

Finally, Josh encouraged users to try both tools for themselves and test them with practical prompts. Whatever kind of work you do, as long as you use a computer, AI may help you complete tasks more efficiently or pursue the hobbies and side projects you have always wanted to take on. The biggest winner in this competition is the user, since all of this cutting-edge intelligence and capability is available for just $20 a month.

