
日月小楚|Dec 25, 2025 01:53
The most frustrating thing about vibe coding:
Every few days, a new chart-topping model comes out, claiming to be the best in the world, with all kinds of numbers to back it up.
You can’t possibly try every single one, so you turn to social media creators for reviews. But then you realize most of them are just hyping things up.
Finally, after much effort, you settle on a model and product that seem suitable, ready to get some real work done. But within days, you find out the model has been downgraded.
Seriously speechless. I recently discovered that both Gemini 3 Pro and Claude 4.5 have been downgraded.
Yesterday, I used Gemini 3 in Antigravity for a task and hit problem after problem. So I tested a few models I commonly use, and here are the results:
1. Confirmed: Gemini 3 has been downgraded.
2. Tested Claude 4.5 in Claude Code: the downgrade is severe.
3. Tried GLM 4.7, which was recently claimed to be the best in the world, in Claude Code. It found more issues than Claude 4.5, but still missed a third of them.
4. GPT 5.2 in Cursor performed the best, missing only one minor issue and catching all the others.
Based on this, GPT 5.2 is the strongest. The only headache is that GPT 5.2’s code is far too terse: much of it is hard to understand, and it’s making my head hurt.