I ran Claude Sonnet 4, Kimi K2, and Gemini 2.5 Pro on the same Next.js app and measured cost, speed, and whether the code actually shipped without follow-ups.| forgecode.dev
I tested Kimi K2 and Qwen-3 Coder on 13 Rust development tasks across a 38k-line codebase and 2 Frontend refactor tasks. The results reveal differences in code quality, instruction following, and development capabilities.| forgecode.dev