I ran Claude Sonnet 4, Kimi K2, and Gemini 2.5 Pro on the same Next.js app and measured cost, speed, and whether the code actually shipped without follow-ups.| forgecode.dev