Measuring AI Ability to Complete Long Tasks - METR
Analysis code available on GitHub
| metr.org