Measuring AI Ability to Complete Long Tasks - METR
Analysis code available on GitHub
metr.org