When every frontier AI model can pass your tests, how do you figure out which model is best? You write a harder test. That was the idea behind Humanity’s Last Exam, an effort by Scale AI and …| Economist Writing Every Day
It turns out that ChatGPT mirrors the strengths and shortcomings of human interns more closely than I expected.