A paper by Apollo Research found that in certain contrived scenarios, AI systems can engage in deceptive behavior.| TIME
Today we are publishing a significant update to our Responsible Scaling Policy (RSP), the risk governance framework we use to mitigate potential catastrophic risks from frontier AI systems.| www.anthropic.com
FrontierMath is a benchmark of hundreds of unpublished and extremely challenging math problems that helps us understand the limits of artificial intelligence.| Epoch AI
A novel AI system mastered the ancient game of Go, defeated a Go world champion, and inspired a new era of AI.| Google DeepMind
We’re releasing RE-Bench, a new benchmark for measuring the performance of humans and frontier model agents on ML research engineering tasks. We also share data from 71 human expert attempts and results for Anthropic’s Claude 3.5 Sonnet and OpenAI’s o1-preview, including full transcripts of all runs.| metr.org
OpenAI o3 scores 75.7% on ARC-AGI public leaderboard.| ARC Prize
Our approach to analyzing and mitigating future risks posed by advanced AI models| Google DeepMind