Our database of benchmark results, featuring the performance of leading AI models on challenging tasks. It includes results from benchmarks evaluated internally by Epoch AI as well as data collected from external sources. The dashboard tracks AI progress over time and correlates benchmark scores with key factors such as training compute and model accessibility. | Epoch AI
At I/O 2025, we shared updates to our Gemini 2.5 model series and Deep Think, an experimental enhanced reasoning mode for 2.5 Pro. | Google
MathArena: Evaluating LLMs on Uncontaminated Math Competitions | matharena.ai
FrontierMath is a benchmark of hundreds of unpublished and extremely challenging math problems, designed to help us understand the limits of artificial intelligence. | Epoch AI