In recent years, with the rapid development of the depth and breadth of large language models' capabilities, various corresponding evaluation benchmarks have been emerging in increasing numbers. As a quantitative assessment tool for model performance, benchmarks are not only a core means to measure model capabilities but also a key element in guiding the direction of model development and promoting technological innovation. We systematically review the current status and development of large ...| arXiv.org
Learn more about the only AI benchmark that measures AGI progress.| ARC Prize
Current AI results from experimental variation of mechanisms, unguided by theoretical principles. That has produced systems that can do amazing things. On the other hand, they are extremely error-prone and therefore unsafe. Backpropaganda, a collection of misleading ways of talking about “neural networks,” justifies continuing in this misguided direction.| Better without AI