In recent years, as the capabilities of large language models have grown rapidly in depth and breadth, evaluation benchmarks have proliferated. As quantitative tools for assessing model performance, benchmarks are not only a core means of measuring model capabilities but also a key factor in guiding model development and promoting technological innovation. We systematically review the current status and development of large ... | arXiv.org
Current AI results from experimental variation of mechanisms, unguided by theoretical principles. That has produced systems that can do amazing things. On the other hand, they are extremely error-prone and therefore unsafe. Backpropaganda, a collection of misleading ways of talking about “neural networks,” justifies continuing in this misguided direction. | Better without AI