The Holistic Evaluation of Language Models (HELM) serves as a living benchmark for transparency in language models, providing broad coverage while recognizing incompleteness, multi-metric measurement, and standardization. All data and analyses are freely accessible on the website for exploration and study.| crfm.stanford.edu
An expert-led, domain-specific approach to measuring AI safety| Proof
Today, we’re introducing the availability of Llama 2, the next generation of our open source large language model.| Meta