Compare the performance of open-source Large Language Models using multiple benchmarks like IFEval, BBH, MATH, GPQA, MUSR, and MMLU-PRO. Filter results in real-time and vote on your favorite models.| huggingface.co