Compare the performance of open-source Large Language Models across multiple benchmarks such as IFEval, BBH, MATH, GPQA, MuSR, and MMLU-Pro. Filter results in real time and see community votes for comp... | huggingface.co