Benchmarks 201: Why Leaderboards > Arenas >> LLM-as-Judge
https://www.latent.space/p/benchmarks-201
Clémentine Fourrier of Hugging Face on why you should stop using LLMs as judges, what comes after MMLU, how prompt formatting sways benchmark results, and why leaderboards are GPU poor