From:
www.latent.space
Benchmarks 201: Why Leaderboards > Arenas >> LLM-as-Judge
https://www.latent.space/p/benchmarks-201
Clémentine Fourrier of Hugging Face on why you should stop using LLMs as judges, what comes after MMLU, how prompt formatting sways benchmark results, and why leaderboards are GPU-poor