Clémentine Fourier of HuggingFace on why you should stop using LLMs as Judges, what comes after MMLU, how prompts formatting sways benchmark results, and why leaderboards are GPU poor| www.latent.space