Last Updated on September 4, 2025 by Editorial Team
Author(s): Katherine Munro
Originally published on Towards AI.

Concrete advice for teams building LLM-powered evaluations

My last post was all about conceptual problems with using Large Language Models to judge other LLMs.

All images: Author provided.

The article discusses the practical challenges of using Large Language Models (LLMs) as judges in evaluations, highlighting issues such as non-determinism in both the LLMs being evaluated and th...