Recently, I wrote about why you should write your own benchmarks for language models to see how well they work for your app. I also shared a ready-to-use Jupyter Notebook for evaluating language models on Ollama. I've just published a new version of the notebook that supports any language model host exposing OpenAI-compatible APIs. Like the previous version, the new notebook shows you how well your selected language models perform for each scenario, and overall. What's changed,...
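
To make the OpenAI-compatible support concrete, here is a minimal sketch (not the notebook's exact code) of how such a host can be queried with the official `openai` Python client. The base URL, API key, and model name are illustrative; here they point at Ollama's local OpenAI-compatible endpoint:

```python
from openai import OpenAI

# Any OpenAI-compatible host works here; this example assumes a local
# Ollama instance, which serves an OpenAI-compatible API under /v1.
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # local hosts typically accept a placeholder key
)

# Send a single chat completion request to the selected model.
response = client.chat.completions.create(
    model="llama3.1",  # whichever model your host serves
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

Because only `base_url` (and possibly `api_key`) changes between hosts, the same evaluation code can run unmodified against local or hosted models.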