I previously tried (and failed) to setup LLM tracing for hinbox using Arize Phoenix and litellm. Since this is sort of a priority for being able to follow along with the Hamel / Shreya evals course with my practical application, I’ll take another stab using a tool with which I’m familiar: Braintrust. Let’s start simple and then if it works the way we want we can set things up for hinbox as well. Simple Braintrust tracing with litellm callbacks Callbacks are listed in the litellm docs as...| Alex Strick van Linschoten
It’s important to instrument your AI applications! I hope this can more or less be taken as given just as you’d expect a non-AI-infused app to capture logs. When you’re evaluating your LLM-powered system, you need to have capture the inputs and outputs both at an end-to-end level in terms of the way the user experiences things as well as with more fine-grained granularity for all the internal workings. My goal with this blog is to first demonstrate how Phoenix and litellm can work toget...| Alex Strick van Linschoten
I’ve been working on a project called hinbox - a flexible entity extraction system designed to help historians and researchers build structured knowledge databases from collections of primary source documents. At its core, hinbox processes historical documents, academic papers, books and news articles to automatically extract and organize information about people, organizations, locations, and events. The tool works by ingesting batches of documents and intelligently identifying entities ac...| Alex Strick van Linschoten
I came across this quote in a happy coincidence after attending the second session of the evals course: It’s obviously a bit abstract, but I thought it was a nice oblique reflection on the topic being discussed. Both the main session and the office hours were mostly focused on the first part of the analyse-measure-improve loop that was introduced earlier in the week. Focus on the ‘analyse’ part of the LLM application improvement loop It was a very practical session in which we even took...| Alex Strick van Linschoten
Key insights from the first session of the Hamel/Shreya AI Evals course, focusing on a 'three gulfs' mental model (specification, generalisation, and comprehension) for LLM application development…| mlops.systems