When evaluating AI applications, we often look at things like tool calls, parameters, or individual model responses. While this span-level evaluation is useful, it doesn’t always capture the bigger picture...| Arize AI