Most commonly, we hear about evaluating LLM applications at the span level: checking whether a tool call succeeded, whether an LLM hallucinated, or whether a response matched expectations…

The post Trace-Level LLM Evaluations with Arize AX appeared first on Arize AI.