Ai2 Blog
Contextualized Evaluations: Judging Language Model Responses to Underspecified Queries
https://allenai.org/blog/contextualized-evaluations
How do we evaluate LLMs on underspecified queries? We show that adding clarifying context flips model rankings and uncovers model biases.