Large language models (LLMs) optimized through human feedback have rapidly emerged as a leading paradigm for developing intelligent conversational agents. However, despite their strong performance across many benchmarks, LLM-based agents can still lack multi-turn conversational skills such as disambiguation: when faced with ambiguity, they often over-hedge or implicitly guess users' true intents rather than asking clarifying questions. Yet high-quality conversation s...