Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents
Cheng Qian, Bingxiang He, Zhong Zhuang, Jia Deng, Yujia Qin, Xin Cong, Zhong Zhang, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun
TL;DR
Current language-model-driven agents struggle to elicit and align with users' implicit intentions when instructions are vague. The paper introduces IN3, a benchmark that assesses task vagueness, missing details, and user-intention summaries, and proposes an upstream interaction expert, exemplified by Mistral-Interact, trained on IN3 data. Integrated with the XAgent framework, the expert improves instruction understanding and execution efficiency: vagueness judgments are more accurate, more critical details are recovered, and unnecessary tool usage drops, approaching GPT-4 performance with a smaller open-source model. The work highlights the value of user participation in agent design and offers a scalable path toward robust implicit intention understanding using open-source model experts, with implications for evaluation benchmarks and future human-agent interaction research.
Abstract
Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions. Although adept at devising strategies and performing tasks, these agents struggle to seek clarification and grasp precise user intentions. To bridge this gap, we introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries. Next, we propose incorporating model experts as an upstream component in agent designs to enhance user-agent interaction. Employing IN3, we empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires about user intentions, and refines them into actionable goals before downstream agent task execution begins. Integrating it into the XAgent framework, we comprehensively evaluate the enhanced agent system on user instruction understanding and execution. The evaluation reveals that our approach notably excels at identifying vague user tasks, recovering and summarizing critical missing information, setting precise and necessary agent execution goals, and minimizing redundant tool usage, thus boosting overall efficiency. All data and code are released.
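The upstream flow described in the abstract (judge vagueness, ask about missing details, summarize into an actionable goal) can be illustrated with a minimal sketch. This is not Mistral-Interact or the XAgent API: the learned model is replaced here by a hypothetical keyword rule, and the names `Judgment`, `judge_vagueness`, `clarify`, and the `required` keyword table are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Judgment:
    """Result of the vagueness check (hypothetical structure)."""
    vague: bool
    missing_details: List[str] = field(default_factory=list)


def judge_vagueness(task: str, required: Dict[str, List[str]]) -> Judgment:
    """Toy stand-in for a learned vagueness judgment.

    Assumption: a detail counts as missing when none of its keywords
    appears in the task text. A real expert would be a trained model.
    """
    text = task.lower()
    missing = [
        name
        for name, keywords in required.items()
        if not any(keyword in text for keyword in keywords)
    ]
    return Judgment(vague=bool(missing), missing_details=missing)


def clarify(
    task: str,
    required: Dict[str, List[str]],
    ask_user: Callable[[str], str],
) -> str:
    """Ask about each missing detail, then summarize into one goal string."""
    judgment = judge_vagueness(task, required)
    if not judgment.vague:
        # Clear tasks pass straight through to the downstream agent.
        return task
    answers = {
        detail: ask_user(f"Could you specify the {detail}?")
        for detail in judgment.missing_details
    }
    details = "; ".join(f"{name}: {answer}" for name, answer in answers.items())
    return f"{task} ({details})"


if __name__ == "__main__":
    required = {"destination": ["paris", "tokyo"], "budget": ["$", "budget"]}
    goal = clarify("Plan a trip to Paris", required, lambda question: "1000 USD")
    print(goal)
```

In this sketch the summarized goal, rather than the raw instruction, is what a downstream agent would receive; the interaction step runs once, before any task execution starts.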
