Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents

Cheng Qian, Bingxiang He, Zhong Zhuang, Jia Deng, Yujia Qin, Xin Cong, Zhong Zhang, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun

TL;DR

The paper tackles the problem that current language-model-driven agents struggle to elicit and align with users' implicit intentions due to vague instructions. It introduces IN3, a benchmark to assess task vagueness, missing details, and user-intention summaries, and proposes an upstream interaction expert, exemplified by Mistral-Interact, trained on IN3 data. Through integration with the XAgent framework, the study demonstrates improved instruction understanding and execution efficiency, including better vagueness judgments, higher recovery of critical details, and reduced unnecessary tool usage, approaching GPT-4 performance with a smaller open-source model. The work highlights the value of user participation in agent design and provides a scalable path toward robust implicit intention understanding using open-source model experts, with broad implications for evaluation benchmarks and future human-agent interaction research.

Abstract

Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions. Although adept at devising strategies and performing tasks, these agents struggle with seeking clarification and grasping precise user intentions. To bridge this gap, we introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries. Next, we propose incorporating model experts as an upstream component in agent designs to enhance user-agent interaction. Employing IN3, we empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires about user intentions, and refines them into actionable goals before starting downstream agent task execution. Integrating it into the XAgent framework, we comprehensively evaluate the enhanced agent system regarding user instruction understanding and execution, revealing that our approach notably excels at identifying vague user tasks, recovering and summarizing critical missing information, setting precise and necessary agent execution goals, and minimizing redundant tool usage, thus boosting overall efficiency. All data and code are released.

Paper Structure (74 sections, 6 figures, 5 tables)

Figures (6)

  • Figure 1: A comparison of agent execution with implicit intentions or explicit intentions after user-agent interaction.
  • Figure 2: An illustration of IN3's formation with an example data point.
  • Figure 3: The construction of conversation records with diverse strategies applied.
  • Figure 4: Case studies of model-user interactions under different scenarios to show Mistral-Interact's robustness.
  • Figure 5: Case study on the agent execution process before and after interaction with Mistral-Interact in agent design.
  • ...and 1 more figure