
ReSpAct: Harmonizing Reasoning, Speaking, and Acting Towards Building Large Language Model-Based Conversational AI Agents

Vardhan Dongre, Xiaocheng Yang, Emre Can Acikgoz, Suvodip Dey, Gokhan Tur, Dilek Hakkani-Tür

TL;DR

ReSpAct presents a unified framework that marries reasoning, dialogue, and action for task-oriented AI agents. It extends ReAct with active dialogue actions, enabling continuous user collaboration without fixed dialogue schemas. By formalizing an expanded action space that interleaves environment actions, language thoughts, and dialogue actions, ReSpAct uses user feedback to refine plans in real time. Across ALFWorld, MultiWOZ, and WebShop, ReSpAct consistently improves task success and robustness over reasoning-only baselines, while offering insights into information symmetry and schema-guided dialogue. The work demonstrates both the practical value and the challenges of dynamic human-in-the-loop collaboration for building more effective and trustworthy conversational AI agents.

Abstract

Large language model (LLM)-based agents are increasingly employed to interact with external environments (e.g., games, APIs, world models) to solve user-provided tasks. However, current frameworks often lack the ability to collaborate effectively with users in fully conversational settings. Conversations are essential for aligning on task details, achieving user-defined goals, and satisfying preferences. While existing agents address ambiguity through clarification questions, they underutilize the broader potential of an LLM's conversational capabilities. In this work, we introduce ReSpAct, an LLM-based agent designed to seamlessly integrate reasoning, decision-making, and dynamic dialogue for task-solving. Expanding on reasoning-first approaches like ReAct, ReSpAct employs active, free-flowing dialogues to interpret instructions, clarify goals, provide status updates, resolve subtask failures, and refine plans based on user inputs without any explicit dialogue schema. By alternating between task-solving actions and interactive conversations, ReSpAct demonstrates improved performance across diverse environments. We evaluate ReSpAct in user-interactive settings, including task-oriented dialogue systems (MultiWOZ) and decision-making tasks (ALFWorld, WebShop). ReSpAct outperforms ReAct with absolute success rate improvements of 6% and 4% in ALFWorld and WebShop, respectively, and achieves a 5.5% gain in Inform and a 3% gain in Success scores in MultiWOZ. These results highlight the value of integrating dynamic user-agent collaboration for more effective task resolution.
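To make the interleaving of environment actions, thoughts, and dialogue concrete, the following is a minimal sketch of a ReSpAct-style control loop. It is not the authors' implementation: the step kinds, the `policy`, `env_step`, and `user_reply` callables, and the toy household task are all hypothetical stand-ins. The policy would be an LLM in practice; here it is a scripted sequence. A thought only extends the context, a speech step is routed to the user and the reply is recorded, and an action is routed to the environment and the observation is recorded.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical step kinds for the expanded action space:
# language thoughts (think), dialogue actions (speak), environment actions (act).
THINK, SPEAK, ACT = "think", "speak", "act"
STEP_LABEL = {THINK: "Thought", SPEAK: "Speech", ACT: "Act"}
FEEDBACK_LABEL = {SPEAK: "User", ACT: "Obs"}


@dataclass
class Trajectory:
    """Interleaved history of (kind, content, feedback) triples."""
    steps: List[Tuple[str, str, str]] = field(default_factory=list)

    def context(self) -> str:
        # Render the history as the text the policy conditions on.
        lines = []
        for kind, content, feedback in self.steps:
            lines.append(f"{STEP_LABEL[kind]}: {content}")
            if feedback:  # thoughts carry no external feedback
                lines.append(f"{FEEDBACK_LABEL[kind]}: {feedback}")
        return "\n".join(lines)


def run_episode(
    policy: Callable[[str], Tuple[str, str]],
    env_step: Callable[[str], Tuple[str, bool]],
    user_reply: Callable[[str], str],
    max_steps: int = 20,
) -> Tuple[bool, Trajectory]:
    """Hypothetical loop: dispatch each step by kind until success or budget."""
    traj = Trajectory()
    for _ in range(max_steps):
        kind, content = policy(traj.context())
        if kind == THINK:
            traj.steps.append((kind, content, ""))
        elif kind == SPEAK:
            # Ask the user (clarify, report status, seek preferences).
            traj.steps.append((kind, content, user_reply(content)))
        elif kind == ACT:
            obs, done = env_step(content)
            traj.steps.append((kind, content, obs))
            if done:
                return True, traj
        else:
            raise ValueError(f"unknown step kind: {kind!r}")
    return False, traj


# Usage with a scripted policy standing in for the LLM (toy example).
script = iter([
    (THINK, "I need a mug but do not know where it is."),
    (SPEAK, "Where should I look for the mug?"),
    (ACT, "go to cabinet 1"),
    (ACT, "take mug 1 from cabinet 1"),
])


def scripted_policy(context: str) -> Tuple[str, str]:
    return next(script)


def toy_env(action: str) -> Tuple[str, bool]:
    if action.startswith("take mug"):
        return "You pick up the mug 1.", True
    return f"You {action}.", False


def toy_user(utterance: str) -> str:
    return "Try cabinet 1."


success, traj = run_episode(scripted_policy, toy_env, toy_user)
print(success)
print(traj.context())
```

The design choice this sketch illustrates is that dialogue is just another step kind in the same loop, so the policy decides at each turn whether to reason, talk to the user, or act, rather than following a fixed dialogue schema.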

Paper Structure

This paper contains 38 sections, 10 figures, 25 tables.

Figures (10)

  • Figure 1: ReSpAct is a framework for task-oriented conversational agents that allows agents to ask questions, request feedback, and adapt their strategies based on user input.
  • Figure 2: Comparison of (a) ReAct and (b) ReSpAct solving a game in ALFWorld (Shridhar et al., 2020). We show only the task-solving trajectories generated by the model (Act, Thought, and Speech) and by the environment (Obs).
  • Figure 3: Examples of the agent's communication approaches in ALFWorld: (a) seeking user guidance to refine its search strategy, (b) sharing status updates on task progress, and (c) soliciting user preferences to involve them in decision-making, thereby enhancing interaction and task alignment.
  • Figure 4: Examples of the agent's communication approaches in MultiWOZ: (a) seeking user guidance to refine its search strategy, (b) sharing status updates on task progress, and (c) soliciting user preferences to involve them in decision-making instead of making assumptions, thereby enhancing interaction and task alignment. Here, each Response densely combines Think and Speak actions.
  • Figure 5: Comparison of action-type distributions for ReAct (left) and ReSpAct (right) in ALFWorld. The figure illustrates how the two agents approach complex, embodied tasks in a simulated household environment, highlighting differences in their decision-making and interaction patterns.
  • ...and 5 more figures