ReSpAct: Harmonizing Reasoning, Speaking, and Acting Towards Building Large Language Model-Based Conversational AI Agents

Vardhan Dongre; Xiaocheng Yang; Emre Can Acikgoz; Suvodip Dey; Gokhan Tur; Dilek Hakkani-Tür

TL;DR

ReSpAct presents a unified framework that combines reasoning, dialogue, and action for task-oriented AI agents, extending ReAct with active dialogue actions to enable continuous user collaboration without fixed dialogue schemas. By formalizing an expanded action space and interleaving environment actions, language thoughts, and dialogue actions, ReSpAct leverages user feedback to refine plans in real time. Across ALFWorld, MultiWOZ, and WebShop, ReSpAct consistently improves task success and robustness compared to reasoning-only baselines, while offering insights into information symmetry and schema-guided dialogue. The work demonstrates the practical value and challenges of dynamic human-in-the-loop collaboration for building more effective and trustworthy conversational AI agents.

Abstract

Large language model (LLM)-based agents are increasingly employed to interact with external environments (e.g., games, APIs, world models) to solve user-provided tasks. However, current frameworks often lack the ability to collaborate effectively with users in fully conversational settings. Conversations are essential for aligning on task details, achieving user-defined goals, and satisfying preferences. While existing agents address ambiguity through clarification questions, they underutilize the broader potential of an LLM's conversational capabilities. In this work, we introduce ReSpAct, an LLM-based agent designed to seamlessly integrate reasoning, decision-making, and dynamic dialogue for task-solving. Expanding on reasoning-first approaches like ReAct, ReSpAct employs active, free-flowing dialogues to interpret instructions, clarify goals, provide status updates, resolve subtask failures, and refine plans based on user inputs without any explicit dialogue schema. By alternating between task-solving actions and interactive conversations, ReSpAct demonstrates improved performance across diverse environments. We evaluate ReSpAct in user-interactive settings, including task-oriented dialogue systems (MultiWOZ) and decision-making tasks (ALFWorld, WebShop). ReSpAct outperforms ReAct with absolute success rate improvements of 6% and 4% in ALFWorld and WebShop, respectively, and achieves a 5.5% gain in Inform and a 3% gain in Success scores in MultiWOZ. These results highlight the value of integrating dynamic user-agent collaboration for more effective task resolution.
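The alternation between reasoning, speaking, and acting described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Step` type, the `run_episode` loop, the scripted policy, and the toy environment and user are all hypothetical stand-ins chosen for illustration, and a real agent would replace `scripted_policy` with an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Three kinds of steps; these names are illustrative, not the paper's API.
THINK, SPEAK, ACT = "think", "speak", "act"


@dataclass
class Step:
    kind: str     # one of THINK, SPEAK, ACT
    content: str  # thought text, utterance to the user, or environment command


Context = List[Tuple[str, str]]


def run_episode(
    policy: Callable[[Context], Step],
    env_step: Callable[[str], str],
    user_reply: Callable[[str], str],
    max_steps: int = 10,
) -> Context:
    """Interleave thoughts, dialogue turns, and environment actions.

    Every step, observation, and user reply is appended to a shared
    context, which the policy reads to choose its next step.
    """
    context: Context = []
    for _ in range(max_steps):
        step = policy(context)
        context.append((step.kind, step.content))
        if step.kind == SPEAK:
            # Dialogue action: the user's answer becomes part of the context.
            context.append(("user", user_reply(step.content)))
        elif step.kind == ACT:
            # Environment action: the observation becomes part of the context.
            obs = env_step(step.content)
            context.append(("obs", obs))
            if obs == "done":
                break
        # THINK steps only update the context.
    return context


def scripted_policy(context: Context) -> Step:
    """Hypothetical stand-in for an LLM: think, ask, then act on the answer."""
    kinds = [kind for kind, _ in context]
    if not context:
        return Step(THINK, "The instruction does not say which mug; I should ask.")
    if "user" not in kinds:
        return Step(SPEAK, "Which mug do you want, the red one or the blue one?")
    answer = next(content for kind, content in context if kind == "user")
    return Step(ACT, f"take {answer} mug")


def toy_env(command: str) -> str:
    """Toy environment: only one command completes the task."""
    return "done" if command == "take red mug" else "nothing happens"


def toy_user(question: str) -> str:
    """Toy user who always wants the red mug."""
    return "red"


trace = run_episode(scripted_policy, toy_env, toy_user)
for kind, content in trace:
    print(f"{kind}: {content}")
```

In this toy episode the agent resolves an ambiguous instruction by asking the user before acting, which is the kind of behavior a reasoning-and-acting loop without dialogue actions cannot express.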
