Exploring ReAct Prompting for Task-Oriented Dialogue: Insights and Shortcomings
Michelle Elizabeth, Morgan Veyret, Miguel Couceiro, Ondrej Dusek, Lina M. Rojas-Barahona
TL;DR
This work investigates applying ReAct prompting to task-oriented dialogue (TOD) by grounding LLMs in external tools for domain discovery, slot filling, and belief-state tracking in a MultiWOZ-based setup. Using GPT-3.5 and GPT-4, the authors compare ReAct-driven TOD against traditional baselines in both simulated and real-user evaluations, finding that ReAct underperforms on success rate in simulation but yields higher user satisfaction with real users. Key contributions include a ReAct-augmented TOD framework with tool-based grounding, a cost-aware analysis of API usage, and qualitative insights into reasoning and grounding limitations. The findings highlight the potential of ReAct for more natural dialogue in TOD while underscoring the need for better control of reasoning traces and slot-state accuracy to close the gap with state-of-the-art baselines.
Abstract
Large language models (LLMs) have gained immense popularity due to their impressive capabilities in unstructured conversations. Empowering LLMs with advanced prompting strategies such as reasoning and acting (ReAct) (Yao et al., 2022) has shown promise in solving complex tasks that traditionally require reinforcement learning. In this work, we apply the ReAct strategy to guide LLMs in performing task-oriented dialogue (TOD). We evaluate ReAct-based LLMs (ReAct-LLMs) both in simulation and with real users. While ReAct-LLMs severely underperform state-of-the-art approaches on success rate in simulation, this difference becomes less pronounced in human evaluation. Moreover, compared to the baseline, humans report higher subjective satisfaction with ReAct-LLMs despite their lower success rate, most likely thanks to their natural and confidently phrased responses.
