Exploring ReAct Prompting for Task-Oriented Dialogue: Insights and Shortcomings

Michelle Elizabeth, Morgan Veyret, Miguel Couceiro, Ondrej Dusek, Lina M. Rojas-Barahona

TL;DR

This work investigates applying ReAct prompting to task-oriented dialogue (TOD) by grounding LLMs to external tools for domain discovery, slot filling, and belief-state tracking in a MultiWOZ-based setup. Using GPT-3.5 and GPT-4, the authors compare ReAct-driven TOD against traditional baselines in both simulated and real-user evaluations, revealing that ReAct underperforms in simulation but yields higher user satisfaction in human interactions. Key contributions include a ReAct-augmented TOD framework with tool-based grounding, a cost-aware analysis of API usage, and qualitative insights into reasoning and grounding limitations. The findings highlight the potential of ReAct for more natural dialogue in TOD while underscoring the need for better control of reasoning traces and slot-state accuracy to close the gap with state-of-the-art baselines.

Abstract

Large language models (LLMs) have gained immense popularity due to their impressive capabilities in unstructured conversations. Empowering LLMs with advanced prompting strategies such as reasoning and acting (ReAct) (Yao et al., 2022) has shown promise in solving complex tasks traditionally requiring reinforcement learning. In this work, we apply the ReAct strategy to guide LLMs performing task-oriented dialogue (TOD). We evaluate ReAct-based LLMs (ReAct-LLMs) both in simulation and with real users. While ReAct-LLMs severely underperform state-of-the-art approaches on success rate in simulation, this difference becomes less pronounced in human evaluation. Moreover, compared to the baseline, humans report higher subjective satisfaction with ReAct-LLM despite its lower success rate, most likely thanks to its natural and confidently phrased responses.

Paper Structure

This paper contains 28 sections, 11 figures, and 4 tables.

Figures (11)

  • Figure 1: The proposed ReAct-LLM system agent uses few-shot examples in the prompt to guide the LLM in decomposing reasoning into a sequence of thoughts, actions, and observations.
  • Figure 2: The ReAct prompt used to instruct the system LLM agent on how to perform task-oriented dialogue.
  • Figure 3: The example provided in the ReAct prompt showing the LLM the steps to be followed for performing TOD.
  • Figure 4: An excerpt of a conversation where the LLM shows creative ways to handle repeated user requests.
  • Figure 5: A full conversation for a simple goal.
  • ...and 6 more figures