
DialogXpert: Driving Intelligent and Emotion-Aware Conversations through Online Value-Based Reinforcement Learning with LLM Priors

Tazeek Bin Abdur Rakib, Ambuj Mehrish, Lay-Ki Soon, Wern Han Lim, Soujanya Poria

TL;DR

DialogXpert tackles proactive, goal-directed dialogue by combining a frozen LLM action proposer with a lightweight Q-network and an emotion-tracking module. A two-step LLM prior (free-form generation followed by projection) produces a compact candidate-action set; the Q-network, trained on rewards from a critic LLM, evaluates these candidates and selects the next move. Emotion awareness is built into the state representation to balance task progress with rapport, yielding shorter-than-typical conversations in negotiation, tutoring, and emotional support across multiple datasets. Empirical results show conversations of fewer than 3 turns with high success rates (often above 0.94, rising above 0.97 with larger priors) and competitive negotiation quality, along with substantial efficiency gains over MCTS-based planners and fine-tuned policy models. The framework demonstrates practical real-time planning at scale and points toward broader, emotionally intelligent dialogue systems with minimal training overhead.
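The two-step prior (free-form generation, then projection to a compact candidate set) can be illustrated with a small sketch. It assumes a hypothetical fixed action vocabulary (`ACTIONS`), uses word-overlap (Jaccard) similarity as a stand-in projection from free-form LLM proposals onto those actions, and treats normalized projection counts as the prior. None of these choices is taken from the paper; they only show the shape of the computation.

```python
from collections import Counter

# Hypothetical action vocabulary; the real, dataset-specific label sets are not listed here.
ACTIONS = [
    "propose a price",
    "ask a question",
    "express empathy",
    "give a hint",
]


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity, used here as an illustrative projection score."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def project(free_form: str, actions: list[str]) -> str:
    """Map one free-form LLM proposal onto the most similar predefined action."""
    return max(actions, key=lambda act: jaccard(free_form, act))


def top_k_prior(proposals: list[str], actions: list[str], k: int) -> list[tuple[str, float]]:
    """Turn sampled free-form proposals into an empirical prior and keep the top-k actions."""
    if not proposals:
        return []
    counts = Counter(project(p, actions) for p in proposals)
    total = sum(counts.values())
    prior = [(act, counts[act] / total) for act in actions]
    # Sort by probability, highest first; Python's sort is stable, so ties keep vocabulary order.
    prior.sort(key=lambda pair: pair[1], reverse=True)
    return prior[:k]
```

For example, with the proposals "propose a lower price", "ask a clarifying question", "propose a fair price", and "express some empathy", `top_k_prior(proposals, ACTIONS, k=2)` returns `[("propose a price", 0.5), ("ask a question", 0.25)]`. Only these k candidates would then be scored by the value model.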

Abstract

Large-language-model (LLM) agents excel at reactive dialogue but struggle with proactive, goal-driven interactions due to myopic decoding and costly planning. We introduce DialogXpert, which leverages a frozen LLM to propose a small, high-quality set of candidate actions per turn and employs a compact Q-network, trained via temporal-difference learning over fixed BERT embeddings, to select optimal moves within this reduced space. By tracking the user's emotions, DialogXpert tailors each decision to advance the task while nurturing a genuine, empathetic connection. Across negotiation, emotional support, and tutoring benchmarks, DialogXpert drives conversations to under 3 turns with success rates exceeding 94% and, with a larger LLM prior, pushes success above 97% while markedly improving negotiation outcomes. This framework delivers real-time, strategic, and emotionally intelligent dialogue planning at scale. Code is available at https://github.com/declare-lab/dialogxpert/
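A minimal sketch of temporal-difference learning over fixed embeddings follows. It assumes a linear Q-function in place of the paper's compact Q-network, short toy vectors in place of frozen BERT embeddings, and a one-hot emotion label appended to the state (the exact state encoding is not given in this summary). The bootstrap max ranges only over the next turn's candidate actions, mirroring selection within the reduced space.

```python
def features(state_emb, emotion_id, num_emotions, action_emb):
    """Concatenate a frozen state embedding, a one-hot emotion, and an action embedding.

    Assumption: this concatenation is illustrative, not the paper's encoding.
    """
    one_hot = [1.0 if i == emotion_id else 0.0 for i in range(num_emotions)]
    return list(state_emb) + one_hot + list(action_emb)


class LinearQ:
    """Q(s, a) = w . phi(s, a); a linear stand-in for the compact Q-network."""

    def __init__(self, dim, lr=0.1, gamma=0.9):
        self.w = [0.0] * dim
        self.lr = lr        # step size alpha
        self.gamma = gamma  # discount factor

    def q(self, phi):
        return sum(wi * xi for wi, xi in zip(self.w, phi))

    def td_update(self, phi, reward, next_phis, done):
        """One-step Q-learning update; the max is over next-turn candidates only."""
        target = reward
        if not done and next_phis:
            target += self.gamma * max(self.q(p) for p in next_phis)
        td_error = target - self.q(phi)
        # For a linear Q-function, the gradient with respect to w is phi itself.
        self.w = [wi + self.lr * td_error * xi for wi, xi in zip(self.w, phi)]
        return td_error
```

Because the embeddings stay fixed, only the small weight vector is learned, which is one way to read the "minimal training overhead" claim; a real implementation would use a small neural network and batched replay instead of single-sample updates.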

Paper Structure

This paper contains 49 sections, 11 equations, 4 figures, 10 tables.

Figures (4)

  • Figure 1: DialogXpert pipeline: case information and dialogue history drive user/system LLMs and an emotion tracker; a frozen LLM generates a prior over candidate actions, the top-k are evaluated by a Q-network and executed by the system LLM; a critic LLM provides reward signals to train the Q-network.
  • Figure 2: Exploration vs. Exploitation: We use the Qwen 2.5 14B prior with top-k = 4 and sweep the ε-greedy parameter (ε) to measure how different exploration rates affect average turns, success rate, and SL Average.
  • Figure 3: Win/tie/loss percentages for DialogXpert vs. PPDPP on ESConv across Identification, Comforting, Suggestion and Overall metrics.
  • Figure 4: Win/tie/loss percentages for DialogXpert vs. PPDPP on the CIMA tutoring dataset across Hint, Identification and Overall metrics.
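The exploration sweep in Figure 2 concerns ε-greedy selection among the top-k LLM candidates. A minimal sketch follows; it assumes that exploration samples uniformly among the k candidates and that exploitation takes the highest Q-value. The figure caption does not specify these details, and the function name is hypothetical.

```python
import random


def epsilon_greedy(candidates, q_values, epsilon, rng):
    """Choose among the top-k candidates: explore with probability epsilon, else exploit."""
    if not candidates or len(candidates) != len(q_values):
        raise ValueError("need at least one candidate and one Q-value per candidate")
    if rng.random() < epsilon:
        return rng.choice(candidates)  # explore: uniform over the k candidates
    best = max(range(len(candidates)), key=lambda i: q_values[i])
    return candidates[best]  # exploit: highest estimated Q-value
```

With `epsilon=0.0` this always returns the argmax (for candidates `["a", "b", "c", "d"]` and Q-values `[0.1, 0.7, 0.3, 0.2]`, it returns `"b"`). Larger ε trades some immediate task progress for wider coverage of the candidate set during training.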