Planning Like Human: A Dual-process Framework for Dialogue Planning

Tao He; Lizi Liao; Yixin Cao; Yuanxing Liu; Ming Liu; Zerui Chen; Bing Qin

TL;DR

This work tackles the challenge of proactive dialogue planning by introducing DPDP, a dual-process framework that combines a fast Policy LM Planner with a slow Monte Carlo Tree Search (MCTS) Planner. A novel two-stage training regimen—offline reinforcement learning-based pretraining followed by MCTS-guided self-play—enables the policy model to achieve both efficiency and strategic depth. A nonparametric gating mechanism dynamically switches between planners based on the policy's uncertainty, balancing speed and planning rigor. Empirical results across ESConv, CIMA, and CraigslistBargain demonstrate that DPDP surpasses baselines in both dialogue quality and efficiency, while analyses of MCTS engagement and training dynamics provide practical guidance for deployment. Overall, the paper advances proactive, goal-directed dialogue systems by fusing cognitive-inspired planning with principled learning and search, offering a scalable path toward more capable conversational agents.
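The switching behavior described above can be illustrated with a minimal sketch. It is not the authors' implementation: the confidence measure (the top action's softmax probability), the `threshold` value, and the names `softmax`, `choose_action`, and `slow_planner` are all assumptions made for illustration. The paper's actual gate may measure uncertainty differently.

```python
import math
from typing import Callable, Dict, Tuple


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert per-action logits into a probability distribution."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {action: math.exp(value - peak) for action, value in logits.items()}
    total = sum(exps.values())
    return {action: e / total for action, e in exps.items()}


def choose_action(
    logits: Dict[str, float],
    slow_planner: Callable[[Dict[str, float]], str],
    threshold: float = 0.6,  # hypothetical cutoff, not taken from the paper
) -> Tuple[str, str]:
    """Return (action, planner_used).

    Assumption: the policy counts as "certain" when its top action's
    probability reaches `threshold`; otherwise control passes to the
    slow planner (a stand-in for MCTS).
    """
    probs = softmax(logits)
    best = max(probs, key=probs.get)
    if probs[best] >= threshold:
        return best, "policy"
    return slow_planner(probs), "mcts"


def stub_mcts(probs: Dict[str, float]) -> str:
    """Placeholder for a tree search; here it just returns a fixed action."""
    return "Reflection of feelings"


confident = choose_action({"Question": 3.0, "Reflection of feelings": 0.0}, stub_mcts)
uncertain = choose_action({"Question": 0.1, "Reflection of feelings": 0.0}, stub_mcts)
```

In this sketch, raising `threshold` sends more turns to the slow planner, trading speed for deliberation, which matches the trade-off between efficiency and planning rigor that the summary describes.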

Abstract

In proactive dialogue, the challenge lies not just in generating responses but in steering conversations toward predetermined goals, a task where Large Language Models (LLMs) typically struggle due to their reactive nature. Traditional approaches to enhance dialogue planning in LLMs, ranging from elaborate prompt engineering to the integration of policy networks, either face efficiency issues or deliver suboptimal performance. Inspired by the dual-process theory in psychology, which identifies two distinct modes of thinking, intuitive (fast) and analytical (slow), we propose the Dual-Process Dialogue Planning (DPDP) framework. DPDP embodies this theory through two complementary planning systems: an instinctive policy model for familiar contexts and a deliberative Monte Carlo Tree Search (MCTS) mechanism for complex, novel scenarios. This dual strategy is further coupled with a novel two-stage training regimen: offline Reinforcement Learning for robust initial policy model formation followed by MCTS-enhanced on-the-fly learning, which ensures a dynamic balance between efficiency and strategic depth. Our empirical evaluations across diverse dialogue tasks affirm DPDP's superiority in achieving both high-quality dialogues and operational efficiency, outpacing existing methods.

Paper Structure (37 sections, 12 equations, 5 figures, 25 tables)

Introduction
Related Work
LLM-powered Dialogue Policy Planning
Applications of Dual-process Theory
Integrated Learning of RL and MCTS
Methodology
Preliminaries
Dual-process Planning Framework
Policy LM Planner
MCTS Planner
Synergizing Two Planners
Two-stage Training for Policy LM
Offline RL-based Pretraining
MCTS-guided Self-play Training
Experiments
...and 22 more sections

Figures (5)

Figure 1: Using dual-process theory for dialogue planning in the human cognitive process. This is a case from ESConv (Liu et al., 2021). "Question" and "Reflection of feelings" are pre-defined dialogue actions in ESConv.
Figure 2: (a) DPDP combines two planners: a Policy LM for quick responses and an MCTS planner for complex scenarios, switching to MCTS when the Policy LM is uncertain during inference. (b) We propose a two-stage training approach for the Policy LM, involving pretraining over static data with offline RL and further finetuning with MCTS simulations.
Figure 3: Human evaluation results on ESConv.
Figure 4: Curves depicting how LLM usage frequency and success rate on ESConv and CIMA vary with the proportion of MCTS usage.
Figure 5: Human evaluation results on CIMA.
