Causal Discovery and Counterfactual Reasoning to Optimize Persuasive Dialogue Policies
Donghuo Zeng, Roberto Legaspi, Yuewen Sun, Xinshuai Dong, Kazushi Ikeda, Peter Spirtes, Kun Zhang
TL;DR
This work addresses adapting persuasive dialogue systems to dynamic user states by integrating causal discovery and counterfactual reasoning into reinforcement learning. The pipeline identifies causal relations between persuadee and persuader strategies using GRaSP, generates plausible counterfactual utterances with BiCoGAN guided by these relations, and learns optimal dialogue policies via Dueling Double Deep Q-Networks trained on counterfactual data. Empirical results on the PersuasionForGood dataset show that the causality-guided counterfactual approach yields higher Q-values and greater predicted donations than baselines, demonstrating the value of causal structure in generating effective counterfactuals and informing policy. The study highlights the potential of combining causal inference with counterfactual data to dynamically tailor persuasive interactions, with implications for marketing, health communication, and social-good initiatives, while noting the need for real-user studies and broader personalization in future work.
Abstract
Tailoring persuasive conversations to users leads to more effective persuasion. However, existing dialogue systems often struggle to adapt to dynamically evolving user states. This paper presents a novel method that leverages causal discovery and counterfactual reasoning to optimize a system's persuasion capability and outcomes. We employ the Greedy Relaxation of the Sparsest Permutation (GRaSP) algorithm to identify causal relationships between user and system utterance strategies, treating user strategies as states and system strategies as actions. GRaSP identifies user strategies as causal factors influencing system responses, and these relationships guide Bidirectional Conditional Generative Adversarial Networks (BiCoGAN) in generating counterfactual utterances for the system. We then train a Dueling Double Deep Q-Network (D3QN) on the counterfactual data to learn the best policy for selecting system utterances. Our experiments with the PersuasionForGood dataset show that our approach improves persuasion outcomes over baseline methods. The observed increase in cumulative rewards and Q-values highlights the effectiveness of causal discovery in enhancing counterfactual reasoning and optimizing reinforcement learning policies for online dialogue systems.
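As a rough illustration of the two ideas behind the D3QN named above, the sketch below shows the dueling aggregation Q(s, a) = V(s) + A(s, a) - mean(A) and the Double DQN target, where the online network selects the next action and the target network evaluates it. This is a minimal sketch, not the authors' implementation: the function names, the three-action setup, and all numeric values are hypothetical, and plain lists stand in for network outputs.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) using the mean-advantage baseline."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def double_dqn_target(
    reward: float,
    gamma: float,
    next_q_online: List[float],
    next_q_target: List[float],
    done: bool,
) -> float:
    """Double DQN target: the online net picks the action, the target net scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    # Hypothetical outputs for one next state with three candidate system strategies.
    q_online = dueling_q_values(1.0, [0.5, -0.5, 0.0])
    q_target = dueling_q_values(0.8, [0.1, 0.3, -0.4])
    target = double_dqn_target(0.2, 0.9, q_online, q_target, done=False)
    print(f"TD target: {target:.4f}")
```

In this setting the state would correspond to a user strategy and each action to a system strategy, following the abstract's framing. Decoupling action selection from action evaluation is the standard Double DQN remedy for the overestimation that a single max over noisy Q-values tends to produce.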
