Counterfactual Reasoning Using Predicted Latent Personality Dimensions for Optimizing Persuasion Outcome

Donghuo Zeng, Roberto S. Legaspi, Yuewen Sun, Xinshuai Dong, Kazushi Ikeda, Peter Spirtes, Kun Zhang

TL;DR

The paper tackles dynamic persuasion by modeling user-specific latent personality dimensions (LPDs) and leveraging counterfactual reasoning to optimize dialogue. It introduces a DPPR model to estimate time-varying LPDs, uses BiCoGAN to generate counterfactual dialogue states conditioned on LPDs, and applies D3QN to learn policies from the counterfactual data. Empirical results on PersuasionForGood show that combining DPPR with BiCoGAN yields higher cumulative rewards and Q-values than both BiCoGAN alone and ground-truth baselines, demonstrating the value of personalized, counterfactual-aware policy learning in online interactions. This approach advances practical dialogue systems by enabling dynamic adaptation to evolving user traits and richer exploration through counterfactual scenarios, with potential impact on the effectiveness of persuasion technologies in real-world settings.

Abstract

Customizing persuasive conversations to specific users with respect to the outcome of interest achieves better persuasion results. However, existing persuasive conversation systems rely on persuasive strategies and struggle to dynamically adjust dialogues to suit the evolving states of individual users during interactions. This limitation restricts the system's ability to deliver flexible or dynamic conversations and leads to suboptimal persuasion outcomes. In this paper, we present a novel approach that tracks a user's latent personality dimensions (LPDs) during an ongoing persuasion conversation and generates tailored counterfactual utterances based on these LPDs to optimize the overall persuasion outcome. In particular, our proposed method leverages a Bi-directional Generative Adversarial Network (BiCoGAN) in tandem with a Dialogue-based Personality Prediction Regression (DPPR) model to generate counterfactual data. This enables the system to formulate alternative persuasive utterances that are better suited to the user. Subsequently, we use the D3QN model to learn policies on the counterfactual data for optimized selection of system utterances. Experimental results on the PersuasionForGood dataset demonstrate the superiority of our approach over the existing method, BiCoGAN. The cumulative rewards and Q-values produced by our method surpass ground-truth benchmarks, showcasing the efficacy of employing counterfactual reasoning and LPDs to optimize reinforcement learning policy in online interactions.
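
The abstract names D3QN without describing its update rule. As a rough illustration only, assuming D3QN refers to the standard dueling double deep Q-network, the two ingredients can be sketched in NumPy as below. The function names, the discount factor `gamma`, and the array shapes are hypothetical choices for this sketch and are not taken from the paper.

```python
import numpy as np


def dueling_q(value, advantage):
    """Combine V(s) and A(s, a) into Q(s, a) using the dueling aggregation.

    value:     array of shape (batch, 1)
    advantage: array of shape (batch, n_actions)
    """
    # Subtracting the mean advantage keeps V and A identifiable.
    return value + advantage - advantage.mean(axis=-1, keepdims=True)


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: the online network picks the next action,
    the target network evaluates it.

    reward, done:                 arrays of shape (batch,)
    q_online_next, q_target_next: arrays of shape (batch, n_actions)
    gamma:                        assumed discount factor (not from the paper)
    """
    best_action = np.argmax(q_online_next, axis=-1)
    next_q = np.take_along_axis(
        q_target_next, best_action[:, None], axis=-1
    ).squeeze(-1)
    # Terminal transitions (done == 1) do not bootstrap from the next state.
    return reward + gamma * (1.0 - done) * next_q
```

In a setup like the one described, the transitions fed to such an update would come from the counterfactual data rather than only from logged dialogues; how the paper constructs rewards and batches is not specified in this section.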

Paper Structure (14 sections, 6 equations, 10 figures, 2 tables)

Figures (10)

  • Figure 1: The overview of our architecture.
  • Figure 2: The individualized transition dynamics model.
  • Figure 3: (a) The trained DPPR model with Generator $G$, Encoder $E$, and Discriminator $D$ during training. (b) The trained Generator $G_{T}$ used to generate counterfactual states.
  • Figure 4: An example persuasive dialogue (ID: 20180904-154250_98_live, donation: \$2.0, OCEAN values: 3, 3.2, 3, 3.6, 3) between persuader (ER) and persuadee (EE) from the PersuasionForGood dataset. In the dynamic model of the dialogue, ER utterances are treated as actions (grey) and EE utterances as states (white).
  • Figure 5: The relationship between the counterfactual action $a'_{t}$ and the next state, which is either the counterfactual state $s'_{t+1}$ generated by BiCoGAN or the ground-truth state $s_{t+1}$.
  • ...and 5 more figures