Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders
Tiancheng Zhao, Ran Zhao, Maxine Eskenazi
TL;DR
This paper addresses the tendency of open-domain neural dialog models to produce dull, generic responses by modeling discourse-level diversity with a conditional variational autoencoder (CVAE). It introduces a knowledge-guided CVAE (kgCVAE) that incorporates linguistic prior knowledge, together with a bag-of-words (BOW) auxiliary loss that stabilizes training and encourages the decoder to use the latent variable, so that responses are diverse yet coherent even with greedy decoding. On Switchboard, both CVAE and kgCVAE outperform a strong baseline; kgCVAE scores well on both precision and recall across multiple metrics and also produces interpretable outputs such as predicted dialog acts. The work demonstrates that discourse-level variation in dialog generation can be captured with latent variables and lays groundwork for data-driven dialog managers that use latent discourse factors.
Abstract
While recent neural encoder-decoder models have shown great promise in modeling open-domain conversations, they often generate dull and generic responses. Unlike past work that has focused on diversifying the output of the decoder at the word level to alleviate this problem, we present a novel framework based on conditional variational autoencoders that captures the discourse-level diversity in the encoder. Our model uses latent variables to learn a distribution over potential conversational intents and generates diverse responses using only greedy decoders. We have further developed a novel variant that is integrated with linguistic prior knowledge for better performance. Finally, the training procedure is improved by introducing a bag-of-words loss. Our proposed models have been validated to generate significantly more diverse responses than baseline approaches and exhibit competence in discourse-level decision-making.
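
The training objective outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the prior and recognition distributions over the latent variable are diagonal Gaussians, and that the bag-of-words term scores every response token independently under a single softmax over the vocabulary. All function names (gaussian_kl, sample_latent, bow_log_likelihood, cvae_loss) are hypothetical and chosen only for illustration.

```python
import numpy as np

# Illustrative sketch only: names and shapes are assumptions, not the paper's code.

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians, summed over latent dimensions.

    q plays the role of the recognition distribution (conditioned on context
    and response); p plays the role of the prior (conditioned on context only).
    """
    return 0.5 * np.sum(
        logvar_p - logvar_q
        + (np.exp(logvar_q) + (mu_q - mu_p) ** 2) / np.exp(logvar_p)
        - 1.0
    )

def sample_latent(mu, logvar, rng):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def bow_log_likelihood(bow_logits, response_token_ids):
    """Log-likelihood of the response as an unordered bag of words.

    Assumption: bow_logits is one vocabulary-sized vector predicted from the
    latent variable and the context; word order is ignored.
    """
    shifted = bow_logits - np.max(bow_logits)  # numerically stable log-softmax
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return float(np.sum(log_probs[response_token_ids]))

def cvae_loss(recon_log_lik, kl, bow_log_lik):
    """Negative variational lower bound plus the auxiliary BOW term (minimized)."""
    return -(recon_log_lik + bow_log_lik) + kl
```

In this sketch the auxiliary term rewards a latent sample only if it helps predict which words appear in the response, which is one way to read the claim that the bag-of-words loss encourages the model to use the latent variable instead of ignoring it. At test time, a latent sample would be drawn from the prior and decoded greedily, so different samples can lead to different responses.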
