Structuring Latent Spaces for Stylized Response Generation
Xiang Gao, Yizhe Zhang, Sungjin Lee, Michel Galley, Chris Brockett, Jianfeng Gao, Bill Dolan
TL;DR
The paper tackles stylized response generation in the absence of parallel data by introducing StyleFusion, a regularized multi-task framework that aligns a conversation model and non-parallel style data in a shared latent space. It extends SpaceFusion to non-parallel datasets, using fusion and smoothness objectives to connect the latent spaces, so that style can be controlled by sampling in the neighborhood of the conversation model's prediction. Experiments on two target styles (arXiv and Sherlock Holmes) show that StyleFusion achieves higher style strength without sacrificing relevance, as measured by both automatic metrics and human judgments. The work shows that non-conversational style corpora can be used to build controllable, style-aware dialogue generation, with potential application to other domains and data sources.
Abstract
Generating responses in a targeted style is a useful yet challenging task, especially in the absence of parallel data. With limited data, existing methods tend to generate responses that are either less stylized or less context-relevant. We propose StyleFusion, which bridges conversation modeling and non-parallel style transfer by sharing a structured latent space. This structure allows the system to generate stylized, relevant responses by sampling in the neighborhood of the conversation model prediction, and to control the style level continuously. We demonstrate this method using dialogues from Reddit data and two sets of sentences with distinct styles (arXiv and Sherlock Holmes novels). Automatic and human evaluations show that, without sacrificing appropriateness, the system generates responses of the targeted style and outperforms competitive baselines.
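The idea of sampling in the neighborhood of the conversation model prediction can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: latent vectors are plain Python lists, and the names `sample_in_neighborhood`, `interpolate`, `radius`, and `style_level` are hypothetical. The linear blend is one simple way to picture a continuous style knob and may differ from the mechanism used in the paper.

```python
import math
import random


def sample_in_neighborhood(z_pred, radius, n_samples, seed=0):
    """Draw latent vectors uniformly from a ball of `radius` around z_pred.

    z_pred stands in for the conversation model's predicted latent vector
    (an assumption for illustration; a real system would get it from an encoder).
    """
    rng = random.Random(seed)
    dim = len(z_pred)
    samples = []
    for _ in range(n_samples):
        # Random direction: a Gaussian vector scaled to unit length.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        # Random distance; the 1/dim exponent spreads points uniformly in the ball.
        dist = radius * rng.random() ** (1.0 / dim)
        samples.append([z + dist * d / norm for z, d in zip(z_pred, direction)])
    return samples


def interpolate(z_pred, z_style, style_level):
    """Blend two latent vectors: 0.0 returns z_pred, 1.0 returns z_style."""
    if not 0.0 <= style_level <= 1.0:
        raise ValueError("style_level must be in [0, 1]")
    return [(1.0 - style_level) * a + style_level * b
            for a, b in zip(z_pred, z_style)]


if __name__ == "__main__":
    z_pred = [0.0, 0.0, 0.0]    # hypothetical conversation-model prediction
    z_style = [1.0, 2.0, -1.0]  # hypothetical latent of a style sentence
    candidates = sample_in_neighborhood(z_pred, radius=0.5, n_samples=3)
    print(len(candidates), "candidates near the prediction")
    print(interpolate(z_pred, z_style, 0.5))
```

In a full system, each sampled vector would be passed to a decoder and the resulting candidates ranked, for example by a style classifier and a relevance score; those components are outside the scope of this sketch.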
