Structuring Latent Spaces for Stylized Response Generation

Xiang Gao, Yizhe Zhang, Sungjin Lee, Michel Galley, Chris Brockett, Jianfeng Gao, Bill Dolan

TL;DR

The paper tackles stylized response generation in the absence of parallel data by introducing StyleFusion, a regularized multi-task framework that aligns a conversation model and non-parallel style data in a shared latent space. It extends SpaceFusion to non-parallel datasets, using fusion and smoothness objectives to connect latent spaces and enable controllable style through neighborhood sampling around the model prediction. Empirical results on arXiv-like and Holmes-like styles show StyleFusion achieves higher style strength without sacrificing relevance, validated by both automatic metrics and human judgments. The work demonstrates a practical path to controllable, style-aware dialogue generation leveraging non-conversational style corpora, with potential application to broader domains and data sources.

Abstract

Generating responses in a targeted style is a useful yet challenging task, especially in the absence of parallel data. With limited data, existing methods tend to generate responses that are either less stylized or less context-relevant. We propose StyleFusion, which bridges conversation modeling and non-parallel style transfer by sharing a structured latent space. This structure allows the system to generate stylized relevant responses by sampling in the neighborhood of the conversation model prediction, and continuously control the style level. We demonstrate this method using dialogues from Reddit data and two sets of sentences with distinct styles (arXiv and Sherlock Holmes novels). Automatic and human evaluation show that, without sacrificing appropriateness, the system generates responses of the targeted style and outperforms competitive baselines.
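The neighborhood sampling mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sample_neighbor`, the 8-dimensional latent vector, and the choice of a uniformly random direction at a fixed radius `rho` are all assumptions made for illustration. The sketch only shows what "sampling around the model prediction at a controllable distance" could look like.

```python
import numpy as np

def sample_neighbor(z_pred, rho, rng):
    """Return a point at Euclidean distance rho from z_pred.

    Assumption: the direction is drawn uniformly at random; the paper's
    actual sampling scheme is not specified in this section.
    """
    direction = rng.standard_normal(z_pred.shape)
    direction /= np.linalg.norm(direction)  # unit-length direction
    return z_pred + rho * direction

rng = np.random.default_rng(0)
z_pred = np.zeros(8)  # hypothetical stand-in for the predicted latent vector
# Draw several candidate latent points at the same radius; a decoder
# (not shown) would turn each one into a candidate response.
candidates = [sample_neighbor(z_pred, rho=0.5, rng=rng) for _ in range(4)]
```

Under these assumptions, a larger `rho` moves the sampled points farther from the prediction, which is the knob the abstract describes as continuously controlling the style level.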

Paper Structure

This paper contains 28 sections, 12 equations, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: StyleFusion helps a conversational model distill style from non-conversational, non-parallel sentences by mapping them to points surrounding the related conversations in the structured latent space. Direction and distance from the model prediction (center black dot) roughly correspond to content and style intensity, respectively, as illustrated by examples taken from Table \ref{table:example_towards}.
  • Figure 2: StyleFusion model architecture.
  • Figure 3: Change in overall style intensity with $\rho$, as measured by two classifiers, "neural" and "ngram" (Section \ref{sec:infer}), and by the "count" metric. The "count" metric is normalized by the value of the target style corpus. The bar plot shows the desired trend (from Reddit to arXiv or Holmes), and the lines show the actual trends.
  • Figure 4: Change in style at a finer granularity of $\rho$, measured by the "count" metric normalized by the value of the Reddit dataset. The bar plot shows the desired trend (from Reddit to arXiv or Holmes), and the lines show the actual trends.
  • Figure 5: Relevance of the StyleFusion outputs at different values of $\rho$, as measured by BLEU4 against references of different styles.
  • ...and 1 more figures