Style Transformer: Unpaired Text Style Transfer without Disentangled Latent Representation

Ning Dai, Jianze Liang, Xipeng Qiu, Xuanjing Huang

TL;DR

The paper tackles unpaired text style transfer without assuming a disentangled latent space. It introduces the Style Transformer, a Transformer-based encoder–decoder conditioned on a style embedding and trained with discriminator supervision on non-parallel data. Training combines self-reconstruction and cycle-reconstruction losses with a style-controlling loss, for which two discriminator architectures are considered. Experiments on Yelp and IMDb report competitive content preservation and style control, and ablations indicate that each loss component contributes. Relying on attention instead of a fixed-size latent vector is intended to improve handling of long-range dependencies, and the style embedding makes the model applicable to multi-style transfer.

Abstract

Disentangling content and style in the latent space is prevalent in unpaired text style transfer. However, two major issues exist in most current neural models. 1) It is difficult to completely strip the style information from the semantics of a sentence. 2) The recurrent neural network (RNN) based encoder and decoder, mediated by the latent representation, cannot deal well with long-term dependencies, resulting in poor preservation of non-stylistic semantic content. In this paper, we propose the Style Transformer, which makes no assumption about the latent representation of the source sentence and uses the attention mechanism of the Transformer to achieve better style transfer and better content preservation.

Paper Structure

This paper contains 25 sections, 9 equations, 2 figures, 5 tables, 3 algorithms.

Figures (2)

  • Figure 1: General illustration of previous models and our model. $\mathbf{z}$ denotes style-independent content vector and $\mathbf{s}$ denotes the style variable.
  • Figure 2: The training process for the Style Transformer network. The input sentence $\mathbf{x}$ and an input style $\mathbf{s}$ (or $\mathbf{\widehat{s}}$) are fed into the Transformer network $f_{\theta}$. If the input style $\mathbf{s}$ is the same as the style of sentence $\mathbf{x}$, the generated sentence $\mathbf{y}$ is trained to reconstruct $\mathbf{x}$. Otherwise, the generated sentence $\mathbf{\widehat{y}}$ is fed into the Transformer $f_{\theta}$ and the discriminator $d_{\phi}$ to reconstruct the input sentence $\mathbf{x}$ and the input style $\mathbf{\widehat{s}}$, respectively.
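
The branching described in the Figure 2 caption can be sketched as plain control flow. This is a minimal illustration, not the authors' implementation: `training_losses`, `token_mismatch`, `toy_f_theta`, and `toy_d_phi` are hypothetical names, the "network" is a word-swap lexicon, the "discriminator" is a keyword check that returns a style label, and the losses are simple mismatch counts standing in for the differentiable losses a real model would use.

```python
from typing import Callable, Dict, List

Sentence = List[str]


def token_mismatch(a: Sentence, b: Sentence) -> float:
    """Toy reconstruction loss: fraction of positions where the tokens differ."""
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    diffs = sum(
        1 for i in range(n)
        if i >= len(a) or i >= len(b) or a[i] != b[i]
    )
    return diffs / n


def training_losses(
    x: Sentence,
    s: str,
    s_in: str,
    f_theta: Callable[[Sentence, str], Sentence],
    d_phi: Callable[[Sentence], str],
) -> Dict[str, float]:
    """Return the loss terms that apply for sentence x (style s) and input style s_in."""
    y = f_theta(x, s_in)
    if s_in == s:
        # Same style: the output should reconstruct the input sentence.
        return {"self_reconstruction": token_mismatch(y, x)}
    # Different style: feed the output back with the original style (cycle),
    # and ask the discriminator whether the output carries the requested style.
    x_back = f_theta(y, s)
    return {
        "cycle_reconstruction": token_mismatch(x_back, x),
        "style": 0.0 if d_phi(y) == s_in else 1.0,
    }


# Toy stand-ins (assumptions for illustration only).
SWAP = {"pos": {"bad": "good"}, "neg": {"good": "bad"}}


def toy_f_theta(x: Sentence, style: str) -> Sentence:
    """Stand-in for f_theta: swap sentiment words according to the target style."""
    return [SWAP[style].get(tok, tok) for tok in x]


def toy_d_phi(y: Sentence) -> str:
    """Stand-in for d_phi: guess the style from a single keyword."""
    return "neg" if "bad" in y else "pos"


if __name__ == "__main__":
    x = ["the", "food", "is", "good"]
    print(training_losses(x, "pos", "pos", toy_f_theta, toy_d_phi))
    print(training_losses(x, "pos", "neg", toy_f_theta, toy_d_phi))
```

The point of the sketch is only which loss terms are active in each branch: one reconstruction term when the styles match, and a cycle term plus a style term when they differ.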