Replacing Language Model for Style Transfer

Pengyu Cheng, Ruineng Li

TL;DR

Text style transfer under limited parallel data remains challenging. This paper introduces Replacing Language Model (RLM), which autoregressively generates target-style text by replacing each source token with a semantically equivalent span produced by a non-autoregressive masked LM, and enforces token-level style-content disentanglement via mutual-information objectives. The approach yields strong results on Yelp and Amazon in both automatic metrics (ACC, Ref-BLEU, Self-BLEU, GM) and human judgments, outperforming several baselines in overall transfer quality. The proposed RLM framework offers a flexible, fine-grained alternative to traditional sentence- and word-level methods and holds promise for broader sequence-to-sequence tasks.

Abstract

We introduce the replacing language model (RLM), a sequence-to-sequence language modeling framework for text style transfer (TST). Our method autoregressively replaces each token of the source sentence with a text span that has a similar meaning in the target style. The new span is generated via a non-autoregressive masked language model, which can better preserve the local-contextual meaning of the replaced token. This RLM generation scheme combines the flexibility of autoregressive models with the accuracy of non-autoregressive models, bridging the gap between sentence-level and word-level style transfer methods. To control the generation style more precisely, we conduct a token-level style-content disentanglement on the hidden representations of RLM. Empirical results on real-world text datasets demonstrate the effectiveness of RLM compared with other TST baselines. The code is at https://github.com/Linear95/RLM.
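
The replacing scheme described in the abstract can be pictured as a left-to-right pass in which every source token is turned into a span of zero or more target-style tokens. The following is a minimal sketch of that control flow only, not the authors' implementation: the `SpanReplacer` interface, the function names, and the toy lexicon that stands in for the masked language model are all assumptions made for illustration.

```python
from typing import Callable, List

# Hypothetical interface (an assumption, not the repository's API):
# (generated prefix, source token, remaining source tokens, style) -> span
SpanReplacer = Callable[[List[str], str, List[str], str], List[str]]


def replace_sequence(source: List[str], style: str, replace: SpanReplacer) -> List[str]:
    """Left-to-right pass: each source token becomes a span of zero or more tokens."""
    generated: List[str] = []
    for i, token in enumerate(source):
        # The replacer sees what has been generated so far and what is still to come.
        span = replace(generated, token, source[i + 1:], style)
        generated.extend(span)
    return generated


# Toy lexicon standing in for the masked language model (illustrative only).
TOY_LEXICON = {
    "positive": {"terrible": ["really", "great"], "rude": ["friendly"]},
}


def toy_replace(prefix: List[str], token: str, rest: List[str], style: str) -> List[str]:
    # Unknown tokens are copied unchanged; known ones map to a target-style span.
    return TOY_LEXICON.get(style, {}).get(token, [token])


result = replace_sequence("the food was terrible".split(), "positive", toy_replace)
print(" ".join(result))
```

Because the output is built incrementally, a real replacer could condition each span on the already transferred prefix, which is the autoregressive part of the scheme. The span itself would come from a masked language model rather than a lookup table.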

Paper Structure (16 sections, 20 equations, 3 figures, 5 tables)

Figures (3)

  • Figure 1: Replacing language model (RLM) for equal-length transfer. (a) Prediction term: The generated ${\bm{Y}}_{0:i}$ and original ${\bm{X}}_{i+1: n}$ are combined with a [MASK] token and fed into the transformer-based encoder. RLM outputs the content embedding ${\bm{c}}_i$ at the [MASK] position and fuses it with the target style embedding ${\bm{s}}$ to predict the $i$-th token ${\bm{y}}_i$, providing $P({\bm{y}}_i | {\bm{Y}}_{0:i}, {\bm{X}}_{i+1:n}, {\bm{s}})$. Then ${\bm{y}}_i$ is inserted back at the [MASK] position and ${\bm{x}}_{i+1}$ is masked for the $(i+1)$-th generation step. (b) Reconstruction term: The prediction candidate ${\bm{y}}_i$ is placed in front of the masked sequence ${\bm{X}}_{-i}$. RLM reconstructs the original ${\bm{x}}_i$ from the content ${\bm{c}}_i$ at the [MASK] position with probability $p({\bm{x}}_i| {\bm{X}}_{0:i}, {\bm{y}}_i, {\bm{X}}_{i+1:n})$.
  • Figure 2: Unequal-length transfer. (a) Deletion: if token ${\bm{x}}_i$ is supposed to be deleted in the target sentence, RLM will output a [PAD] token at the [MASK] position. (b) Insertion: to insert a token in the target sentence, the next-token prediction head will output a [MASK] token. Then the generated ${\bm{y}}_{T_i}$ and new [MASK] tokens are inserted back into the input, and the next-step masked language model generation will be conducted on the generated [MASK] token, instead of masking ${\bm{x}}_{i+1}$.
  • Figure 3: (a) An example of sentence alignment. The blue sequence is the source and the green one is the target. Each $T_i$ maps token ${\bm{x}}_i$ to the start position of the transferred text span that carries the same content. (b) Fusion of style and content embeddings. The target style ${\bm{s}}$ is concatenated to the content embedding ${\bm{c}}$ of every input token. Self-attention is then applied to the combined embedding sequence, and the attention output is added to the original content embedding, followed by layer normalization.
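
The decoding control flow in the captions of Figures 1 and 2 can be sketched as a loop over source positions. The sketch below is hedged throughout: the two callables are hypothetical stand-ins for the masked-LM head and the next-token head, `max_inserts` is a safety cap added here, and the choice not to query for an insertion after a deletion is an assumption that the captions do not settle. The heads are replaced by scripted stubs, so the example shows only how [PAD] and [MASK] steer the loop, not how a trained model chooses tokens.

```python
from typing import Callable, List

MASK, PAD = "[MASK]", "[PAD]"

# Hypothetical head interfaces (assumptions, not the repository's API).
FillMask = Callable[[List[str], int, str], str]            # (input, mask position, style) -> token
WantsInsert = Callable[[List[str], List[str], str], bool]  # (generated, source suffix, style) -> bool


def rlm_decode(source: List[str], style: str, fill_mask: FillMask,
               wants_insert: WantsInsert, max_inserts: int = 3) -> List[str]:
    generated: List[str] = []
    for i in range(len(source)):
        suffix = source[i + 1:]  # x_i itself is hidden behind [MASK]
        inserts = 0
        while True:
            # Input layout from Figure 1: generated prefix, [MASK], untouched source suffix.
            model_input = generated + [MASK] + suffix
            token = fill_mask(model_input, len(generated), style)
            if token == PAD:
                break  # deletion: x_i contributes nothing to the output
            generated.append(token)
            # A True answer models the next-token head emitting [MASK]:
            # fill one more position before moving on to x_{i+1}.
            if inserts >= max_inserts or not wants_insert(generated, suffix, style):
                break
            inserts += 1
    return generated


def scripted(values):
    """Build a stub that ignores its arguments and replays a fixed sequence."""
    it = iter(values)
    return lambda *args: next(it)


fill = scripted(["the", "staff", "was", PAD, "friendly", "and", "helpful"])
insert = scripted([False, False, False, True, True, False])
result = rlm_decode("the staff was not friendly".split(), "positive", fill, insert)
print(result)
```

In this scripted run, "not" is deleted through a [PAD] prediction, and two extra tokens are inserted after "friendly" because the stubbed next-token head asks for another [MASK] twice before decoding would move on.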