Enabling Language Models to Fill in the Blanks

Chris Donahue, Mina Lee, Percy Liang

TL;DR

This paper introduces infilling by language modeling (ILM), a simple framework that enables unidirectional LMs to predict variable-length missing spans by training on examples formed by concatenating masked text with the spans that were masked. Because it adds only a few special tokens and a separator, ILM preserves the benefits of LMs while using context from both sides of a gap. It infills sentences better than the baselines and performs comparably to full-context models while requiring less memory. Across three domains (short stories, scientific abstracts, and song lyrics), ILM achieves lower perplexity (PPL) on infilled spans, and in human evaluation its infills are harder to identify as machine-generated. The approach, its granular masking strategy, and its compatibility with pretrained LMs suggest practical utility for writing assistance and co-creative AI tools; demos and code are publicly available.

Abstract

We present a simple approach for text infilling, the task of predicting missing spans of text at any position in a document. While infilling could enable rich functionality, especially for writing assistance tools, more attention has been devoted to language modeling, a special case of infilling where text is predicted at the end of a document. In this paper, we aim to extend the capabilities of language models (LMs) to the more general task of infilling. To this end, we train (or fine-tune) off-the-shelf LMs on sequences containing the concatenation of artificially masked text and the text that was masked. We show that this approach, which we call infilling by language modeling, can enable LMs to infill entire sentences effectively in three different domains: short stories, scientific abstracts, and lyrics. Furthermore, we show that humans have difficulty identifying sentences infilled by our approach as machine-generated in the domain of short stories.

Paper Structure

This paper contains 23 sections, 5 figures, and 9 tables.

Figures (5)

  • Figure 1: We consider the task of infilling, which takes incomplete text as input and outputs completed text. To tackle this task, our framework constructs training examples by masking random spans to generate pairs of inputs (text with blanks) and targets (answers for each blank). We then train unidirectional language models on the concatenation of each pair. Once trained, a model takes text input with blanks, predicts the answers, and then combines them to produce the output.
  • Figure 2: Training examples for three baseline infilling strategies and ILM on a given artificially masked sentence. For each strategy, we train the same architecture (GPT-2) on such examples. At both training and test time, examples are fed from left to right; anything to the left of a green target is available to the model as context when predicting the target. Precisely, LM only considers past context, and LM-Rev only considers future context. LM-All considers all available context but uses long sequence lengths. Our proposed ILM considers all context while using fewer tokens.
  • Figure 3: Example of a short story in our Stories dataset with its third sentence masked, and sentences infilled by different models. The sentences generated by the BERT and SA models are off-topic, and the sentence generated by the LM model is irrelevant to the future context, while the ones generated by ILM and Human successfully account for both previous and future context.
  • Figure 4: Example of a task and instruction for human evaluation on Amazon Mechanical Turk.
  • Figure 5: Examples of sentence-level infills by different models.
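
The example construction described in the Figure 1 caption (mask spans, then concatenate the masked text with the answers for each blank) can be sketched roughly as follows. This is a minimal illustration and not the authors' released code: the token strings `[blank]`, `[sep]`, and `[answer]`, the function names `make_ilm_example` and `fill_blanks`, and the whitespace tokenization are all assumptions made for the example.

```python
# Hypothetical special tokens; the actual strings used in the paper's code may differ.
BLANK, SEP, ANSWER = "[blank]", "[sep]", "[answer]"


def make_ilm_example(tokens, spans):
    """Build one training sequence: masked text, a separator, then the answers.

    `spans` is a sorted list of non-overlapping (start, end) index pairs,
    with `end` exclusive, marking the tokens to mask.
    """
    masked, answers = [], []
    cursor = 0
    for start, end in spans:
        masked.extend(tokens[cursor:start])  # keep context before the span
        masked.append(BLANK)                 # replace the span with one blank
        answers.extend(tokens[start:end])    # the masked-out span is the target
        answers.append(ANSWER)               # mark the end of this answer
        cursor = end
    masked.extend(tokens[cursor:])           # keep context after the last span
    return masked + [SEP] + answers


def fill_blanks(masked, generated):
    """Combine predicted answers with the masked input to produce full text."""
    answers, current = [], []
    for tok in generated:
        if tok == ANSWER:
            answers.append(current)
            current = []
        else:
            current.append(tok)
    remaining = iter(answers)
    output = []
    for tok in masked:
        if tok == BLANK:
            # Leave the blank in place if the model produced too few answers.
            output.extend(next(remaining, [BLANK]))
        else:
            output.append(tok)
    return output


tokens = "She ate leftover pasta for lunch .".split()
example = make_ilm_example(tokens, [(1, 2), (4, 6)])
sep_index = example.index(SEP)
restored = fill_blanks(example[:sep_index], example[sep_index + 1:])
```

Because the whole sequence is read left to right, a unidirectional LM that predicts the tokens after the separator has already seen the text on both sides of every blank. At test time, the model would be given the masked text plus the separator and asked to generate the answers, which `fill_blanks` then places back into the input.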