Enabling Language Models to Fill in the Blanks
Chris Donahue, Mina Lee, Percy Liang
TL;DR
This paper introduces infilling by language modeling (ILM), a simple framework that enables unidirectional LMs to predict variable-length missing spans. ILM trains (or fine-tunes) an LM on examples formed by concatenating the masked text, a separator, and the spans that were masked. Because it adds only a few special tokens, including the separator, ILM preserves the benefits of standard LMs while letting the model use context from both sides of a gap. It infills sentences better than baselines and performs comparably to full-context models while requiring less memory. Across three domains (short stories, scientific abstracts, and song lyrics), ILM achieves lower perplexity on infilled spans, and in a human evaluation on short stories, people had difficulty identifying its infilled sentences as machine-generated. The simplicity of the approach, its granular masking strategy, and the gains from starting with a pretrained LM suggest practical utility for writing assistance and co-creative AI tools; the authors release demos and code publicly to encourage broader adoption.
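To make the training format concrete, the following Python sketch builds a single ILM-style training string from a sentence and the character spans to mask. The token strings [blank], [sep], and [answer] and the function name make_ilm_example are assumptions for illustration; in a real setup the tokens would be registered as special tokens in the LM's vocabulary, and the spans would come from a masking function rather than being chosen by hand.

```python
from typing import List, Tuple

# Hypothetical special-token strings (assumed for illustration); a real setup
# would add these to the LM's vocabulary as special tokens.
BLANK = "[blank]"
SEP = "[sep]"
ANSWER = "[answer]"


def make_ilm_example(text: str, spans: List[Tuple[int, int]]) -> str:
    """Build one ILM-style training string.

    spans holds (start, end) character offsets into text, end exclusive,
    sorted and non-overlapping. Each span is replaced by BLANK in the
    context; the removed text is appended after SEP, each answer followed
    by ANSWER.
    """
    context_parts: List[str] = []
    answers: List[str] = []
    cursor = 0
    for start, end in spans:
        if start < cursor or end < start or end > len(text):
            raise ValueError(f"invalid or overlapping span ({start}, {end})")
        context_parts.append(text[cursor:start])  # unmasked text before the gap
        context_parts.append(BLANK)                # placeholder for the gap
        answers.append(text[start:end])            # text the LM must predict
        cursor = end
    context_parts.append(text[cursor:])            # unmasked text after the last gap
    masked = "".join(context_parts)
    target = " ".join(f"{answer} {ANSWER}" for answer in answers)
    # Masked context, then separator, then answers: one sequence for a plain LM.
    return f"{masked} {SEP} {target}"


if __name__ == "__main__":
    sentence = "She ate leftover pasta for lunch."
    # Mask "leftover pasta" (offsets [8, 22)) and "lunch" (offsets [27, 32)).
    print(make_ilm_example(sentence, [(8, 22), (27, 32)]))
    # prints: She ate [blank] for [blank]. [sep] leftover pasta [answer] lunch [answer]
```

A standard left-to-right LM can be trained on strings like this without architectural changes; at inference time it is given everything up to and including the separator and generates the answers.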
Abstract
We present a simple approach for text infilling, the task of predicting missing spans of text at any position in a document. While infilling could enable rich functionality especially for writing assistance tools, more attention has been devoted to language modeling, a special case of infilling where text is predicted at the end of a document. In this paper, we aim to extend the capabilities of language models (LMs) to the more general task of infilling. To this end, we train (or fine-tune) off-the-shelf LMs on sequences containing the concatenation of artificially masked text and the text which was masked. We show that this approach, which we call infilling by language modeling, can enable LMs to infill entire sentences effectively on three different domains: short stories, scientific abstracts, and lyrics. Furthermore, we show that humans have difficulty identifying sentences infilled by our approach as machine-generated in the domain of short stories.
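At inference time, the answers an ILM-trained model generates after the separator must be mapped back into the gaps of the masked text. The sketch below assumes hypothetical [blank] and [answer] token strings and that the model's generation is already available as a string (the LM call itself is omitted); parse_answers and fill_blanks are illustrative helper names, not part of any released code.

```python
from typing import List

# Hypothetical special-token strings; they must match the tokens the model was trained with.
BLANK = "[blank]"
ANSWER = "[answer]"


def parse_answers(generated: str, num_blanks: int) -> List[str]:
    """Split text generated after the separator into one answer per blank.

    Only pieces terminated by ANSWER count; anything after the last ANSWER
    is treated as an unfinished answer and discarded.
    """
    pieces = generated.split(ANSWER)[:-1]
    if len(pieces) < num_blanks:
        raise ValueError(f"expected {num_blanks} answers, got {len(pieces)}")
    return [piece.strip() for piece in pieces[:num_blanks]]


def fill_blanks(masked: str, answers: List[str]) -> str:
    """Substitute answers, in order, for the BLANK tokens in masked."""
    segments = masked.split(BLANK)
    if len(segments) - 1 != len(answers):
        raise ValueError(f"{len(segments) - 1} blanks but {len(answers)} answers")
    filled = [segments[0]]
    for answer, segment in zip(answers, segments[1:]):
        filled.append(answer)   # infilled span
        filled.append(segment)  # original context following that gap
    return "".join(filled)


if __name__ == "__main__":
    masked = "She ate [blank] for [blank]."
    # Stand-in for what an ILM-trained model might generate after the separator.
    generated = " leftover pasta [answer] lunch [answer]"
    answers = parse_answers(generated, masked.count(BLANK))
    print(fill_blanks(masked, answers))
    # prints: She ate leftover pasta for lunch.
```

Discarding text after the final [answer] is one possible design choice: it treats an unterminated answer as incomplete rather than guessing where it ends.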
