On the Efficacy of Sampling Adapters
Clara Meister, Tiago Pimentel, Luca Malagutti, Ethan G. Wilcox, Ryan Cotterell
TL;DR
This paper formalizes sampling adapters: simple, plug-in modifications to the per-step conditional distributions of autoregressive language models. The framework unifies common decoding techniques such as nucleus and top-k sampling. The authors argue that adapters enact a precision–recall trade-off: they often remove the model's ability to generate certain strings (lower recall) but concentrate probability on high-quality text (higher precision). Using reverse cross-entropy, reverse KL divergence, and balanced measures such as total variation distance (TVD) and Jensen–Shannon (JS) divergence, the authors show that precision-emphasizing measures correlate with sequence-level text quality as measured by Mauve, which offers practical guidance for selecting adapter hyperparameters. The work highlights that standard training objectives may be misaligned with generation goals, and that precision-focused measures can serve as efficient proxies for steering decoding choices in open-ended generation.
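The precision–recall reading of these measures can be illustrated with a small sketch. It assumes a made-up five-token reference distribution `p`, a made-up model distribution `q` that puts too much mass on the tail, and a hand-truncated version `q_adapted`; none of these numbers come from the paper. The function names are hypothetical, and the formulas are the standard textbook definitions, which may differ in constants or smoothing from those the authors use.

```python
import math

def cross_entropy(a, b):
    """H(a, b) = -sum_y a(y) log b(y); infinite if b gives zero mass where a does not."""
    total = 0.0
    for ay, by in zip(a, b):
        if ay == 0.0:
            continue  # terms with a(y) = 0 contribute nothing
        if by == 0.0:
            return math.inf
        total -= ay * math.log(by)
    return total

def kl(a, b):
    """KL(a || b) = H(a, b) - H(a, a)."""
    return cross_entropy(a, b) - cross_entropy(a, a)

def tvd(a, b):
    """Total variation distance, using the common 1/2 normalization (an assumption)."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))

def js(a, b):
    """Jensen-Shannon divergence: average KL of a and b to their midpoint."""
    m = [(x + y) / 2.0 for x, y in zip(a, b)]
    return 0.5 * kl(a, m) + 0.5 * kl(b, m)

# Toy distributions (illustrative assumptions only).
p = [0.60, 0.25, 0.10, 0.04, 0.01]   # stand-in for the reference distribution
q = [0.40, 0.20, 0.15, 0.15, 0.10]   # model: too much mass on the tail
kept = q[0] + q[1]
q_adapted = [q[0] / kept, q[1] / kept, 0.0, 0.0, 0.0]  # keep top 2, renormalize

# Recall-emphasizing: forward cross-entropy H(p, q) penalizes missing support.
forward_raw = cross_entropy(p, q)
forward_adapted = cross_entropy(p, q_adapted)

# Precision-emphasizing: reverse cross-entropy H(q, p) and reverse KL(q || p).
reverse_raw = cross_entropy(q, p)
reverse_adapted = cross_entropy(q_adapted, p)
reverse_kl_raw = kl(q, p)
reverse_kl_adapted = kl(q_adapted, p)

# Balanced measures.
tvd_raw, tvd_adapted = tvd(p, q), tvd(p, q_adapted)
js_raw, js_adapted = js(p, q), js(p, q_adapted)
```

In this toy case the truncated distribution has infinite forward cross-entropy, because it can no longer produce three of the tokens, while its reverse cross-entropy, reverse KL, and TVD are all lower than those of the untruncated model. That is the shape of the trade-off described above, shown on invented numbers rather than on real model outputs.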
Abstract
Sampling is a common strategy for generating text from probabilistic models, yet standard ancestral sampling often results in text that is incoherent or ungrammatical. To alleviate this issue, various modifications to a model's sampling distribution, such as nucleus or top-k sampling, have been introduced and are now ubiquitously used in language generation systems. We propose a unified framework for understanding these techniques, which we term sampling adapters. Sampling adapters often lead to qualitatively better text, which raises the question: From a formal perspective, how are they changing the (sub)word-level distributions of language generation models? And why do these local changes lead to higher-quality text? We argue that the shift they enforce can be viewed as a trade-off between precision and recall: while the model loses its ability to produce certain strings, its precision rate on desirable text increases. While this trade-off is not reflected in standard metrics of distribution quality (such as perplexity), we find that several precision-emphasizing measures indeed indicate that sampling adapters can lead to probability distributions more aligned with the true distribution. Further, these measures correlate with higher sequence-level quality scores, specifically, Mauve.
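A minimal sketch of the adapter idea, assuming a toy five-token vocabulary: each adapter takes the model's next-token distribution and returns a truncated, renormalized one, which is then sampled from as usual. The function names (`top_k_adapter`, `nucleus_adapter`) and the example logits are hypothetical illustrations, not the authors' implementation.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def renormalize(weights):
    """Rescale non-negative weights so they sum to one."""
    z = sum(weights)
    return [w / z for w in weights]

def top_k_adapter(probs, k):
    """Keep the k most probable tokens, zero out the rest, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    return renormalize([p if i in keep else 0.0 for i, p in enumerate(probs)])

def nucleus_adapter(probs, top_p):
    """Keep the smallest set of top tokens whose total mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return renormalize([p if i in keep else 0.0 for i, p in enumerate(probs)])

def sample(probs, rng):
    """Draw one token index from a distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Toy next-token distribution (illustrative logits, not from any real model).
probs = softmax([2.0, 1.0, 0.5, -1.0, -3.0])

adapted_k = top_k_adapter(probs, k=2)          # only the two top tokens survive
adapted_p = nucleus_adapter(probs, top_p=0.9)  # the top three tokens survive here

rng = random.Random(0)
token = sample(adapted_p, rng)  # ancestral sampling step on the adapted distribution
```

Both adapters act only on the per-step distribution, so they can be dropped into an ordinary ancestral sampling loop without changing the model itself. Tokens outside the kept set receive exactly zero probability, which is the loss of recall the abstract refers to.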
