DiffSampling: Enhancing Diversity and Accuracy in Neural Text Generation
Giorgio Franceschelli, Mirco Musolesi
TL;DR
DiffSampling addresses the decoding-time trade-off between high-quality but less diverse outputs and diverse but potentially inaccurate generations by truncating the next-token distribution based on its discrete derivative. It places the cut point at the minimum discrete derivative of the sorted token probabilities and provides three variants: DiffSampling-cut, DiffSampling-lb, and DiffSampling-minp. The latter two incorporate a lower bound or a dynamic upper bound on preserved tokens, and temperature is applied after truncation to preserve guarantees. Across four tasks (math problem solving, extreme summarization, divergent association, and story writing), DiffSampling achieves competitive or superior accuracy relative to strong baselines and often yields greater diversity, especially in longer-form outputs. The work also shows that applying temperature after truncation maintains output quality while enabling broader stylistic variation, suggesting practical benefits for controllable and diverse text generation in real-world applications.
Abstract
Despite their growing capabilities, language models still frequently reproduce content from their training data, generate repetitive text, and favor common grammatical patterns and vocabulary. A possible cause is the decoding strategy: the most common strategies either consider only the most probable tokens, which reduces output diversity, or increase the likelihood of unlikely tokens, which compromises output accuracy and correctness. In this paper, we propose DiffSampling, a new decoding method that leverages a mathematical analysis of the token probability distribution to ensure the generation of contextually appropriate text. In particular, the difference between consecutive, sorted probabilities can be used to truncate incorrect tokens. We also propose two variations of the method that aim to correct the subtle inconsistencies of common sampling strategies. Experiments involving four different text-generation tasks demonstrate that our approach consistently performs at least on par with the existing methods it builds upon in terms of quality, while potentially improving output diversity.
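
To make the truncation idea concrete, the following is a minimal sketch in plain Python of cutting a next-token distribution at the largest drop between consecutive sorted probabilities, then applying temperature to the surviving tokens. It is an illustration under stated assumptions, not the authors' implementation: the function names `diff_cut` and `renormalize` are hypothetical, the last token is assumed to be compared against a probability of zero, and the lower-bound and dynamic-upper-bound variants are not covered.

```python
def diff_cut(probs):
    """Return the indices of the tokens kept after the cut.

    Assumption: the discrete derivative at sorted position i is
    p[i+1] - p[i], with the position after the last token treated as 0.
    """
    # Token indices sorted by decreasing probability (stable for ties).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    sorted_p = [probs[i] for i in order]

    # Forward differences between consecutive sorted probabilities.
    diffs = [
        (sorted_p[i + 1] if i + 1 < len(sorted_p) else 0.0) - sorted_p[i]
        for i in range(len(sorted_p))
    ]

    # The cut point is the position of the minimum (most negative) difference;
    # on ties, the first such position is used.
    cut = min(range(len(diffs)), key=lambda i: diffs[i])

    # Keep every token up to and including the cut point.
    return order[: cut + 1]


def renormalize(probs, kept, temperature=1.0):
    """Apply temperature to the kept tokens only, then renormalize.

    Raising probabilities to the power 1/T is equivalent to dividing
    the corresponding logits by T before a softmax.
    """
    weights = [probs[i] ** (1.0 / temperature) for i in kept]
    total = sum(weights)
    return {i: w / total for i, w in zip(kept, weights)}


probs = [0.05, 0.40, 0.35, 0.10, 0.10]
kept = diff_cut(probs)            # the largest drop is from 0.35 to 0.10
dist = renormalize(probs, kept, temperature=1.5)
print(kept)                       # [1, 2]
```

In this example only the two most probable tokens survive, because the steepest drop in the sorted distribution occurs right after them. Since temperature is applied only after the cut, it reshapes the relative weight of the surviving tokens without letting any truncated token back in. For a uniform distribution, the steepest drop under the assumed convention is at the very end, so every token is kept.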
