DiffSampling: Enhancing Diversity and Accuracy in Neural Text Generation

Giorgio Franceschelli, Mirco Musolesi

TL;DR

DiffSampling addresses a decoding-time trade-off: strategies that keep only the most probable tokens give high-quality but less diverse outputs, while strategies that boost unlikely tokens give diverse but potentially inaccurate generations. It truncates the next-token distribution with a derivative-based rule, placing the cut point at the location of the minimum discrete derivative of the sorted probabilities, that is, at the largest drop between consecutive probabilities. Three variants are provided: DiffSampling-cut truncates at that point directly; DiffSampling-lb additionally imposes a lower bound on the total probability mass that is preserved; and DiffSampling-minp applies truncation only among tokens whose probability falls below a dynamic threshold tied to the highest probability. In all variants, temperature is applied after truncation to preserve guarantees. Across four tasks (math problem solving, extreme summarization, divergent association, and story writing), DiffSampling demonstrates competitive or superior accuracy relative to strong baselines and often yields greater diversity, especially in longer-form outputs. The work also shows that applying temperature after truncation maintains output quality while enabling broader stylistic variation, suggesting practical benefits for controllable and diverse text generation in real-world applications.
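
The three variants can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `cut_index` and `diffsampling`, the example distribution, and the exact handling of the boundary token in the `lb` and `minp` variants are assumptions made here for illustration. The defaults `p_lb = 0.9` and `p_min = 0.1` follow the values quoted in the Figure 1 caption.

```python
import random


def cut_index(sorted_probs, start=0):
    """Index of the last token to keep (hypothetical helper).

    sorted_probs must be in descending order, so the forward difference
    sorted_probs[i + 1] - sorted_probs[i] is never positive and its minimum
    marks the largest drop. Only drops at positions >= start are considered,
    so every token up to position `start` is always kept.
    """
    n = len(sorted_probs)
    if start >= n - 1:
        return n - 1  # nothing left that could be truncated
    diffs = [sorted_probs[i + 1] - sorted_probs[i] for i in range(start, n - 1)]
    return start + diffs.index(min(diffs))


def diffsampling(probs, variant="cut", p_lb=0.9, p_min=0.1,
                 temperature=1.0, rng=random):
    """Sample one token id; returns (token, kept_ids, kept_probs)."""
    # Sort token ids by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    sorted_probs = [probs[i] for i in order]

    start = 0  # "cut": the largest drop is searched over the whole distribution
    if variant == "lb":
        # Assumption: the cut may only fall at or after the position where
        # the cumulative mass first reaches p_lb.
        total = 0.0
        for k, p in enumerate(sorted_probs):
            total += p
            if total >= p_lb:
                break
        start = k
    elif variant == "minp":
        # Assumption: tokens with probability >= p_min * max are always kept;
        # the cut is searched from the last such token onward.
        threshold = p_min * sorted_probs[0]
        start = sum(1 for p in sorted_probs if p >= threshold) - 1

    last = cut_index(sorted_probs, start)
    kept_ids = order[: last + 1]

    # Temperature is applied AFTER truncation: rescale only the kept tokens
    # (p ** (1 / T) is equivalent to dividing the logits by T), then renormalize.
    weights = [sorted_probs[i] ** (1.0 / temperature) for i in range(last + 1)]
    norm = sum(weights)
    kept_probs = [w / norm for w in weights]

    token = rng.choices(kept_ids, weights=kept_probs, k=1)[0]
    return token, kept_ids, kept_probs


# Hypothetical 8-token distribution (not taken from the paper).
probs = [0.12, 0.027, 0.50, 0.040, 0.22, 0.010, 0.045, 0.038]
print(diffsampling(probs, "cut")[1])   # [2]
print(diffsampling(probs, "minp")[1])  # [2, 4, 0]
print(diffsampling(probs, "lb")[1])    # [2, 4, 0, 6, 3, 7, 1]
```

On this peaked example, the plain cut keeps only the top token, while the two relaxed variants keep progressively more of the tail. Because the temperature only reshapes the probabilities of tokens that survived truncation, raising it cannot bring a truncated token back into the candidate set.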

Abstract

Despite their growing capabilities, language models still frequently reproduce content from their training data, generate repetitive text, and favor common grammatical patterns and vocabulary. A possible cause is the decoding strategy: the most common strategies either consider only the most probable tokens, which reduces output diversity, or increase the likelihood of unlikely tokens, compromising output accuracy and correctness. In this paper, we propose DiffSampling, a new decoding method that leverages a mathematical analysis of the token probability distribution to ensure the generation of contextually appropriate text. In particular, the difference between consecutive, sorted probabilities can be used to truncate incorrect tokens. In addition, we also propose two variations of the proposed method that aim to correct the subtle inconsistencies of common sampling strategies. Experiments involving four different text-generation tasks demonstrate that our approach consistently performs at least on par with the existing methods it builds upon in terms of quality, while potentially improving output diversity.
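
The truncation rule based on "the difference between consecutive, sorted probabilities" can be written compactly as follows. The notation is an assumption made here, not necessarily the paper's: $p_{(1)} \ge p_{(2)} \ge \dots \ge p_{(N)}$ denotes the next-token probabilities sorted in descending order over a vocabulary of size $N$.

$$
\Delta p_{(i)} = p_{(i+1)} - p_{(i)} \le 0, \qquad i^{\ast} = \arg\min_{1 \le i \le N-1} \Delta p_{(i)}, \qquad \text{keep tokens } (1), \dots, (i^{\ast}).
$$

For example, with sorted probabilities $(0.50, 0.22, 0.12, \dots)$ the first difference is $0.22 - 0.50 = -0.28$; if no later drop is larger in magnitude, then $i^{\ast} = 1$ and only the top token survives.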

Paper Structure

This paper contains 32 sections, 1 equation, 10 figures, 30 tables, 1 algorithm.

Figures (10)

  • Figure 1: In the top-left square, the original distribution. In the top-right square, DiffSampling-cut truncates after the minimum discrete derivative. In the bottom-left square, DiffSampling-lb also imposes a total probability lower bound $p_{lb} = 0.9$. In the bottom-right square, DiffSampling-minp applies truncation only among tokens with a probability less than $p_{min} = 0.1$ times the highest probability.
  • Figure 2: DAT scores for our methods and the baselines over the instructed (left) and pre-trained (right) model. Below, the number of valid outputs produced by each sampling strategy. The dashed line reports the greedy score.
  • Figure 3: Average quality scores across different temperature values for top-$p$, min-$p$, and our methods.
  • Figure 4: DAT scores for our methods and the baselines for the instructed (left) and pre-trained (right) model with different temperature values, together with the number of valid outputs produced by each sampling strategy. The dashed line represents the score of the greedy strategy.
  • Figure 5: DAT scores over all 10 nouns for our methods and the baselines for the instructed (left) and pre-trained (right) model with different temperature values, together with the number of valid outputs produced by each sampling strategy. The dashed line represents the score of the greedy strategy.
  • ...and 5 more figures