Waste Not, Want Not; Recycled Gumbel Noise Improves Consistency in Natural Language Generation

Damien de Mijolla, Hannan Saddiq, Kim Moore

TL;DR

This work tackles the problem of output inconsistency in autoregressive language models by introducing Gumbel Consistency Sampling (GCS), a decoding method that couples multiple responses through a shared latent variable drawn via the Gumbel reparameterization, thereby increasing cross-sample similarity without altering marginal probabilities. A variant, GCS with Recycling (GCSwR), reuses Gumbel noise across samples to strengthen inter-sample correlations, and is combined with semantically-driven prompt ensembling to further reduce prompt-induced variation; all approaches are training-free and compatible with existing samplers. Empirical results show semantic similarity gains of up to about 10% over baselines across several models, along with notable stylistic consistency improvements, while maintaining output quality and incurring negligible computational overhead. The method offers a practical, low-cost path to more reliable NLG, with future work exploring localized latent correlations and learnable noise parameters to balance consistency, diversity, and safety.

Abstract

Consistency in the output of language models is critical for their reliability and practical utility. Due to their training objective, language models learn to model the full space of possible continuations, leading to outputs that can vary significantly in style and content, even for similar or repeated inputs. To address this, we propose a novel decoding algorithm that enhances response consistency across different prompts with no degradation in response quality. By incorporating a latent variable into the next-token sampling process based on the Gumbel reparametrisation trick, our method outperforms standard sampling by up to 10% across semantic and stylistic consistency benchmarks. Additionally, our approach integrates seamlessly with existing sampling methods with negligible computational overhead, providing a practical solution for improving the reliability of language model outputs.
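The core idea of sharing a latent variable through the Gumbel reparametrisation trick can be illustrated on a single sampling step. The following is a minimal sketch, not the authors' implementation: the two toy distributions `p` and `q`, the vocabulary size of 3, and the number of trials are illustrative assumptions. It draws one token from each distribution, once with shared Gumbel noise and once with independent noise, and compares how often the two draws agree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two similar (hypothetical) next-token distributions, e.g. from paraphrased prompts.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
n_trials = 20_000

# Standard Gumbel(0, 1) noise: one vector per trial, one entry per token.
g_shared = rng.gumbel(size=(n_trials, p.size))
g_other = rng.gumbel(size=(n_trials, p.size))

# Gumbel-max trick: argmax(log-prob + Gumbel noise) is a sample from the distribution.
y = np.argmax(np.log(p) + g_shared, axis=1)
v_coupled = np.argmax(np.log(q) + g_shared, axis=1)  # reuses the noise used for y
v_indep = np.argmax(np.log(q) + g_other, axis=1)     # fresh, independent noise

# Each coupled draw still follows its own marginal distribution.
marginal_y = np.bincount(y, minlength=p.size) / n_trials
marginal_v = np.bincount(v_coupled, minlength=q.size) / n_trials

# Fraction of trials in which the two draws pick the same token.
agree_coupled = float(np.mean(y == v_coupled))
agree_indep = float(np.mean(y == v_indep))

print("marginal of Y:", marginal_y)
print("marginal of V (coupled):", marginal_v)
print("agreement, shared noise:", agree_coupled)
print("agreement, independent noise:", agree_indep)
```

In this toy setting the empirical marginals stay close to `p` and `q`, while the shared-noise draws agree far more often than the independent ones. A full decoder would repeat this step at every token position; how noise is assigned and reused across positions is specific to the paper and is not reproduced here.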

Paper Structure

This paper contains 32 sections, 6 theorems, 55 equations, 2 figures, 8 tables, and 1 algorithm.

Key Result

Theorem 4.1

Suppose we have two different categorical distributions parametrized by $p^1,\dots,p^{N_v}$ and $q^1,\dots,q^{N_v}$. Define a joint distribution over pairs of categories $(Y,V)$ by defining $Y=\arg\max_i\,(\log p^i+g^i)$ and $V=\arg\max_i\,(\log q^i+g^i)$, where $g^1,\dots,g^{N_v}\sim \text{G}(0,1)$ are independent. We have that

Figures (2)

  • Figure 1: Motivating toy example highlighting the aim of our approach. Even when language models yield similar probability distributions over responses (top), responses sampled independently (bottom left) can be inconsistent or contradictory due to the inherent stochasticity of sampling. By generating responses in a correlated manner (bottom right), it is possible to alleviate inconsistencies across responses while still respecting the marginal probabilities of each response. In this paper we propose Gumbel Consistency Sampling, an approach that increases response consistency by drawing correlated responses: all responses are conditioned on a shared latent variable, and the approach is robust to differences between the probability distributions over responses.
  • Figure 2: Mean semantic consistency between responses to paraphrased questions as a function of temperature, comparing independent sampling (IS) against Gumbel Consistency Sampling with Recycling (GCSwR).

Theorems & Definitions (12)

  • Theorem 4.1
  • Theorem 4.2
  • Theorem
  • proof
  • Theorem
  • proof
  • Lemma B.1
  • proof
  • Claim
  • proof
  • ...and 2 more