
p-less Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding

Runyan Tan, Shuang Wu, Phillip Howard

TL;DR

$p$-less sampling is a hyperparameter-free decoding strategy for LLMs. It uses the full token distribution to set a truncation threshold based on the second moment $L[P] = \sum_v P(v)^2$, and a normalized variant, $p$-less$_{norm}$, encourages diversity. Grounded in Rényi entropy and information-theoretic principles, the method adapts the threshold at each decoding step, remaining robust as temperature increases and admitting fewer unnecessary low-probability tokens. Experiments across math, logical reasoning, and creative-writing tasks show that the $p$-less variants consistently outperform traditional truncation-based decoders and improve inference efficiency through lower token-sampling overhead and shorter generations, while maintaining or improving task accuracy. The approach also shows strong qualitative behavior (e.g., self-verification at high entropy) and receives favorable human evaluations, supporting its practical adoption for robust, hyperparameter-free LLM decoding.
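The connection to Rényi entropy, and the range the threshold can take, follow from standard identities. The derivation below is a sketch rather than a restatement of the paper's own analysis, and it assumes tokens are admitted when $P(v) \ge L[P]$.

```latex
\begin{aligned}
L[P] &= \sum_{v \in V} P(v)^2 = \mathbb{E}_{v \sim P}\big[P(v)\big]
  && \text{(expected probability of a sampled token)}\\
H_2(P) &= -\log \sum_{v \in V} P(v)^2 = -\log L[P]
  && \Rightarrow\; L[P] = e^{-H_2(P)}\\
1 = \Big(\sum_{v} P(v)\Big)^2 &\le |V| \sum_{v} P(v)^2 = |V|\, L[P]
  && \text{(Cauchy--Schwarz)}\\
L[P] = \sum_{v} P(v)\,P(v) &\le \max_{v} P(v) \sum_{v} P(v) = \max_{v} P(v)
\end{aligned}
```

Hence $1/|V| \le L[P] \le \max_v P(v)$: the most likely token is always admitted, a token with probability below the uniform level $1/|V|$ is never admitted, and the threshold falls as the order-2 (collision) Rényi entropy $H_2(P)$ rises, with no tuned parameter involved.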

Abstract

Obtaining high-quality outputs from Large Language Models (LLMs) often depends upon the choice of a sampling-based decoding strategy to probabilistically choose the next token at each generation step. While a variety of such sampling methods have been proposed, their performance can be sensitive to the selection of hyperparameters, which may require different settings depending upon the generation task and temperature configuration. In this work, we introduce $p$-less sampling: an information-theoretic approach to sampling which dynamically sets a truncation threshold at each decoding step based on the entire token probability distribution. Unlike existing methods, $p$-less sampling has no hyperparameters and consistently produces high-quality outputs as temperature increases. We provide theoretical perspectives on $p$-less sampling to ground our proposed method and conduct experiments to empirically validate its effectiveness across a range of math, logical reasoning, and creative writing tasks. Our results demonstrate that $p$-less sampling consistently outperforms existing sampling approaches while exhibiting much less degradation in text quality at higher temperature values. We further show that $p$-less achieves greater inference-time efficiency than alternative methods through lower average token sampling times and shorter generation lengths, without sacrificing accuracy. Finally, we provide analyses to highlight the benefits of $p$-less through qualitative examples, case studies, and diversity assessments. The code is available at https://github.com/ryttry/p-less.
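To make the per-step mechanism concrete, here is a minimal NumPy sketch of one $p$-less decoding step. It assumes the rule implied by the threshold $L[P] = \sum_v P(v)^2$ named in the TL;DR: compute $L[P]$ from the temperature-scaled distribution, keep tokens with $P(v) \ge L[P]$, renormalize, and sample. The function names (`softmax`, `p_less_filter`, `p_less_sample`) are hypothetical and not taken from the authors' repository, and the normalized variant $p$-less$_{norm}$ is not covered.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    e = np.exp(z - z.max())
    return e / e.sum()


def p_less_filter(probs):
    """Truncate a distribution with the (assumed) p-less rule P(v) >= L[P].

    The threshold L[P] = sum_v P(v)^2 is computed from the full distribution,
    so there is no tunable truncation parameter. Survivors are renormalized.
    """
    probs = np.asarray(probs, dtype=np.float64)
    threshold = np.sum(probs ** 2)  # L[P]
    # Mathematically max P >= L[P], so the argmax always survives; the tiny
    # tolerance (a choice of this sketch) absorbs floating-point rounding,
    # e.g. for a uniform distribution where P(v) == L[P] in exact arithmetic.
    keep = probs >= threshold - 1e-12
    truncated = np.where(keep, probs, 0.0)
    return truncated / truncated.sum()


def p_less_sample(logits, temperature=1.0, rng=None):
    """Draw one token id from the p-less-truncated, temperature-scaled distribution."""
    rng = np.random.default_rng() if rng is None else rng
    probs = p_less_filter(softmax(logits, temperature))
    return int(rng.choice(len(probs), p=probs))
```

In a real decoder, `logits` would be the model's next-token logits at each step. Apart from the logits, the only input is the sampling temperature, which reflects the method's lack of truncation hyperparameters.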

Paper Structure

This paper contains 50 sections, 20 equations, 10 figures, 11 tables.

Figures (10)

  • Figure 1: Comparison of truncation thresholds produced by $p\textrm{-less}$, min-$p$, and top-$p$ for a token probability distribution with different applied temperatures ($\tau$). As temperature increases, $p\textrm{-less}$ avoids admitting a large number of lower-likelihood tokens by considering the entropy of the distribution in computing the threshold.
  • Figure 2: Accuracy vs. temperature curves of each method on CSQA, QASC, and GSM8K using Llama-2-7b. AUC values achieved by each method are provided in the legend (in parentheses) with the best AUC in bold.
  • Figure 3: QASC accuracy vs. diversity
  • Figure 4: Step-wise entropy and number of admitted tokens for a GSM8K question answered with Llama-3-70b.
  • Figure 5: Accuracy vs. temperature curves of each method for the GPQA dataset using Llama-2-7b. AUC values achieved by each method are provided in the legend (in parentheses) with the best AUC in bold.
  • ...and 5 more figures
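The threshold comparison in Figure 1 can be approximated numerically. The sketch below counts how many tokens each truncation rule admits on a hypothetical Zipf-like distribution as temperature rises. The toy logits and the baseline settings (top-$p$ = 0.9, min-$p$ base = 0.1) are illustrative assumptions, not the configurations used in the paper, and the $p$-less count again assumes admission when $P(v) \ge L[P]$.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax with max-shift for numerical stability."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    e = np.exp(z - z.max())
    return e / e.sum()


def p_less_count(probs):
    """Tokens admitted when the threshold is L[P] = sum_v P(v)^2 (assumed rule P(v) >= L[P])."""
    threshold = np.sum(probs ** 2)
    return int(np.sum(probs >= threshold - 1e-12))  # tolerance absorbs rounding


def min_p_count(probs, p_base=0.1):
    """Tokens admitted by min-p: P(v) >= p_base * max_v P(v)."""
    return int(np.sum(probs >= p_base * probs.max()))


def top_p_count(probs, top_p=0.9):
    """Size of the smallest highest-probability set whose mass reaches top_p (nucleus)."""
    cumulative = np.cumsum(np.sort(probs)[::-1])
    k = int(np.searchsorted(cumulative, top_p)) + 1  # first index with mass >= top_p
    return min(k, len(probs))  # guard against rounding leaving the mass just below top_p


if __name__ == "__main__":
    # Hypothetical Zipf-like logits over a 1,000-token vocabulary (illustrative only).
    logits = -1.5 * np.log(np.arange(1, 1001))
    print(f"{'tau':>5} {'p-less':>7} {'min-p':>6} {'top-p':>6}")
    for tau in (0.5, 1.0, 1.5, 2.0, 3.0):
        probs = softmax(logits, tau)
        print(f"{tau:>5} {p_less_count(probs):>7} {min_p_count(probs):>6} {top_p_count(probs):>6}")
```

Running the script prints one row per temperature. The same counting functions could be applied to a real model's step-wise distributions to produce an admitted-token trace of the kind plotted in Figure 4.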