p-less Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding
Runyan Tan, Shuang Wu, Phillip Howard
TL;DR
$p$-less sampling is a hyperparameter-free decoding strategy for LLMs that uses the full token distribution through a truncation threshold based on the second moment $L[P] = \sum_v P(v)^2$, with a normalized variant, $p$-less$_{\mathrm{norm}}$, that encourages diversity. Grounded in Rényi entropy and information-theoretic principles, the method adapts the truncation threshold at each decoding step, which keeps performance robust as temperature increases and reduces the admission of unnecessary tokens. Experiments across math, logic, and creative-writing tasks show that the $p$-less variants consistently outperform traditional truncation-based decoders and improve inference efficiency through lower token-sampling overhead and shorter generations, while maintaining or improving task accuracy. The approach also shows strong qualitative behavior (e.g., self-verification at high entropy) and favorable human evaluations, supporting practical adoption and suggesting broad applicability.
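The link between the second moment and Rényi entropy mentioned above can be written out in one line. This is a short derivation from the standard definition of the order-2 Rényi (collision) entropy; the symbols $H_2$ and $|V|$ (vocabulary size) are introduced here for illustration and do not appear in the summary itself.

$$H_2(P) = -\log \sum_v P(v)^2 = -\log L[P] \quad\Longrightarrow\quad L[P] = e^{-H_2(P)}$$

$$\frac{1}{|V|} \;\le\; L[P] \;\le\; \max_v P(v)$$

The lower bound follows from the Cauchy-Schwarz inequality and is attained by the uniform distribution; the upper bound follows from $\sum_v P(v)^2 \le \max_v P(v) \sum_v P(v)$. If a token is admitted when $P(v) \ge L[P]$ (an assumption about the exact rule), the upper bound means the most probable token is always admitted, and a flatter, higher-entropy distribution yields a lower threshold.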
Abstract
Obtaining high-quality outputs from Large Language Models (LLMs) often depends on the choice of a sampling-based decoding strategy to probabilistically choose the next token at each generation step. While a variety of such sampling methods have been proposed, their performance can be sensitive to the selection of hyperparameters, which may require different settings depending on the generation task and temperature configuration. In this work, we introduce $p$-less sampling: an information-theoretic approach to sampling which dynamically sets a truncation threshold at each decoding step based on the entire token probability distribution. Unlike existing methods, $p$-less sampling has no hyperparameters and consistently produces high-quality outputs as temperature increases. We provide theoretical perspectives on $p$-less sampling to ground our proposed method and conduct experiments to empirically validate its effectiveness across a range of math, logical reasoning, and creative writing tasks. Our results demonstrate that $p$-less sampling consistently outperforms existing sampling approaches while exhibiting much less degradation in text quality at higher temperature values. We further show that $p$-less achieves greater inference-time efficiency than alternative methods through lower average token sampling times and shorter generation lengths, without sacrificing accuracy. Finally, we provide analyses to highlight the benefits of $p$-less through qualitative examples, case studies, and diversity assessments. The code is available at https://github.com/ryttry/p-less.
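The per-step thresholding described above can be illustrated with a minimal Python sketch. It is not taken from the authors' repository: it assumes the threshold is the second moment $L[P] = \sum_v P(v)^2$ given in the TL;DR, and it assumes a token is kept when $P(v) \ge L[P]$. The function names (`softmax`, `p_less_threshold`, `p_less_filter`, `p_less_sample`) are hypothetical, and the normalized variant is not covered.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def p_less_threshold(probs):
    """Second moment L[P] = sum_v P(v)^2 of the token distribution."""
    return sum(p * p for p in probs)


def p_less_filter(probs):
    """Zero out tokens below the threshold and renormalize the rest.

    Assumption: a token is kept when P(v) >= L[P].
    """
    threshold = p_less_threshold(probs)
    # The most likely token satisfies max P(v) >= L[P] mathematically;
    # keeping it explicitly guards against floating-point rounding.
    top = max(range(len(probs)), key=lambda i: probs[i])
    kept = [p if (p >= threshold or i == top) else 0.0
            for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]


def p_less_sample(logits, temperature=1.0, rng=random):
    """Sample one token index from the truncated distribution."""
    filtered = p_less_filter(softmax(logits, temperature))
    return rng.choices(range(len(filtered)), weights=filtered, k=1)[0]


# A peaked distribution keeps only its top token (L[P] = 0.36).
print(p_less_filter([0.5, 0.3, 0.1, 0.1]))  # [1.0, 0.0, 0.0, 0.0]
# Two equally likely candidates both survive (L[P] is about 0.34).
print(p_less_filter([0.4, 0.4, 0.1, 0.1]))  # [0.5, 0.5, 0.0, 0.0]
```

Because the cutoff is computed from the distribution itself, nothing needs tuning: raising the temperature flattens the distribution and lowers $L[P]$, so the threshold moves with the distribution at every step.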
