Top-$nσ$: Not All Logits Are You Need

Chenxia Tang; Jianchun Liu; Hongli Xu; Liusheng Huang

TL;DR

The extensive experimental results across four reasoning-focused datasets demonstrate that the novel sampling method, top-$nσ$, not only outperforms existing sampling approaches but also surpasses greedy decoding, while maintaining consistent performance even at high temperatures.

Abstract

Large language models (LLMs) typically employ greedy decoding or low-temperature sampling for reasoning tasks, reflecting a perceived trade-off between diversity and accuracy. We challenge this convention by introducing top-$nσ$, a novel sampling method that operates directly on pre-softmax logits by leveraging a statistical threshold. Our key insight is that logits naturally separate into a Gaussian-distributed noisy region and a distinct informative region, enabling efficient token filtering without complex probability manipulations. Unlike existing methods (e.g., top-$p$, min-$p$) that inadvertently include more noise tokens at higher temperatures, top-$nσ$ maintains a stable sampling space regardless of temperature scaling. We also provide a theoretical analysis of top-$nσ$ to better understand its behavior. The extensive experimental results across four reasoning-focused datasets demonstrate that our method not only outperforms existing sampling approaches but also surpasses greedy decoding, while maintaining consistent performance even at high temperatures.
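
The filtering idea described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the threshold, so this example assumes the rule suggested by the method's name: keep tokens whose logit lies within $n$ standard deviations of the maximum logit. The function names `top_n_sigma_mask` and `sample_top_n_sigma` and the default `n=1.0` are hypothetical, chosen for illustration only.

```python
import math
import random
import statistics


def top_n_sigma_mask(logits, n=1.0):
    """Mark tokens whose logit is within n standard deviations of the max.

    Assumption (not stated in the abstract): the threshold is
    max(logits) - n * sigma, with sigma the population standard
    deviation of the pre-softmax logits.
    """
    max_logit = max(logits)
    sigma = statistics.pstdev(logits)
    threshold = max_logit - n * sigma
    return [x >= threshold for x in logits]


def sample_top_n_sigma(logits, n=1.0, temperature=1.0, rng=random):
    """Sample one token index after filtering and temperature scaling."""
    scaled = [x / temperature for x in logits]
    keep = top_n_sigma_mask(scaled, n)
    # Subtract the largest kept logit before exponentiating for stability.
    max_kept = max(x for x, k in zip(scaled, keep) if k)
    weights = [math.exp(x - max_kept) if k else 0.0
               for x, k in zip(scaled, keep)]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

Under this assumed rule, dividing the logits by a temperature rescales the maximum and the standard deviation by the same factor, so the set of kept tokens does not change. That matches the abstract's claim of a stable sampling space; temperature only reshapes the probabilities among the tokens that survive the filter.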
