Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies

Sijin Chen; Omar Hagrass; Jason M. Klusowski

TL;DR

Decoding strategies for open-ended text generation exhibit a gap between MAP-like likelihood maximization, which is intuitively optimal, and the diverse outputs that work well in practice. The authors formulate Decoding Game, a two-player zero-sum framework in which Strategist seeks high log-likelihood under the true distribution while Nature adversarially perturbs the distribution within a total variation (TV) budget around the model $\widehat{\mathbb{P}}$. This yields the minimax objective $\max_{\mathbb{Q}}\min_{\mathbb{P}\in N(\widehat{\mathbb{P}})} \mathbb{E}_{\mathbb{Q}}[\log\mathbb{P}(X_1,\dots,X_T\mid X_0)]$. They show that Nature induces an $\ell_\infty$-type regularization on the log-likelihood and that tail truncation followed by normalization is a first-order optimal strategy (with precise per-step conditions and thresholds); generalizing to other objective functions recovers temperature-based methods. The framework unifies and justifies popular heuristic sampling strategies and introduces Game sampling, which achieves competitive or superior open-ended generation performance across several models. Overall, the work provides a rigorous theory for decoding strategy design under minimal assumptions, with practical, testable implications for robust text generation.
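The one-step version of this objective can be made concrete with a minimal Python sketch, assuming NumPy and a next-token distribution stored as an array. The helper names `truncate_normalize` and `expected_log_likelihood` are hypothetical; the cutoff `threshold` is a free parameter, not the threshold the paper derives, and Nature's perturbation is not modeled here.

```python
import numpy as np


def truncate_normalize(probs, threshold: float) -> np.ndarray:
    """Tail truncation-normalization with a fixed (hypothetical) cutoff.

    Tokens with probability below `threshold` are zeroed out and the
    remaining mass is renormalized to sum to 1.
    """
    probs = np.asarray(probs, dtype=float)
    keep = probs >= threshold
    if not keep.any():
        # Cutoff above every probability: fall back to the most likely token(s).
        keep = probs == probs.max()
    truncated = np.where(keep, probs, 0.0)
    return truncated / truncated.sum()


def expected_log_likelihood(q, p) -> float:
    """One-step payoff E_Q[log P] = sum_i q_i * log p_i, with 0 * log(.) = 0."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    support = q > 0  # only tokens the Strategist can emit contribute
    with np.errstate(divide="ignore"):  # log(0) = -inf is intended here
        return float(np.sum(q[support] * np.log(p[support])))


p_hat = np.array([0.5, 0.3, 0.15, 0.05])
q = truncate_normalize(p_hat, threshold=0.1)
payoff = expected_log_likelihood(q, p_hat)
```

If the neighborhood is assumed to be a TV ball of radius $\epsilon$, Nature can move any token with $\widehat{p}_i \le \epsilon$ to zero probability, so any $\mathbb{Q}$ that keeps such a token scores $-\infty$ under `expected_log_likelihood`. This is one informal way to see why cutting the tail can pay off against an adversarial Nature.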

Abstract

Decoding strategies play a pivotal role in text generation for modern language models, yet a puzzling gap divides theory and practice. Surprisingly, strategies that should intuitively be optimal, such as Maximum a Posteriori (MAP), often perform poorly in practice. Meanwhile, popular heuristic approaches like Top-$k$ and Nucleus sampling, which employ truncation and normalization of the conditional next-token probabilities, have achieved great empirical success but lack theoretical justification. In this paper, we propose Decoding Game, a comprehensive theoretical framework that reimagines text generation as a two-player zero-sum game between Strategist, who seeks to produce text credible in the true distribution, and Nature, who distorts the true distribution adversarially. After discussing the decomposability of multi-step generation, we derive the optimal strategy in closed form for the one-step Decoding Game. We show that the adversarial Nature imposes an implicit regularization on likelihood maximization, and that truncation-normalization methods are first-order approximations to the optimal strategy under this regularization. Additionally, by generalizing the objective and parameters of Decoding Game, we find that near-optimal strategies encompass diverse methods such as greedy search, temperature scaling, and hybrids thereof. Numerical experiments complement our theoretical analysis.
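Greedy search, temperature scaling, Top-$k$ truncation, and a hybrid of the last two can each be sketched as a transformation of the next-token distribution. This is a minimal NumPy sketch with hypothetical helper names; it shows the standard heuristics themselves, not the near-optimal strategies derived in the paper.

```python
import numpy as np


def temperature_scale(logits, temperature: float) -> np.ndarray:
    """Softmax of logits / temperature (temperature > 0)."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def top_k(probs, k: int) -> np.ndarray:
    """Top-k truncation-normalization: keep the k most likely tokens, renormalize."""
    probs = np.asarray(probs, dtype=float)
    kept = np.argsort(probs)[::-1][:k]  # indices of the k largest probabilities
    out = np.zeros_like(probs)
    out[kept] = probs[kept]
    return out / out.sum()


def greedy(probs) -> np.ndarray:
    """Greedy search as a point mass on the argmax token (Top-k with k = 1)."""
    return top_k(probs, 1)


logits = np.log([0.5, 0.3, 0.15, 0.05])
hybrid = top_k(temperature_scale(logits, 0.7), 2)  # temperature, then truncation
```

As the temperature approaches 0, `temperature_scale` concentrates on the argmax token and matches `greedy`; at temperature 1 it returns the model distribution unchanged, so the hybrid interpolates between sampling and greedy search.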

Paper Structure (21 sections, 6 theorems, 50 equations, 1 figure, 15 tables, 1 algorithm)

Introduction
Motivation and our framework
Contribution
Related works
Existing theoretical interpretations
Text generation as decision making
Robust optimization and regularization
Formulation
Notations
Decoding Game
Reduction from multiple steps
Theoretical analysis
$p$-strategy: implicit regularization
$q$-strategy: heuristic sampling methods
Generalization from log-likelihood
...and 6 more sections

Key Result

Proposition 3.3

Given arbitrary $\widehat{\mathbb{P}}$ from the space of probability measures on $\mathcal{V}^{T}$, let $\mathbb{Q}=\mathbb{Q}(\widehat{\mathbb{P}})$ be any strategy with no foresight. Moreover, let $\mathbb{P}^*=\mathbb{P}^*(\widehat{\mathbb{P}},\mathbb{Q})$ be the optimal strategy of player N against …

Figures (1)

Figure 1: Next-token probability distribution in GPT-2 XL model and truncation threshold of Game sampling and Nucleus sampling.
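Nucleus sampling has no fixed cutoff: one natural way to express its truncation threshold on a next-token distribution, as in Figure 1, is the probability of the least likely retained token, which changes with the shape of the distribution. The NumPy sketch below (hypothetical function name `nucleus`) returns both the renormalized nucleus distribution and that implied threshold; it does not reproduce Game sampling's threshold, which is not stated here.

```python
import numpy as np


def nucleus(probs, top_p: float):
    """Nucleus (top-p) truncation-normalization.

    Keeps the smallest set of most likely tokens whose cumulative mass is
    at least `top_p`, renormalizes it, and also returns the implied
    probability threshold (probability of the least likely kept token).
    """
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]  # tokens sorted by decreasing probability
    cumulative = np.cumsum(probs[order])
    # First position where the cumulative mass reaches top_p, plus one token.
    n_keep = int(np.searchsorted(cumulative, top_p)) + 1
    n_keep = min(n_keep, probs.size)  # guard against floating-point shortfall
    kept = order[:n_keep]
    out = np.zeros_like(probs)
    out[kept] = probs[kept]
    return out / out.sum(), float(probs[kept].min())


dist, threshold = nucleus([0.5, 0.3, 0.15, 0.05], top_p=0.9)
```

For a flat distribution the same `top_p` keeps many tokens and yields a low threshold, while a peaked distribution keeps few tokens and yields a high one; a fixed-threshold rule behaves differently on the same inputs.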

Theorems & Definitions (7)

Definition 3.1
Proposition 3.3
Theorem 4.3
Theorem 4.4
Corollary 4.5
Theorem 4.7
Corollary 4.8
