Best-of-Majority: Minimax-Optimal Strategy for Pass@$k$ Inference Scaling

Qiwei Di; Kaixuan Ji; Xuheng Li; Heyang Zhao; Quanquan Gu

TL;DR

The paper investigates Pass@$k$ inference, where an agent generates $N$ candidates and submits up to $k$ of them. It shows that the traditional majority voting (MV) and Best-of-$N$ (BoN) strategies fail to scale optimally with $k$ and $N$, and introduces Best-of-Majority (BoM) as a minimax-optimal approach. BoM combines candidate-frequency filtering with reward-based selection and attains the regret bound $\mathrm{Regret}(x) \le \epsilon_{\mathrm{opt}}(x) + O\big(\sqrt{C^*(x)\,\epsilon_{\mathrm{RM}}^2(x) / k}\big)$ under $N = \tilde{\Theta}(C^*(x))$. The paper also proves a matching general lower bound and shows that BoM is scaling-monotonic, meaning its performance does not degrade as $N$ grows. Empirical results on math problem datasets show BoM outperforming MV and BoN, supporting its practical utility for scalable, reliable Pass@$k$ inference.

Abstract

LLM inference often generates a batch of candidates for a prompt and selects one via strategies like majority voting or Best-of-$N$ (BoN). For difficult tasks, this single-shot selection often underperforms. Consequently, evaluations commonly report Pass@$k$: the agent may submit up to $k$ responses, and only the best of them is used when computing regret. Motivated by this, we study inference scaling in the more general Pass@$k$ inference setting, and prove that neither majority voting nor BoN exhibits the desirable scaling with $k$ and the sampling budget $N$. Combining the advantages of majority voting and BoN, we propose a new inference strategy called Best-of-Majority (BoM), with a pivotal step that restricts the candidates to the responses with high frequency in the $N$ samples before selecting the top-$k$ rewards. We prove that when the sampling budget is $N=\tilde{\Omega}(C^*)$, the regret of BoM is $O\big(\epsilon_{\mathrm{opt}}+\sqrt{\epsilon_{\mathrm{RM}}^2 C^*/k}\big)$, where $C^*$ is the coverage coefficient, $\epsilon_{\mathrm{RM}}$ is the estimation error of the reward model, and $\epsilon_{\mathrm{opt}}$ is the estimation error of reward at the optimal response. We further establish a matching lower bound, certifying that our algorithm is minimax optimal. Beyond optimality, BoM has a key advantage: unlike majority voting and BoN, its performance does not degrade when increasing $N$. Experimental results of inference on math problems show BoM outperforming both majority voting and BoN.
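The two-step selection described in the abstract (filter by frequency, then take the top-$k$ rewards) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `best_of_majority`, the threshold parameter `min_freq`, the toy reward table, and the choice to return fewer than $k$ responses when too few survive the filter are all assumptions made for illustration, since this section does not specify how the frequency threshold is set.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence


def best_of_majority(
    samples: Sequence[Hashable],
    reward: Callable[[Hashable], float],
    k: int,
    min_freq: float,
) -> List[Hashable]:
    """Return up to k distinct responses: frequency filter, then top-k by reward.

    `min_freq` is a hypothetical threshold on empirical frequency (assumption).
    """
    if not samples or k <= 0:
        return []
    counts = Counter(samples)
    n = len(samples)
    # Step 1: keep only responses whose empirical frequency reaches the threshold.
    frequent = [r for r, c in counts.items() if c / n >= min_freq]
    # Step 2: rank the surviving responses by reward-model score, highest first.
    frequent.sort(key=reward, reverse=True)
    return frequent[:k]


# Toy example: N = 7 sampled final answers and made-up reward-model scores.
samples = ["42", "42", "41", "42", "17", "41", "99"]
scores = {"42": 0.7, "41": 0.6, "17": 0.2, "99": 0.95}

picked = best_of_majority(samples, lambda r: scores[r], k=2, min_freq=0.25)
print(picked)
```

In this toy run the response `"99"` has the highest reward score but appears only once, so the frequency filter discards it before ranking; selecting purely by reward would have submitted it. That is the intuition behind combining the two signals: a rarely sampled response with an inflated reward estimate never reaches the top-$k$ step.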


