Best-of-Majority: Minimax-Optimal Strategy for Pass@$k$ Inference Scaling

Qiwei Di, Kaixuan Ji, Xuheng Li, Heyang Zhao, Quanquan Gu

TL;DR

The paper studies Pass@$k$ inference, where an agent generates $N$ candidates and submits up to $k$ of them. It shows that majority voting (MV) and Best-of-$N$ (BoN) fail to scale optimally with $k$ and $N$, and introduces Best-of-Majority (BoM) as a minimax-optimal alternative. BoM combines candidate-frequency filtering with reward-based selection and attains the regret bound $\mathrm{Regret}(x) \le \epsilon_{\mathrm{opt}}(x) + O\big(\sqrt{ C^*(x)\, \epsilon_{\mathrm{RM}}^2(x) / k }\big)$ under $N = \tilde{\Theta}(C^*(x))$, which matches a general lower bound. Unlike MV and BoN, its performance does not degrade as $N$ grows. Experiments on math problem datasets show BoM outperforming MV and BoN, supporting its practical utility for scalable, reliable Pass@$k$ inference.

Abstract

LLM inference often generates a batch of candidates for a prompt and selects one via strategies like majority voting or Best-of-$N$ (BoN). For difficult tasks, this single-shot selection often underperforms. Consequently, evaluations commonly report Pass@$k$: the agent may submit up to $k$ responses, and only the best of them is used when computing regret. Motivated by this, we study inference scaling in the more general Pass@$k$ inference setting, and prove that neither majority voting nor BoN exhibits the desirable scaling with $k$ and the sampling budget $N$. Combining the advantages of majority voting and BoN, we propose a new inference strategy called Best-of-Majority (BoM), with a pivotal step that restricts the candidates to the responses with high frequency in the $N$ samples before selecting the top-$k$ rewards. We prove that when the sampling budget is $N=\tilde{\Omega}(C^*)$, the regret of BoM is $O(\epsilon_{\mathrm{opt}}+\sqrt{\epsilon_{\mathrm{RM}}^2C^*/k})$, where $C^*$ is the coverage coefficient, $\epsilon_{\mathrm{RM}}$ is the estimation error of the reward model, and $\epsilon_{\mathrm{opt}}$ is the estimation error of reward at the optimal response. We further establish a matching lower bound, certifying that our algorithm is minimax optimal. Beyond optimality, BoM has a key advantage: unlike majority voting and BoN, its performance does not degrade when increasing $N$. Experimental results of inference on math problems show BoM outperforming both majority voting and BoN.
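The selection step described in the abstract (filter by frequency, then take the top-$k$ rewards) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `best_of_majority`, the fixed threshold parameter `min_freq`, the tie-breaking rule, and the toy rewards are all assumptions made for illustration, and the paper's actual threshold choice is not reproduced here.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence


def best_of_majority(
    samples: Sequence[Hashable],
    reward_fn: Callable[[Hashable], float],
    k: int,
    min_freq: float,
) -> List[Hashable]:
    """Return up to k distinct responses: frequency filter, then top-k by reward.

    `min_freq` is a hypothetical threshold on the empirical frequency;
    how it should be set is not specified in this sketch.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not samples:
        return []

    n = len(samples)
    counts = Counter(samples)

    # Step 1: keep only distinct responses that appear often enough
    # among the N samples.
    candidates = [y for y, c in counts.items() if c / n >= min_freq]

    # Step 2: rank the survivors by the (possibly inaccurate) reward model,
    # breaking ties by frequency (an assumed tie-break), and keep the top k.
    ranked = sorted(
        candidates,
        key=lambda y: (reward_fn(y), counts[y]),
        reverse=True,
    )
    return ranked[:k]


# Toy example: N = 10 sampled final answers with made-up reward scores.
samples = ["42"] * 5 + ["41"] * 3 + ["7"] + ["13"]
rewards = {"42": 0.6, "41": 0.7, "7": 0.99, "13": 0.1}

# The rare answer "7" has the highest reward score, but is filtered out.
print(best_of_majority(samples, rewards.__getitem__, k=2, min_freq=0.2))
# → ['41', '42']
```

In this toy example, setting `min_freq=0.0` disables the filter, so a top-1 selection by reward alone returns the rare response `"7"`. That mirrors the motivation stated in the abstract: the frequency filter keeps a rarely sampled response with an overestimated reward from being submitted.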

Paper Structure

This paper contains 19 sections, 8 theorems, 85 equations, 3 figures, 1 table, 3 algorithms.

Key Result

Theorem 4.1

For the (weighted) majority voting algorithm with weight function $w(\cdot)$, assume that $C^*(x) \ge 1 + 2kw(1)/w(1/2)$. Then, there exists an instance $\mathcal{I}=(\mathcal{X}, \mathcal{Y}, \pi^*, r^*, \pi_{\text{ref}}, \widehat{r})$ such that the coverage coefficient is $C^*(x)$, and …

Figures (3)

  • Figure 1: Results with different $k$ on Qwen3-4B. BoM consistently outperforms the baselines on MATH-500 for all $k$, and on AIME24 and GSM8K when $k$ is small; it matches the baselines in the other settings.
  • Figure 2: Results with different $k$ and $N=500$ on Qwen2.5-1.5B.
  • Figure 3: Results with fixed $k$ and different $N$. As $N$ increases, the performance of BoN tends to decrease for all $k$, while the performance of majority voting remains low. BoM performs more consistently and outperforms the baselines at larger $N$.

Theorems & Definitions (11)

  • Remark 3.3
  • Definition 3.4
  • Theorem 4.1
  • Theorem 4.2
  • Remark 4.3
  • Theorem 5.1
  • Corollary 5.2
  • Theorem 6.1
  • Lemma A.1
  • Lemma B.1
  • ...and 1 more