Allocate Marginal Reviews to Borderline Papers Using LLM Comparative Ranking
Elliot L. Epstein, Rajat Dwaraknath, John Winnicki, Thanawat Sornwanee
TL;DR
This work proposes using LLM-based pairwise comparisons to produce a comparative ranking and identify a borderline band around the acceptance cutoff before human reviewing begins. Marginal reviews are then reallocated to papers within this band, with the aim of improving decision accuracy while leaving accept/reject decisions entirely to human reviewers. The approach uses a Bradley–Terry model to derive paper scores and defines two key quantities: $\rho$ (borderline overlap) and $\Delta$ (marginal review value). An empirical analysis of 1,000 ICLR 2025 submissions provides retrospective estimates of $\rho$ and $\Delta$, with ablations indicating robustness to band settings and input scope, and a formal cost-benefit framing of the form $(\rho s - s^2) N \Delta$. The results suggest modest but reliable gains in correct decisions under a fixed extra-review budget, offering a practical, low-risk way to focus reviewer effort where it matters most.
Abstract
This paper argues that large ML conferences should allocate marginal review capacity primarily to papers near the acceptance boundary, rather than spreading extra reviews via random or affinity-driven heuristics. We propose using LLM-based comparative ranking (via pairwise comparisons and a Bradley--Terry model) to identify a borderline band \emph{before} human reviewing and to allocate \emph{marginal} reviewer capacity at assignment time. Concretely, given a venue-specific minimum review target (e.g., 3 or 4), we use this signal to decide which papers receive one additional review (e.g., a 4th or 5th), without conditioning on any human reviews and without using LLM outputs for accept/reject decisions. We provide a simple expected-impact calculation in terms of (i) the overlap between the predicted and true borderline sets ($\rho$) and (ii) the incremental value of an extra review near the boundary ($\Delta$), and we provide retrospective proxies to estimate these quantities.
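As a rough illustration of the pipeline described above, the following Python sketch fits Bradley-Terry scores from pairwise outcomes, selects a band of papers around the acceptance cutoff, and evaluates the cost-benefit expression $(\rho s - s^2) N \Delta$. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`fit_bradley_terry`, `borderline_band`, `expected_extra_correct`), the pseudo-count `prior`, and the symmetric band around the cutoff are all hypothetical choices made for this example. It also assumes that $s$ denotes the fraction of the $N$ submissions placed in the band, which the text here does not state explicitly.

```python
import numpy as np


def fit_bradley_terry(n_items, comparisons, n_iter=200, prior=0.1):
    """Fit Bradley-Terry log-strengths with the standard MM updates.

    comparisons: iterable of (winner, loser) index pairs, e.g. from
                 LLM pairwise judgments.
    prior: pseudo-count of wins added to every ordered pair (an assumption
           of this sketch) so that all strengths stay finite.
    """
    wins = np.full((n_items, n_items), prior, dtype=float)
    np.fill_diagonal(wins, 0.0)
    for winner, loser in comparisons:
        wins[winner, loser] += 1.0

    games = wins + wins.T          # number of comparisons per pair
    total_wins = wins.sum(axis=1)  # total wins per item
    p = np.ones(n_items)
    for _ in range(n_iter):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        denom = (games / (p[:, None] + p[None, :])).sum(axis=1)
        p = total_wins / denom
        p /= p.sum()               # fix the scale; only ratios matter
    return np.log(p)


def borderline_band(scores, accept_rate, band_frac):
    """Return indices of papers in a symmetric band around the cutoff.

    accept_rate: fraction of papers expected to be accepted.
    band_frac:   fraction of all papers to place in the band.
    """
    n = len(scores)
    order = np.argsort(-np.asarray(scores))  # best paper first
    cutoff = int(round(accept_rate * n))
    half = int(round(band_frac * n / 2))
    lo, hi = max(0, cutoff - half), min(n, cutoff + half)
    return order[lo:hi]


def expected_extra_correct(rho, s, n_papers, delta):
    """Evaluate (rho * s - s**2) * N * Delta.

    Assumes s is the band size as a fraction of all N papers.
    """
    return (rho * s - s ** 2) * n_papers * delta
```

Under the assumed reading of $s$, the expression is positive only when $\rho > s$, that is, when the predicted band overlaps the true borderline set more than a same-sized random selection would be expected to.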
