RBCorr: Response Bias Correction in Language Models
Om Bhatt, Anna A. Ivanova
TL;DR
This work tackles response bias in fixed-response evaluations of language models by introducing RBCorr, a low-cost, LogProbs-based calibration method that debiases outputs using a small, held-out, class-balanced calibration set. RBCorr extracts last-layer LogProbs, measures bias with Total Variation Distance, and applies per-option mean normalization; this reduces bias while preserving or improving accuracy across diverse yes-no, entailment, and multiple-choice tasks. In comparisons against Contextual Calibration, Batch Calibration, and PriDe, RBCorr often yields the largest bias reductions and substantial accuracy gains, particularly for smaller models. Bias patterns, however, depend heavily on the model, dataset, and prompt configuration, which limits how well a correction transfers across conditions. These findings support using calibrated LogProbs both as a practical evaluation tool and as a means of revealing latent capabilities in LM benchmarks, with the caveats that the method is limited to open-weight models and requires condition-specific calibration. The approach has practical implications for fairer benchmarking and targeted debiasing in real-world deployments of lightweight language systems.
Abstract
Language models (LMs) are known to be prone to response biases, which present as option preference biases in fixed-response questions. It is therefore imperative to develop low-cost and effective response bias correction methods to improve LM performance and enable more accurate evaluations of model abilities. Here, we propose a simple response bias correction strategy ($\texttt{RBCorr}$) and test it on 12 open-weight language models using yes-no, entailment, and multiple-choice questions. We show that response bias is prevalent in LMs pre-correction and that $\texttt{RBCorr}$ effectively eliminates bias and boosts model performance. We also explore the generalizability of bias behavior across models, datasets, and prompt formats, showing that LogProbs-based correction is highly dependent on all three of these aspects. Overall, $\texttt{RBCorr}$ is an easy-to-use method that can boost the performance of smaller LMs and ensure that LM performance on closed-response benchmarks aligns more closely with their true capabilities.
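To make the idea of a LogProbs-based correction concrete, the following is a minimal sketch of one way such a correction could look, assuming that "mean normalization per option" means subtracting each option's average LogProb on a class-balanced calibration set, and that bias is the Total Variation Distance between the model's response distribution and the balanced gold distribution. The function names and all numbers are hypothetical and purely illustrative; this is not the authors' implementation.

```python
from collections import Counter


def total_variation_distance(p, q):
    """TVD between two discrete distributions given as equal-length lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def predict(logprobs):
    """Pick the option with the highest LogProb for each item."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in logprobs]


def response_distribution(predictions, n_options):
    """Fraction of items on which each option was chosen."""
    counts = Counter(predictions)
    return [counts[k] / len(predictions) for k in range(n_options)]


def option_means(calibration_logprobs):
    """Average LogProb of each option over the calibration set (assumed class-balanced)."""
    n = len(calibration_logprobs)
    n_options = len(calibration_logprobs[0])
    return [sum(row[k] for row in calibration_logprobs) / n for k in range(n_options)]


def apply_correction(logprobs, means):
    """Subtract each option's calibration mean from that option's LogProb."""
    return [[lp - m for lp, m in zip(row, means)] for row in logprobs]


def accuracy(predictions, gold):
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Hypothetical yes/no LogProbs: columns are [yes, no]; the toy model favors "yes".
calibration = [[-0.2, -1.8], [-0.4, -1.2], [-0.6, -0.9], [-0.8, -1.1]]  # gold: 2 yes, 2 no
evaluation = [[-0.3, -1.5], [-0.7, -1.0], [-0.5, -1.6], [-0.9, -1.2]]
gold = [0, 1, 0, 1]      # 0 = yes, 1 = no
balanced = [0.5, 0.5]    # gold label distribution of the balanced evaluation items

means = option_means(calibration)

raw_preds = predict(evaluation)
corrected_preds = predict(apply_correction(evaluation, means))

tvd_before = total_variation_distance(response_distribution(raw_preds, 2), balanced)
tvd_after = total_variation_distance(response_distribution(corrected_preds, 2), balanced)
acc_before = accuracy(raw_preds, gold)
acc_after = accuracy(corrected_preds, gold)

print(f"TVD before: {tvd_before:.2f}, after: {tvd_after:.2f}")
print(f"Accuracy before: {acc_before:.2f}, after: {acc_after:.2f}")
```

In this toy setup the uncorrected model answers "yes" on every evaluation item, and subtracting the per-option calibration means removes that preference. Because the calibration means are computed for one model, dataset, and prompt format, the sketch would need to be re-run for each new condition, in line with the dependence on all three that the abstract reports.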
