Post Hoc Regression Refinement via Pairwise Rankings

Kevin Tirta Wijaya, Michael Sun, Minghao Guo, Hans-Peter Seidel, Wojciech Matusik, Vahid Babaei

TL;DR

RankRefine addresses the challenge of accurate regression in data-scarce regimes by incorporating small sets of pairwise rankings as auxiliary signals. It fuses the base regressor's output with a rank-based estimate derived from a Bradley–Terry likelihood via inverse-variance weighting, yielding a minimum-variance unbiased predictor under Gaussian assumptions. Under these assumptions, the theoretical analysis guarantees an MAE reduction whenever the ranker variance is finite. Empirical results across nine molecular-property benchmarks and several tabular tasks show consistent improvements, even with ranker accuracies as low as about 0.55 and with modest reference budgets ($k \approx 20$ comparisons per query). The approach also works with off-the-shelf LLMs (e.g., ChatGPT-4o) and human raters, highlighting its practical applicability in low-data settings and interactive decision-making contexts.
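The TL;DR does not spell out the rank-based estimator, so the following is only a minimal sketch under an assumed Bradley–Terry model: the probability that the query outranks reference $i$ is taken to be a logistic function of the value gap $(y_0 - y_i)$ divided by a hypothetical temperature `scale`. The sketch maximizes that likelihood over $y_0$ by bisection on its derivative. The names `bt_rank_estimate`, `scale`, and `pad` are illustrative and do not come from the paper.

```python
import math
from typing import Optional, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bt_rank_estimate(
    ref_values: Sequence[float],
    outcomes: Sequence[int],
    scale: float = 1.0,
    pad: Optional[float] = None,
    iters: int = 100,
) -> float:
    """Maximum-likelihood estimate of the query value y0 (hypothetical helper).

    Assumed model: P(query ranked above reference i) = sigmoid((y0 - y_i) / scale),
    where outcomes[i] is 1 if the ranker placed the query above reference i, else 0.
    """
    if not ref_values or len(ref_values) != len(outcomes):
        raise ValueError("need one outcome per reference value")
    if pad is None:
        pad = 3.0 * scale
    lo, hi = min(ref_values) - pad, max(ref_values) + pad

    def score(y0: float) -> float:
        # d/dy0 of the log-likelihood, multiplied by `scale`; strictly decreasing in y0
        return sum(o - sigmoid((y0 - y) / scale) for y, o in zip(ref_values, outcomes))

    # If the maximizer lies outside the search window (e.g. every comparison
    # agrees, so the MLE is unbounded), clamp to the nearest window edge.
    if score(lo) <= 0:
        return lo
    if score(hi) >= 0:
        return hi
    # The log-likelihood is concave, so the root of its derivative is the maximizer.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


refs = [0.0, 1.0, 2.0, 3.0, 4.0]  # known property values of the reference set
wins = [1, 1, 1, 0, 0]            # query ranked above the first three references
print(bt_rank_estimate(refs, wins))  # lies between 2.5 and 3.0
```

Clamping to a padded window is one simple way to handle the case where all $k$ comparisons agree; the paper may regularize differently.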

Abstract

Accurate prediction of continuous properties is essential to many scientific and engineering tasks. Although deep-learning regressors excel with abundant labels, their accuracy deteriorates in data-scarce regimes. We introduce RankRefine, a model-agnostic, plug-and-play post hoc method that refines regression predictions using expert knowledge expressed as pairwise rankings. Given a query item and a small reference set with known properties, RankRefine combines the base regressor's output with a rank-based estimate via inverse-variance weighting, requiring no retraining. On molecular property prediction tasks, RankRefine achieves up to a 10% relative reduction in mean absolute error using only 20 pairwise comparisons obtained from a general-purpose large language model (LLM) without fine-tuning. Because rankings from human experts or general-purpose LLMs suffice to improve regression across diverse domains, RankRefine is practical and broadly applicable, especially in low-data settings.
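Because the refinement acts only on two numbers and their variances, it needs no access to the regressor's weights. A minimal sketch of inverse-variance fusion follows; `inverse_variance_fuse` is a hypothetical name, and the variances are assumed to be supplied by the caller.

```python
def inverse_variance_fuse(y_reg: float, var_reg: float,
                          y_rank: float, var_rank: float) -> tuple[float, float]:
    """Fuse two independent unbiased estimates by inverse-variance weighting.

    Returns (fused estimate, fused variance). A var_rank of float("inf")
    gives the ranker zero weight, so the regressor output passes through.
    """
    if var_reg <= 0 or var_rank <= 0:
        raise ValueError("variances must be positive")
    w_reg = 1.0 / var_reg    # precision of the regressor estimate
    w_rank = 1.0 / var_rank  # precision of the rank-based estimate
    y_fused = (w_reg * y_reg + w_rank * y_rank) / (w_reg + w_rank)
    var_fused = 1.0 / (w_reg + w_rank)
    return y_fused, var_fused


# A noisy regressor (variance 4) and a sharper rank-based estimate (variance 1)
print(inverse_variance_fuse(1.0, 4.0, 2.0, 1.0))
```

The fused variance is never larger than the smaller of the two input variances, which is why a weak but informative ranker can still help.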

Paper Structure

This paper contains 32 sections, 3 theorems, 19 equations, 8 figures, 5 tables.

Key Result

Theorem 3.1

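If $\hat{y}_0^{\text{reg}}$ and $\hat{y}_0^{\text{rank}}$ are independent, unbiased Gaussian estimators of $y_0$ with variances $\sigma_{\text{reg}}^{2}$ and $\sigma_{\text{rank}}^{2}$, the minimum-variance unbiased estimator is the inverse-variance weighted combination

$$\hat{y}_0 = \frac{\sigma_{\text{rank}}^{2}\,\hat{y}_0^{\text{reg}} + \sigma_{\text{reg}}^{2}\,\hat{y}_0^{\text{rank}}}{\sigma_{\text{reg}}^{2} + \sigma_{\text{rank}}^{2}}.$$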
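The weights in Theorem 3.1 can be recovered with a standard argument restricted to unbiased linear combinations of the two estimators (the Gaussian assumption is what extends optimality to all unbiased estimators; that step is not shown here). The last line assumes, as in the theorem, zero-mean Gaussian errors, whose mean absolute error is $\sigma\sqrt{2/\pi}$, so the constant cancels in the MAE ratio $\beta$. This is a sketch of the textbook derivation, not a reproduction of the paper's proof.

```latex
\begin{aligned}
\hat{y}_0(w) &= w\,\hat{y}_0^{\text{reg}} + (1-w)\,\hat{y}_0^{\text{rank}},
  \qquad \mathbb{E}\bigl[\hat{y}_0(w)\bigr] = y_0,\\
\operatorname{Var}\bigl[\hat{y}_0(w)\bigr] &= w^{2}\sigma_{\text{reg}}^{2} + (1-w)^{2}\sigma_{\text{rank}}^{2},\\
\frac{d}{dw}\operatorname{Var}\bigl[\hat{y}_0(w)\bigr] = 0
  &\;\Longrightarrow\; w^{\star} = \frac{\sigma_{\text{rank}}^{2}}{\sigma_{\text{reg}}^{2} + \sigma_{\text{rank}}^{2}},\\
\operatorname{Var}\bigl[\hat{y}_0(w^{\star})\bigr] &= \frac{\sigma_{\text{reg}}^{2}\,\sigma_{\text{rank}}^{2}}{\sigma_{\text{reg}}^{2} + \sigma_{\text{rank}}^{2}}
  \;\le\; \min\bigl(\sigma_{\text{reg}}^{2}, \sigma_{\text{rank}}^{2}\bigr),\\
\beta = \frac{\text{MAE}_{\text{post}}}{\text{MAE}_{\text{reg}}}
  &= \sqrt{\frac{\sigma_{\text{rank}}^{2}}{\sigma_{\text{reg}}^{2} + \sigma_{\text{rank}}^{2}}} \;<\; 1
  \quad \text{whenever } \sigma_{\text{rank}}^{2} < \infty.
\end{aligned}
```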

Figures (8)

  • Figure 1: Overview of the RankRefine framework. During training, a regressor is trained using labeled data. At inference, a query sample is paired with reference samples with known properties and compared by an external ranker to produce pairwise rankings. These rankings are used to estimate the query’s property value via a rank-based estimator. The final prediction is obtained by fusing the regressor’s output and the rank-based estimate using inverse variance weighting.
  • Figure 2: Empirical validation of the theoretical MAE reduction bound under ideal conditions. For each target improvement factor $\beta$, we compute the required ranker variance $\sigma^2_\text{rank}$ using the right-hand inequality of the bound on $\beta$. We simulate predictions from an oracle regressor ($\hat{y}_0^{\text{reg}} \sim \mathcal{N}(y_0, 1)$) and an oracle ranker ($\hat{y}_0^\text{rank} \sim \mathcal{N}(y_0, \sigma^2_\text{rank})$), and compute the fused estimate. The observed post-refinement MAE ratios match the target $\beta$ values, confirming that the fusion rule behaves as predicted under Gaussian assumptions.
  • Figure 3: Performance of RankRefine on molecular property prediction datasets under varying oracle ranker accuracy and number of reference comparisons. Each plot shows the normalized error $\beta = \frac{\text{MAE}_{\text{post}}}{\text{MAE}_{\text{reg}}}$ as a function of ranker accuracy, averaged over 5 random splits. $\beta < 1$ indicates improvements in regression performance over the base regressor. $k$ is the number of pairwise comparisons for each test molecule. Shaded regions indicate standard deviation. Dashed gray line shows baseline MAE with no refinement. Across most configurations, RankRefine improves regression performance when using a ranker with accuracy as low as 0.55. Increasing $k$ tends to lower $\beta$, but typically the benefits start to diminish beyond $k=20$.
  • Figure 4: Performance of RankRefine on tabular datasets under varying oracle ranker accuracy and number of reference comparisons. Across most configurations, RankRefine improves regression performance when using a ranker with accuracy as low as 0.55.
  • Figure 5: (a) Comparison between RankRefine and the constrained-optimization (projection)-based refinement method of Yan et al. (2024) on molecular property prediction tasks. We report the difference in normalized error, $\beta_\text{ours} - \beta_\text{projection}$; a negative value indicates better performance by RankRefine. Each curve corresponds to a dataset. RankRefine generally outperforms the baseline when the ranker accuracy is between 0.5 and 0.95. (b) We also compare our method to a related post hoc regression improvement method, regression by re-ranking (Gonçalves et al., 2023).
  • ...and 3 more figures
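The oracle experiment described for Figure 2 can be approximated with a short Monte Carlo loop. The paper's exact bound on $\beta$ is not reproduced here; instead, this sketch assumes Gaussian errors, for which MAE is proportional to the standard deviation, so a target $\beta$ corresponds to $\sigma^2_\text{rank} = \beta^2\sigma^2_\text{reg}/(1-\beta^2)$. The helpers `required_rank_variance` and `simulate_beta` are hypothetical names.

```python
import math
import random


def required_rank_variance(beta: float, var_reg: float = 1.0) -> float:
    """Solve beta**2 = var_rank / (var_reg + var_rank) for var_rank (assumed Gaussian errors)."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    return var_reg * beta**2 / (1.0 - beta**2)


def simulate_beta(beta_target: float, n: int = 100_000,
                  var_reg: float = 1.0, seed: int = 0) -> float:
    """Return the simulated ratio MAE_post / MAE_reg for oracle Gaussian predictors."""
    rng = random.Random(seed)
    var_rank = required_rank_variance(beta_target, var_reg)
    w_reg, w_rank = 1.0 / var_reg, 1.0 / var_rank  # inverse-variance weights
    abs_err_reg = abs_err_post = 0.0
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)                         # true label
        y_reg = rng.gauss(y0, math.sqrt(var_reg))        # oracle regressor
        y_rank = rng.gauss(y0, math.sqrt(var_rank))      # oracle ranker
        y_post = (w_reg * y_reg + w_rank * y_rank) / (w_reg + w_rank)
        abs_err_reg += abs(y_reg - y0)
        abs_err_post += abs(y_post - y0)
    return abs_err_post / abs_err_reg


for beta in (0.5, 0.7, 0.9):
    print(f"target beta={beta:.2f}  simulated={simulate_beta(beta):.3f}")
```

The distribution of the true label $y_0$ does not affect the ratio, since both predictors are centred on it; drawing it at random only mirrors a dataset of varied targets.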

Theorems & Definitions (3)

  • Theorem 3.1: RankRefine Fusion Theorem
  • Lemma 3.2: Variance of the rank-based estimate
  • Corollary 3.2.1
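Lemma 3.2 is listed without its statement, so the sketch below does not reproduce it. It shows one standard way to attach a variance to a Bradley–Terry estimate: the large-sample (Fisher-information) approximation under the same assumed logistic model used earlier, with a hypothetical temperature `scale`. Such a value could play the role of $\sigma^2_\text{rank}$ in inverse-variance fusion; `bt_fisher_variance` is an illustrative name, and the approximation ignores any ranker noise beyond what the logistic model captures.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bt_fisher_variance(y0_hat: float, ref_values: Sequence[float],
                       scale: float = 1.0) -> float:
    """Asymptotic variance of the Bradley–Terry MLE of y0 (hypothetical helper).

    Assumed model: P(query above ref i) = sigmoid((y0 - y_i) / scale).
    The Fisher information is sum_i p_i * (1 - p_i) / scale**2, and the
    large-sample variance of the MLE is its inverse.
    """
    info = 0.0
    for y in ref_values:
        p = sigmoid((y0_hat - y) / scale)
        info += p * (1.0 - p) / scale**2
    # Fully saturated comparisons carry no information about y0.
    return math.inf if info == 0.0 else 1.0 / info


# Ten references near the query versus ten references far from it
print(bt_fisher_variance(0.0, [-0.5, 0.5] * 5))
print(bt_fisher_variance(0.0, [-5.0, 5.0] * 5))
```

Under this model each comparison contributes the most information when the reference value is close to the query ($p_i \approx 0.5$), and the variance shrinks roughly in proportion to $1/k$ as comparable references are added.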