CAIRO: Decoupling Order from Scale in Regression

Harri Vanhems, Yue Zhao, Peng Shi, Archer Y. Yang

TL;DR

This work proposes CAIRO (Calibrate After Initial Rank Ordering), a framework that decouples regression into two distinct stages: learning the ordering, then recovering the scale. It theoretically characterizes a class of "Optimal-in-Rank-Order" objectives and proves that they recover the ordering of the true conditional mean under mild assumptions.

Abstract

Standard regression methods typically optimize a single pointwise objective, such as mean squared error, which conflates the learning of ordering with the learning of scale. This coupling renders models vulnerable to outliers and heavy-tailed noise. We propose CAIRO (Calibrate After Initial Rank Ordering), a framework that decouples regression into two distinct stages. In the first stage, we learn a scoring function by minimizing a scale-invariant ranking loss; in the second, we recover the target scale via isotonic regression. We theoretically characterize a class of "Optimal-in-Rank-Order" objectives -- including variants of RankNet and Gini covariance -- and prove that they recover the ordering of the true conditional mean under mild assumptions. We further show that subsequent monotone calibration guarantees recovery of the true regression function. Empirically, CAIRO combines the representation learning of neural networks with the robustness of rank-based statistics. It matches the performance of state-of-the-art tree ensembles on tabular benchmarks and significantly outperforms standard regression objectives in regimes with heavy-tailed or heteroskedastic noise.
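
The two stages described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a linear scorer trained with a pairwise logistic (RankNet-style) loss for Stage 1 and a hand-written pool-adjacent-violators routine as the isotonic regression for Stage 2. All function names (`fit_ranker`, `fit_isotonic`, `calibrate`) and the toy data are hypothetical, chosen only to show how order and scale are learned separately.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, x):
    # Linear scorer s(x) = w . x (an assumption; the paper uses neural networks).
    return sum(wk * xk for wk, xk in zip(w, x))

def fit_ranker(xs, ys, steps=200, lr=0.5):
    """Stage 1 (sketch): gradient descent on a pairwise logistic ranking loss.
    The targets enter only through comparisons ys[i] > ys[j], so any strictly
    increasing transformation of ys yields the same scorer."""
    d = len(xs[0])
    w = [0.0] * d
    pairs = [(i, j) for i in range(len(ys)) for j in range(len(ys)) if ys[i] > ys[j]]
    for _ in range(steps):
        grad = [0.0] * d
        for i, j in pairs:
            margin = score(w, xs[i]) - score(w, xs[j])
            g = -sigmoid(-margin)  # derivative of log(1 + exp(-margin))
            for k in range(d):
                grad[k] += g * (xs[i][k] - xs[j][k])
        w = [wk - lr * gk / len(pairs) for wk, gk in zip(w, grad)]
    return w

def fit_isotonic(scores, ys):
    """Stage 2 (sketch): pool adjacent violators, i.e. the least-squares
    non-decreasing fit of ys against scores. Returns a step function as a
    list of (right edge of block, fitted value). Tied scores are not pooled."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block holds [sum of y, count, largest score in block]
    for i in order:
        blocks.append([ys[i], 1, scores[i]])
        # Merge backwards while the block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c, hi = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
            blocks[-1][2] = hi
    return [(hi, s / c) for s, c, hi in blocks]

def calibrate(step_fn, s):
    # Evaluate the fitted step function at score s.
    for hi, value in step_fn:
        if s <= hi:
            return value
    return step_fn[-1][1]

# Toy data: the last target is extreme in scale but consistent in order.
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [0.1, 0.5, 0.4, 100.0]
w = fit_ranker(xs, ys)
scores = [score(w, x) for x in xs]
step_fn = fit_isotonic(scores, ys)
preds = [calibrate(step_fn, s) for s in scores]
```

In this sketch the extreme target affects Stage 1 only through its rank, while Stage 2 restores the original scale with a monotone step function. Replacing `ys` with any strictly increasing transformation of it (for example, cubing each value) leaves the learned scorer unchanged, which is the scale invariance the abstract refers to.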

Paper Structure

This paper contains 33 sections, 7 theorems, 60 equations, 1 figure, 3 tables.

Key Result

Theorem 1

Let $\mathcal{L}$ be the loss defined in weighted-01-pop. Then we have the following equivalent representations $\mathcal{L}_{\textup{uni}}$, $\mathcal{L}_{\textup{abs}}$, and $\mathcal{L}_{\textup{cdf}}$ based on three different choices of the weight $w(Y,Y')$:

Figures (1)

  • Figure 1: CAIRO-RankNet across Data Regimes. Top Row (Normal Regime): Stage 1 (a) learns a linear ranking, and Stage 2 (b) learns a linear calibration map, resulting in a standard regression fit (c). Bottom Row (Heteroskedastic and Heavy Tail Regime): In the presence of heavy-tailed noise, Stage 1 (d) learns a non-linear ranking to preserve order. Stage 2 (e) learns a non-parametric step function to correct this warp. The final model (f) successfully recovers the underlying signal (blue line) despite the large variance in the raw data (gray cloud).

Theorems & Definitions (18)

  • Theorem 1
  • Definition 1
  • Theorem 2
  • Remark
  • Theorem 3
  • Remark
  • proof
  • Lemma 1
  • proof
  • Lemma 2
  • ...and 8 more