
When No-Rejection Learning is Consistent for Regression with Rejection

Xiaocheng Li, Shang Liu, Chunlin Sun, Hanzhao Wang

TL;DR

This work studies regression with rejection (RwR), where a regressor $f$ is paired with a rejector $r$ that defers difficult cases to humans at cost $c$ via the loss $L_{RwR}=\mathbb{E}[r(X)\,l(f(X),Y)+(1-r(X))\,c]$, with $r(X)=1$ meaning the sample is accepted and predicted by $f$. It establishes the consistency of no-rejection learning under a weak realizability condition and develops a truncated loss $\tilde L$ with a squared-loss surrogate $L_2$ to enable analysis beyond realizability, yielding the generalization bound $L_{RwR}(\hat f, r_{\hat R})-L_{RwR}^*\le \mathbb{E}[(\hat f(X)-f^*(X))^2]+\mathbb{E}[|\hat R(\hat f,X)-R(\hat f,X)|]$, where $\hat R$ is a calibrator estimating the conditional loss $R(\hat f,X)$ and $r_{\hat R}$ is the rejector it induces. A simple calibrator-based approach learns the rejector on an independent validation set, with fixed-cost and fixed-budget variants built from score-based thresholds and split-conformal ideas. The theory is complemented by numerical experiments on eight UCI datasets, which demonstrate that no-rejection learning combined with the proposed calibrators achieves favorable RwR performance under both cost and budget constraints.
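On a finite sample, the RwR loss and the two rejector variants reduce to a few array operations. The sketch below is illustrative only. It assumes that the per-sample losses $l(\hat f(x_i),y_i)$ and the calibrator scores $\hat R(\hat f,x_i)$ are already available, and that the fixed-cost rejector accepts when the score is at most $c$. Its fixed-budget rule (reject the highest-scoring fraction) is a plain top-$k$ threshold, not the split-conformal construction mentioned above. All function names and numbers are hypothetical.

```python
import numpy as np


def rwr_loss(losses: np.ndarray, accept: np.ndarray, c: float) -> float:
    """Empirical RwR loss: mean of r_i * l_i + (1 - r_i) * c, where r_i = 1 means accept."""
    r = accept.astype(float)
    return float(np.mean(r * losses + (1.0 - r) * c))


def fixed_cost_rejector(scores: np.ndarray, c: float) -> np.ndarray:
    """Accept a sample when its estimated loss does not exceed the deferral cost c."""
    return scores <= c


def fixed_budget_rejector(scores: np.ndarray, budget: float) -> np.ndarray:
    """Reject at most a `budget` fraction of samples, choosing those with the largest scores."""
    n = len(scores)
    n_reject = int(np.floor(budget * n))
    accept = np.ones(n, dtype=bool)
    if n_reject > 0:
        # argsort is ascending, so the last n_reject indices hold the largest scores
        accept[np.argsort(scores)[-n_reject:]] = False
    return accept


losses = np.array([0.1, 3.0, 0.5, 4.0])  # hypothetical per-sample losses l(f(x_i), y_i)
scores = np.array([0.2, 2.5, 0.4, 3.5])  # hypothetical calibrator outputs R_hat(f, x_i)
c = 2.0
accept_cost = fixed_cost_rejector(scores, c)                 # accepts samples 0 and 2
accept_budget = fixed_budget_rejector(scores, budget=0.25)   # rejects only sample 3
print(rwr_loss(losses, accept_cost, c), rwr_loss(losses, accept_budget, c))
```

The fixed-cost rule adapts the number of rejections to the data. The fixed-budget rule instead caps the rejection rate, which can cost more when many samples are hard.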

Abstract

Learning with rejection has been a prototypical model for studying human-AI interaction on prediction tasks. When a sample instance arrives, the model first uses a rejector to decide whether to accept the sample and let the AI predictor make the prediction, or to reject it and defer it to humans. Learning such a model changes the structure of the original loss function and often results in undesirable non-convexity and inconsistency issues. For the classification with rejection problem, several works develop consistent surrogate losses for the joint learning of the predictor and the rejector, while fewer works address the regression counterpart. This paper studies the regression with rejection (RwR) problem and investigates a no-rejection learning strategy that uses all the data to learn the predictor. We first establish the consistency of this strategy under the weak realizability condition. Then, when weak realizability does not hold, we show that the excess risk can still be upper bounded by the sum of two parts: prediction error and calibration error. Lastly, we provide empirical evidence for the advantage of the proposed learning strategy.
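The no-rejection strategy can be sketched end to end on synthetic data. The predictor is fitted on all training samples without regard to rejection, and the rejector is learned afterwards from an independent validation set. This is a minimal sketch under stated assumptions, not the authors' implementation. The heteroscedastic data-generating process, the least-squares line used as the predictor, the bin-averaged calibrator (a simple stand-in for a learned regression model of the conditional squared loss), and all function names are hypothetical.

```python
import numpy as np


def make_data(n, rng):
    """Hypothetical heteroscedastic data: low noise for x < 0, high noise for x >= 0."""
    x = rng.uniform(-1.0, 1.0, size=n)
    noise_sd = np.where(x < 0, 0.1, 2.0)
    y = 2.0 * x + noise_sd * rng.standard_normal(n)
    return x, y


def fit_no_rejection(x, y):
    """Step 1: fit the regressor on ALL training data, ignoring rejection."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return lambda z: slope * z + intercept


def fit_binned_calibrator(f, x_val, y_val, n_bins=10):
    """Step 2: estimate E[(f(X) - Y)^2 | X = x] by averaging validation losses per x-bin."""
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    sq_loss = (f(x_val) - y_val) ** 2
    idx = np.clip(np.digitize(x_val, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    sums = np.bincount(idx, weights=sq_loss, minlength=n_bins)
    # Empty bins fall back to the overall mean validation loss
    means = np.where(counts > 0, sums / np.maximum(counts, 1), sq_loss.mean())

    def r_hat(z):
        return means[np.clip(np.digitize(z, edges) - 1, 0, n_bins - 1)]

    return r_hat


def rwr_loss(losses, accept, c):
    """Empirical RwR loss with accept = True meaning the predictor is used."""
    r = accept.astype(float)
    return float(np.mean(r * losses + (1.0 - r) * c))


rng = np.random.default_rng(0)
c = 1.0  # hypothetical deferral cost
x_tr, y_tr = make_data(2000, rng)
x_val, y_val = make_data(2000, rng)
x_te, y_te = make_data(2000, rng)

f_hat = fit_no_rejection(x_tr, y_tr)
r_hat = fit_binned_calibrator(f_hat, x_val, y_val)

test_loss = (f_hat(x_te) - y_te) ** 2
accept = r_hat(x_te) <= c  # Step 3: fixed-cost rule, accept when the estimated loss is <= c
print("accept all :", rwr_loss(test_loss, np.ones_like(accept), c))
print("reject all :", rwr_loss(test_loss, np.zeros_like(accept), c))
print("calibrated :", rwr_loss(test_loss, accept, c))
```

With these settings, the calibrator should accept roughly the low-noise half $x<0$. The calibrated RwR loss should therefore land well below both the accept-all loss and the reject-all cost $c$. The exact numbers depend on the random seed.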

Paper Structure (33 sections, 11 theorems, 46 equations, 2 figures, 4 tables, 1 algorithm)

Key Result

Proposition 1

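Suppose $\mathcal{F}$ contains all measurable functions that map from $\mathcal{X}$ to $\mathcal{Y}$, and $\mathcal{G}$ contains all measurable functions that map from $\mathcal{X}$ to $\{0,1\}$. Then the optimal regressor $f^*(X)$ and rejector $r^*(X)$ for the RwR loss $L_{RwR}$ are

$$f^*(x)\in\arg\min_{u\in\mathcal{Y}}\mathbb{E}[l(u,Y)\mid X=x],\qquad r^*(x)=\mathbb{1}\{\mathbb{E}[l(f^*(x),Y)\mid X=x]\le c\}.$$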
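Because the RwR loss decomposes pointwise in $x$, the optimal pair in Proposition 1 can be obtained by minimizing the conditional loss separately at each input. The sketch below assumes the squared loss $l(u,y)=(u-y)^2$ and hypothetical two-point conditional distributions of $Y$ at two inputs. A grid search recovers the conditional mean as $f^*(x)$, and the minimal conditional risk equals the conditional variance. The sample is accepted exactly when that variance is at most $c$.

```python
import numpy as np

# Hypothetical conditional distributions: input label -> (support of Y, probabilities)
cond = {
    "x0": (np.array([1.0, 3.0]), np.array([0.5, 0.5])),   # mean 2, variance 1
    "x1": (np.array([0.0, 10.0]), np.array([0.5, 0.5])),  # mean 5, variance 25
}
c = 2.0  # deferral cost
grid = np.linspace(-5.0, 15.0, 2001)  # candidate predictions u, spacing 0.01

results = {}
for x, (ys, ps) in cond.items():
    # Conditional squared-loss risk E[(u - Y)^2 | X = x] for every candidate u
    risk = ((grid[:, None] - ys[None, :]) ** 2) @ ps
    f_star = grid[np.argmin(risk)]
    min_risk = risk.min()
    # Accept (r = 1) iff the best achievable conditional loss does not exceed c
    r_star = min_risk <= c
    results[x] = (f_star, min_risk, r_star, min(min_risk, c))
    print(x, "f*:", f_star, "min risk:", min_risk, "accept:", r_star)
```

Here `x0` (variance 1) is accepted and `x1` (variance 25) is deferred, so the optimal conditional RwR losses are $\min(1,2)=1$ and $\min(25,2)=2$.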

Figures (2)

  • Figure 1: An illustration adapted from bansal2021most on classification with rejection. (a) plots the linear SVM model that gives the optimal classification accuracy, and (b) plots the linear SVM model that optimizes the classification-with-rejection loss. That is, (a) ignores the rejection structure and treats the task as a standard classification problem, while (b) is optimized for the classification-with-rejection objective. As a consequence, (a) achieves an accuracy of $24/35$ but needs to reject $20$ points to reach $34/35$ overall accuracy (assuming the human always predicts correctly). In contrast, (b) achieves an accuracy of only $21/35$ but needs to reject just $15$ points to reach $34/35$ overall accuracy. bansal2021most use this paradox to emphasize the importance of accounting for the rejection structure when learning the classifier. In (c), we explain the paradox by the richness of the classifier function class. Specifically, we plot an SVM classifier with a Gaussian radial basis function (RBF) kernel that is learned as a standard classification problem. This classifier is provably optimal for any measurable rejector (in the sense of the weak-realizability proposition). The contrast between (a) and (b) stems from the limited expressiveness of the linear classifier class: when a linear SVM performs well on one region of the data, it sacrifices performance on another. Once the function class is rich enough to contain the Bayes optimal classifier (approximately, in practice), this phenomenon disappears.
  • Figure 2: RwR loss with fixed cost $c=2$ (red dashed line).

Theorems & Definitions (25)

  • Proposition 1: zaoui2020regression
  • Definition 1: Optimality of joint learning
  • Proposition 2
  • Definition 2: Weak realizability
  • Proposition 3
  • Proposition 4
  • Example 1
  • Proposition 5
  • Proposition 6
  • Theorem 1
  • ...and 15 more