When No-Rejection Learning is Consistent for Regression with Rejection
Xiaocheng Li, Shang Liu, Chunlin Sun, Hanzhao Wang
TL;DR
This work studies regression with rejection (RwR), where a regressor $f$ is paired with a rejector $r$ that defers difficult cases to humans at cost $c$. The loss is $L_{RwR}=\mathbb{E}[r(X)l(f(X),Y)+(1-r(X))c]$, so $r(X)=1$ means the prediction is accepted and $r(X)=0$ means the sample is deferred. The paper shows that no-rejection learning is consistent under a weak realizability condition, and develops a truncated loss $\tilde L$ with a squared-loss surrogate $L_2$ to enable analysis beyond realizability, yielding the generalization bound $L_{RwR}(\hat f, r_{\hat R})-L_{RwR}^*\le \mathbb{E}[(\hat f-f^*)^2]+\mathbb{E}[|\hat R(\hat f,X)-R(\hat f,X)|]$. A simple calibrator-based approach learns the rejector on an independent validation set, with fixed-cost and fixed-budget variants obtained through score-based thresholds and split-conformal ideas. Numerical experiments on eight UCI datasets complement the theory and show that no-rejection learning with the proposed calibrators achieves favorable RwR performance under both cost and budget constraints.
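To make the loss and the two threshold rules concrete, the following is a minimal sketch assuming squared loss for $l$. The function names (`rwr_loss`, `fixed_cost_rejector`, `budget_threshold`) are hypothetical, and the budget rule uses a plain empirical quantile of validation scores as a simplified stand-in rather than the paper's split-conformal procedure.

```python
import numpy as np

def rwr_loss(pred, y, accept, c):
    """Empirical RwR loss with squared loss (an assumption for l):
    accepted samples pay (pred - y)^2, deferred samples pay the cost c."""
    pred = np.asarray(pred, dtype=float)
    y = np.asarray(y, dtype=float)
    accept = np.asarray(accept, dtype=bool)
    per_sample = np.where(accept, (pred - y) ** 2, c)
    return float(per_sample.mean())

def fixed_cost_rejector(est_loss, c):
    """Accept (r = 1) when the estimated conditional loss is at most c."""
    return np.asarray(est_loss, dtype=float) <= c

def budget_threshold(est_loss_val, budget):
    """Score threshold such that roughly a `budget` fraction of the
    validation scores lie above it (simplified: plain empirical quantile)."""
    return float(np.quantile(np.asarray(est_loss_val, dtype=float), 1.0 - budget))

# Toy numbers, purely illustrative.
pred = [1.0, 2.0, 3.0, 4.0]
y = [1.5, 2.0, 5.0, 4.0]
est_loss = [0.2, 0.1, 3.0, 0.05]   # hypothetical calibrator outputs

accept_cost = fixed_cost_rejector(est_loss, c=1.0)
thr = budget_threshold(est_loss, budget=0.25)
accept_budget = np.asarray(est_loss) <= thr

print(rwr_loss(pred, y, accept_cost, c=1.0))
print(rwr_loss(pred, y, np.ones(4, dtype=bool), c=1.0))
```

In this toy case the third sample has a large estimated loss, so deferring it at cost $c=1$ lowers the empirical RwR loss relative to accepting every prediction.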
Abstract
Learning with rejection has been a prototypical model for studying human-AI interaction on prediction tasks. Upon the arrival of a sample instance, the model first uses a rejector to decide whether to accept the sample and use the AI predictor to make a prediction, or to reject it and defer the sample to humans. Learning such a model changes the structure of the original loss function and often results in undesirable non-convexity and inconsistency issues. For the classification with rejection problem, several works develop consistent surrogate losses for the joint learning of the predictor and the rejector, while there have been fewer works for the regression counterpart. This paper studies the regression with rejection (RwR) problem and investigates a no-rejection learning strategy that uses all the data to learn the predictor. We first establish the consistency of this strategy under a weak realizability condition. Then, for the case where weak realizability does not hold, we show that the excess risk can still be upper bounded by the sum of two parts: prediction error and calibration error. Lastly, we demonstrate the advantage of the proposed learning strategy with empirical evidence.
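The no-rejection strategy can be illustrated end to end with a small sketch. Everything below is an assumption made for illustration and not the paper's experimental setup: the synthetic heteroscedastic data, the linear predictor, the binned-average calibrator, and the cost $c=1$. The predictor is fit on all training samples, the calibrator estimates the conditional squared loss on an independent validation set, and a sample is accepted when the estimated loss is at most $c$.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Synthetic data (assumption): noise is much larger for x > 1.
    x = rng.uniform(-2.0, 2.0, size=n)
    sd = np.where(x > 1.0, 2.0, 0.2)
    y = 2.0 * x + sd * rng.normal(size=n)
    return x, y

x_tr, y_tr = make_data(2000)
x_val, y_val = make_data(2000)
x_te, y_te = make_data(2000)

# Step 1: no-rejection learning. Fit the predictor on ALL training samples.
A_tr = np.column_stack([np.ones_like(x_tr), x_tr])
coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)

def predict(x):
    return coef[0] + coef[1] * x

# Step 2: calibrate on the independent validation set by averaging
# squared residuals within equal-width bins of x (a simple stand-in calibrator).
edges = np.linspace(-2.0, 2.0, 9)  # 8 bins

def bin_index(x):
    return np.clip(np.digitize(x, edges) - 1, 0, len(edges) - 2)

sq_res_val = (predict(x_val) - y_val) ** 2
bins_val = bin_index(x_val)
est_loss_per_bin = np.array(
    [sq_res_val[bins_val == b].mean() for b in range(len(edges) - 1)]
)

# Step 3: fixed-cost rejector. Accept when the estimated loss is at most c.
c = 1.0
accept = est_loss_per_bin[bin_index(x_te)] <= c

sq_err_te = (predict(x_te) - y_te) ** 2
loss_with_rejection = float(np.where(accept, sq_err_te, c).mean())
loss_without_rejection = float(sq_err_te.mean())
print(loss_with_rejection, loss_without_rejection)
```

The two printed quantities correspond to the empirical RwR loss of the learned pair and the loss of always accepting. The gap between the calibrator's estimate and the true conditional loss plays the role of the calibration error term in the bound.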
