Beyond the Norms: Detecting Prediction Errors in Regression Models

Andres Altieri, Marco Romanelli, Georg Pichler, Florence Alberge, Pablo Piantanida

TL;DR

The paper investigates detecting unreliable predictions in regression by formalizing unreliability through a discrepancy $d(\mathbf{Y}, f_{\mathcal{D}_n}(\mathbf{X}))$ exceeding a threshold $\epsilon$. It introduces data-driven detectors that estimate the discrepancy density and, crucially, a diversity-based score $\mathbb{H}(\mathbf{x})$ to distinguish reliable from unreliable inputs, including DV-Y and DV-D variants. The approach bridges baseline conditional-distribution methods with a robust, data-adaptive mechanism that compensates for estimation error, achieving superior AUROC on multiple UCI regression tasks and providing practical guidance for uncertainty quantification in safe ML systems. The work highlights the potential of learning distribution-aware detectors without requiring perfect probabilistic models and outlines connections to conformal ideas while emphasizing conditional reliability assessments. Overall, the proposed framework advances reliable regression by combining discrepancy-based definitions with diversity-driven detection to improve safety-critical decision-making.
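The unreliability definition and the AUROC evaluation mentioned above can be illustrated with a minimal sketch. It assumes the absolute-error discrepancy and a toy detector score. The function names `epsilon_bad_labels` and `auroc` and all numeric values are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def epsilon_bad_labels(y_true, y_pred, eps):
    """Return 1 where the absolute-error discrepancy exceeds eps, else 0."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    discrepancy = np.abs(y_true - y_pred)  # d(Y, f(X)) with d = absolute error
    return (discrepancy > eps).astype(int)

def auroc(scores, labels):
    """Probability that a random unreliable point outscores a random
    reliable one, with ties counted as one half (pairwise definition)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]  # scores of unreliable (epsilon-bad) points
    neg = scores[labels == 0]  # scores of reliable (epsilon-good) points
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUROC needs at least one point of each class")
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Toy data (assumed values, for illustration only).
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 2.9, 3.05, 5.5])
labels = epsilon_bad_labels(y_true, y_pred, eps=0.5)

# Hypothetical detector scores: higher means "more likely unreliable".
toy_scores = np.array([0.2, 0.7, 0.1, 0.9])
toy_auroc = auroc(toy_scores, labels)
```

The pairwise form is quadratic in the number of points, which is acceptable for a small illustration; a rank-based implementation would be preferable on large test sets.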

Abstract

This paper tackles the challenge of detecting unreliable behavior in regression algorithms, which may arise from intrinsic variability (e.g., aleatoric uncertainty) or modeling errors (e.g., model uncertainty). First, we formally introduce the notion of unreliability in regression, i.e., when the output of the regressor exceeds a specified discrepancy (or error). Then, using powerful tools for probabilistic modeling, we estimate the discrepancy density, and we measure its statistical diversity using our proposed metric for statistical dissimilarity. In turn, this allows us to derive a data-driven score that expresses the uncertainty of the regression outcome. We show empirical improvements in error detection for multiple regression tasks, consistently outperforming popular baseline approaches, and contributing to the broader field of uncertainty quantification and safe machine learning systems. Our code is available at https://zenodo.org/records/11281964.

Paper Structure

This paper contains 28 sections, 2 theorems, 20 equations, 4 figures, 20 tables, 4 algorithms.

Key Result

Proposition 4.2

The discriminator $\delta_p(\mathbf{x}, \gamma,\epsilon)$, defined as $\delta_p(\mathbf{x}, \gamma,\epsilon)=\mathds{1}\{\mathbf{x} \in \mathcal{R}_{\epsilon}(\gamma)\}$, is the most powerful statistical test of $\epsilon$-goodness against the alternative that $\mathbf{x}$ is $\epsilon$-bad, at

Figures (4)

  • Figure 1: An optimal detector for regression errors effectively partitions the regressor input space into two sets: one where the regressor is considered reliable, and another where it is considered unreliable. Surprisingly, the joint error distributions in each set exhibit distinct (diversity) behaviors, enabling the identification of points that lead to effective detectors in real applications. The visual representation above depicts the theoretical distributions in both sets, illustrating clear differences between the two classes. The corresponding approximate distributions learned from the data, shown below, exhibit the same distinctive behavior. These plots correspond to the example in Section \ref{sec:example}.
  • Figure 2: Optimal rejection region for the example in Section \ref{sec:example}.
  • Figure 3: Examples of average joint error distributions given by \ref{eq:pE1E2} for the absolute error discrepancy function $|Y-f_{\mathcal{D}_n}(x)|$ using the regressors trained in Section \ref{sec:numerical}. The patterns are obtained from the corresponding test set and the distributions of the discrepancy variables $D(Y, f_{\mathcal{D}_n}(x))$ are used.
  • Figure 4: Examples of average joint error distributions given by \ref{eq:pE1E2} for the relative error discrepancy function $|Y-f_{\mathcal{D}_n}(x)|/|f_{\mathcal{D}_n}(x)|$ using the regressors trained in Section \ref{sec:numerical}. The patterns are obtained from the corresponding test set and the distributions of the discrepancy variables $D(Y, f_{\mathcal{D}_n}(x))$ are used.
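
The two discrepancy functions named in the captions of Figures 3 and 4 can be written down directly. The sketch below is illustrative only: the function names and the `tiny` guard against division by zero are assumptions, and the sample values are made up.

```python
import numpy as np

def absolute_error(y, y_hat):
    """Absolute error discrepancy |y - f(x)|."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return np.abs(y - y_hat)

def relative_error(y, y_hat, tiny=1e-12):
    """Relative error discrepancy |y - f(x)| / |f(x)|.

    The `tiny` floor on the denominator is an assumption added here to
    avoid dividing by zero; the captions do not specify one.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return np.abs(y - y_hat) / np.maximum(np.abs(y_hat), tiny)

# Made-up targets and predictions.
targets = np.array([2.0, 10.0])
predictions = np.array([1.0, 8.0])
abs_disc = absolute_error(targets, predictions)
rel_disc = relative_error(targets, predictions)
```

The same prediction error can be $\epsilon$-bad under one discrepancy and $\epsilon$-good under the other: here the second point has the larger absolute error but the smaller relative error.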

Theorems & Definitions (8)

  • Definition 3.1: Discrepancy function
  • Definition 3.2: $\epsilon$-goodness
  • Definition 4.1: Most powerful discriminator
  • Proposition 4.2
  • Definition 5.1: Diversity coefficients
  • Definition 5.2: Diversity metric for regressor
  • Definition 5.3: Diversity discriminator
  • Proposition 5.4
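
To make the idea of a thresholded discriminator concrete, here is a hedged sketch of a conditional-distribution style detector of the kind the TL;DR calls a baseline. It assumes a Gaussian model $Y \mid \mathbf{x} \sim \mathcal{N}(\mu, \sigma^2)$ and the absolute-error discrepancy, and flags an input when the modelled probability of exceeding $\epsilon$ is above $\gamma$. The Gaussian assumption, the function names, and the thresholding rule are illustrative choices. This summary does not define $\mathcal{R}_{\epsilon}(\gamma)$ or the diversity score $\mathbb{H}(\mathbf{x})$, so neither is reproduced here.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def exceedance_probability(pred, mu, sigma, eps):
    """P(|Y - pred| > eps) when Y ~ N(mu, sigma^2) (assumed model)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    upper = normal_cdf((pred + eps - mu) / sigma)
    lower = normal_cdf((pred - eps - mu) / sigma)
    return 1.0 - (upper - lower)

def flag_unreliable(pred, mu, sigma, eps, gamma):
    """Hypothetical discriminator: True when the modelled probability
    of an epsilon-bad outcome exceeds the threshold gamma."""
    return exceedance_probability(pred, mu, sigma, eps) > gamma

# Unbiased prediction, eps at roughly the 95% two-sided quantile.
p_unbiased = exceedance_probability(pred=0.0, mu=0.0, sigma=1.0, eps=1.96)

# Prediction biased by two standard deviations with a tighter eps.
p_biased = exceedance_probability(pred=2.0, mu=0.0, sigma=1.0, eps=1.0)
flagged = flag_unreliable(pred=2.0, mu=0.0, sigma=1.0, eps=1.0, gamma=0.5)
```

A detector of this form is only as good as the estimated $\mu$ and $\sigma$, which is the estimation-error problem the TL;DR says the diversity-based score is meant to compensate for.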