Quantifying Local Model Validity using Active Learning

Sven Lämmle; Can Bogoclu; Robert Voßhall; Anselm Haselhoff; Dirk Roos

TL;DR

This paper addresses the problem of validating ML predictions locally, for specific inputs, rather than relying on global accuracy. It formulates model validity as a two-boundary limit-state problem and uses a transformed Gaussian process to learn the boundary near the tolerance $\xi$, guided by a novel acquisition function based on the misclassification probability (MC-Prob). The authors provide frequentist error bounds, demonstrate data-efficient learning on analytical and real-model benchmarks, and compare against conformal prediction methods, showing substantial reductions in the amount of validation data needed while preserving safety-critical guarantees. The work is relevant for deploying reliable ML in regulated or safety-critical domains because it enables targeted, efficient validation of local performance.

Abstract

Real-world applications of machine learning models are often subject to legal or policy-based regulations. Some of these regulations require ensuring the validity of the model, i.e., that the approximation error is smaller than a threshold. A global metric is generally too insensitive to determine the validity of a specific prediction, whereas evaluating local validity is costly since it requires gathering additional data. We propose learning the model error to acquire a local validity estimate while reducing the amount of required data through active learning. Using model validation benchmarks, we provide empirical evidence that the proposed method can lead to an error model with sufficient discriminative properties using a relatively small amount of data. Furthermore, an increased sensitivity to local changes of the validity bounds compared to alternative approaches is demonstrated.
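The gap between a global metric and local validity can be made concrete with a toy example. The sketch below is not taken from the paper: the error function `abs_error`, the unit interval as input space, and the tolerance value are all assumptions chosen so that the model is valid for 80% of inputs overall yet invalid for every input above the tolerance.

```python
# Toy illustration (assumed setup, not from the paper): a model whose
# absolute approximation error grows linearly with the input on [0, 1].


def abs_error(x: float) -> float:
    """Hypothetical absolute error |delta(x)| of some trained model."""
    return x


def is_locally_valid(x: float, xi: float) -> bool:
    """A prediction at x counts as valid if its absolute error is within xi."""
    return abs_error(x) <= xi


def marginal_validity(xi: float, n: int = 1000) -> float:
    """Fraction of a uniform midpoint grid on [0, 1] where the model is valid."""
    grid = [(i + 0.5) / n for i in range(n)]
    return sum(is_locally_valid(x, xi) for x in grid) / n


xi = 0.8  # assumed tolerance
print("marginal validity:", marginal_validity(xi))
print("valid at x = 0.2:", is_locally_valid(0.2, xi))
print("valid at x = 0.9:", is_locally_valid(0.9, xi))
```

Here the global figure hides the fact that every prediction in the upper part of the input space violates the tolerance, which is the situation a local validity estimate is meant to expose. In practice the true error is unknown and must be learned from data.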

Paper Structure (68 sections, 1 theorem, 31 equations, 14 figures, 8 tables, 1 algorithm)

Introduction
Contributions.
Related Work
Reliability Analysis.
Level Sets.
Bayesian Calibration.
Conformal Prediction.
Background
Observations.
Validation Metric.
Differences to Reliability Analysis.
Method
Definition of Local Validity and Limit State
Overview
Notation.
...and 53 more sections

Key Result

Theorem 1

Assume that $\delta$ is a Lipschitz continuous sample from the zero-mean Gaussian process with covariance kernel $k$ with Lipschitz constant $L_k$ on the compact set $\tilde{\mathbb{X}}$. Denote the Lipschitz constant of $\delta$ by $L_\delta$. Then, $\mu_{y\vert\mathcal{D}}(\cdot)$ and $\sigma_{y\vert\mathcal{D}}(\cdot)$ [...] where $k^* := \max_{\mathbf{x}, \mathbf{x}'\in\mathbb{X}}k(\mathbf{x}, \mathbf{x}')$. Moreover, [...]

Figures (14)

Figure 1: Illustration of a locally valid model: the trained model $f_{\mathrm{M}}$ is marginally valid for tolerance level $\xi$ with $80\%$ probability (panel b), but only locally valid, $\mathcal{V}$, in some regions of the input space $\mathbb{X}$ (panel a). b) Marginal distribution of the true absolute error $\vert\delta\vert$, where the $80\%$ quantile corresponds to the tolerance level, i.e., $\xi=q_{80}$. c) Our learned error model $\vert\tilde{f}_{\mathrm{D}}\vert$ and $90\%$ confidence interval from the folded Gaussian (Section \ref{sec:gp}), together with the predicted local valid set $\tilde{\mathcal{V}}_{0.1}$ (Section \ref{sec:prediction}). Samples are sequentially placed based on $\psi_{\mathrm{mis}}$ (panel d) to reduce the misclassification probability (Section \ref{sec:aq}), i.e., most samples are close to the limit state $x_{\mathcal{S}_i}\in\mathcal{S}$.
Figure 2: Prediction $\tilde{\mathcal{V}}$ (Equation \ref{eq:pred_valid}) for the modified Rastrigin function after 20 initial and 70 adaptive observations, with $\psi_{\mathrm{mis}}$ and $\omega=0.2\xi$. The true limit state is represented by the black line.
Figure 3: Median and $95\%$ confidence intervals of the $F_1$-score on the analytical problem functions across 30 runs. Top: Styblinski-Tang for 2 to 8 dimensions. Bottom: Modified Rastrigin (2-d), series system function (2-d), and Michalewicz function (4-d, 6-d).
Figure 4: The GP prediction of the limit state function $g$ follows a folded Gaussian distribution, which is flipped and shifted by the predefined tolerance $\xi$. The filled area shows the misclassification probability $\psi_{\mathrm{mis}}$.
Figure 5: Error bound $\eta$ for $\vert\delta-\mu_{y\vert\mathcal{D}}\vert$ as well as true error and $90\%$ confidence interval of the GP model for 10 initial samples and 20 (top left), 50 (top right), 100 (lower left), and 500 (lower right) adaptive samples. The vertical lines show the limit states. Initial and adaptive samples are shown in blue and orange, respectively.
...and 9 more figures
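The folded-Gaussian picture in the captions of Figures 1 and 4 can be sketched numerically. The code below is a minimal illustration under stated assumptions, not the paper's implementation: it assumes the GP posterior of the signed error at one input is Gaussian with mean `mu` and standard deviation `sigma`, takes the limit state function as $g = \xi - \vert\delta\vert$, treats the misclassification probability as the probability of the less likely class, and declares a point valid when the validity probability reaches $1-\alpha$. The paper's $\psi_{\mathrm{mis}}$ additionally involves a parameter $\omega$, which is omitted here. All function names are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def validity_probability(mu: float, sigma: float, xi: float) -> float:
    """P(|delta| <= xi) for delta ~ N(mu, sigma^2).

    Equivalent to P(g >= 0) for the assumed limit state g = xi - |delta|.
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    return normal_cdf((xi - mu) / sigma) - normal_cdf((-xi - mu) / sigma)


def misclassification_probability(mu: float, sigma: float, xi: float) -> float:
    """Probability that the more likely class (valid / invalid) is wrong.

    Assumption: the predicted class is the more probable one, so the
    misclassification probability is min(p, 1 - p).
    """
    p = validity_probability(mu, sigma, xi)
    return min(p, 1.0 - p)


def predicted_valid(mu: float, sigma: float, xi: float, alpha: float = 0.1) -> bool:
    """Assumed decision rule: valid if P(|delta| <= xi) >= 1 - alpha."""
    return validity_probability(mu, sigma, xi) >= 1.0 - alpha


xi = 1.0  # assumed tolerance
for mu, sigma in [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)]:
    print(mu, sigma,
          round(validity_probability(mu, sigma, xi), 4),
          round(misclassification_probability(mu, sigma, xi), 4),
          predicted_valid(mu, sigma, xi))
```

Under these assumptions the misclassification probability peaks where the posterior mean sits on the tolerance boundary and vanishes far from it, which matches the caption's observation that adaptive samples concentrate near the limit state.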

Theorems & Definitions (3)

Definition 1: Local Validity
Definition 2: Limit State
Theorem 1
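
The statements of the two definitions are not reproduced in this overview. One reading that is consistent with the figure captions above, offered as an assumption rather than the paper's exact wording, treats the model as locally valid wherever the absolute error stays within the tolerance, and the limit state as the boundary of that region:

$$
\mathcal{V} = \{\mathbf{x}\in\mathbb{X} : \vert\delta(\mathbf{x})\vert \le \xi\}, \qquad
g(\mathbf{x}) = \xi - \vert\delta(\mathbf{x})\vert, \qquad
\mathcal{S} = \{\mathbf{x}\in\mathbb{X} : g(\mathbf{x}) = 0\}.
$$

Because $\vert\delta\vert = \xi$ holds both for $\delta = \xi$ and for $\delta = -\xi$, the signed error has two boundaries, which matches the two-boundary limit-state formulation mentioned in the TL;DR.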
