Understanding Uncertainty-based Active Learning Under Model Mismatch

Amir Hossein Rahmati, Mingzhou Fan, Ruida Zhou, Nathan M. Urban, Byung-Jun Yoon, Xiaoning Qian

TL;DR

The paper addresses how the efficacy of Uncertainty-based Active Learning (UAL) in regression depends on model capacity. It uses Bayesian learning, BAL, and a bias-variance framework, supported by Bernstein-von Mises theory, to show that variance-based acquisition can reflect the true mean squared error ($MSE$) only when the model class can cover the ground-truth function; otherwise, UAL can underperform random sampling. To mitigate the drawbacks of model mismatch, it proposes remedies that either estimate the true objective $MSE$ directly or bound it, demonstrated in synthetic experiments with Bayesian Polynomial Regression and Gaussian Process Regression as well as on real datasets. The findings provide practical guidance for designing robust UAL strategies when the predictive model class cannot capture the underlying target, highlighting the value of objective-aligned acquisition functions. This work lays groundwork for error-aware acquisition design and suggests Kriging-like estimators and $MSE$ upper-bound-based approaches as promising directions.
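
For reference, the bias-variance framework mentioned above rests on the standard decomposition of the expected squared error at an input $x$. The notation below ($\hat{f}$ for the learned predictor, $\sigma^2$ for the noise variance) is assumed for illustration and may differ from the paper's own symbols. With $y = f(x) + \epsilon$, $\mathrm{Var}[\epsilon] = \sigma^2$, and expectations taken over training sets and noise:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2 + \mathrm{Var}\big[\hat{f}(x)\big] + \sigma^2
$$

A variance-based acquisition function only tracks the middle term. When the model class cannot represent $f$, the squared-bias term does not vanish as data accumulate, so ranking candidates by variance alone need not rank them by their contribution to $MSE$.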

Abstract

Instead of randomly acquiring training data points, Uncertainty-based Active Learning (UAL) operates by querying the label(s) of pivotal samples from an unlabeled pool selected based on the prediction uncertainty, thereby aiming at minimizing the labeling cost for model training. The efficacy of UAL critically depends on the model capacity as well as the adopted uncertainty-based acquisition function. Within the context of this study, our analytical focus is directed toward comprehending how the capacity of the machine learning model may affect UAL efficacy. Through theoretical analysis, comprehensive simulations, and empirical studies, we conclusively demonstrate that UAL can lead to worse performance in comparison with random sampling when the machine learning model class has low capacity and is unable to cover the underlying ground truth. In such situations, adopting acquisition functions that directly target estimating the prediction performance may be beneficial for improving the performance of UAL.

Paper Structure (14 sections, 3 theorems, 27 equations, 12 figures)


Key Result

Theorem 3.1

Let $\mathbf{D_L} = \{(x_i,y_i)\}_{i=1}^{n_L} \overset{\mathrm{iid}}{\sim} P_{\theta_{*}}$, for $\theta_*, \theta \in \Theta \subset \mathbb{R}^p$. Let $\ell_n = \log P(\mathbf{D_L} \mid \theta)$ and define $\zeta = \Sigma_n^{-\frac{1}{2}} (\theta - \hat{\theta}_n)$, where $\hat{\theta}_n = \ldots$ [...], where $\tau(\zeta) = (2\pi)^{-\frac{p}{2}} \exp(-\frac{1}{2} \|\zeta\|^2)$ is the standard $p$-variate normal density.
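
The flavor of this result can be checked numerically in a one-dimensional case. The sketch below is not taken from the paper: it assumes a Bernoulli likelihood with a uniform prior, takes $\hat{\theta}_n$ to be the maximum-likelihood estimate, and takes $\Sigma_n$ to be the inverse observed information, all of which are illustrative assumptions. It measures the largest gap between the posterior density of $\zeta$ and $\tau(\zeta)$ on a grid; the function name and its parameters are hypothetical.

```python
import math
import numpy as np

def standardized_posterior_gap(n, success_rate=0.3, half_width=3.0, num=601):
    """Max |posterior density of zeta - tau(zeta)| on a grid (Bernoulli model).

    Assumptions (not from the source): uniform prior, theta_hat is the MLE,
    and Sigma_n is the inverse observed information theta_hat(1-theta_hat)/n.
    """
    k = round(success_rate * n)           # deterministic count of successes
    theta_hat = k / n                     # maximum-likelihood estimate
    a, b = k + 1.0, n - k + 1.0           # Beta(a, b) posterior under a uniform prior
    sigma = math.sqrt(theta_hat * (1.0 - theta_hat) / n)

    zeta = np.linspace(-half_width, half_width, num)
    theta = theta_hat + sigma * zeta      # invert zeta = (theta - theta_hat) / sigma

    # Beta log-density, evaluated only where theta lies strictly inside (0, 1).
    inside = (theta > 0.0) & (theta < 1.0)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    post_zeta = np.zeros_like(zeta)
    t = theta[inside]
    post_zeta[inside] = sigma * np.exp(
        log_norm + (a - 1.0) * np.log(t) + (b - 1.0) * np.log1p(-t)
    )

    tau = np.exp(-0.5 * zeta**2) / math.sqrt(2.0 * math.pi)
    return float(np.max(np.abs(post_zeta - tau)))

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, standardized_posterior_gap(n))
```

The gap shrinks as `n` grows, which is the behavior the theorem describes for a correctly specified model.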

Figures (12)

  • Figure 1: Motivating example in which the ground truth is more complex than the prediction model class can capture. The noisy data follow a ground-truth function that is the sum of a quadratic function and a cosine function ($y=f(x)+\epsilon$ with $\epsilon \sim N(0,1)$, where $f(x) = \langle\mathbf{x},\mathbf{w}\rangle + \cos(2\pi x)$ and $\mathbf{x} = [1,x,x^2]$). The left plot shows the ground-truth target function and the corresponding noisy observations. The right plot compares the performance of UAL and random sampling when learning a quadratic predictor with Bayesian polynomial regression.
  • Figure 2: Schematic illustration of the UAL procedure.
  • Figure 3: Motivating example with a ground-truth target that is as complex as the prediction model class. The noisy data follow a quadratic ground-truth function ($y = f(x)+\epsilon$ with $\epsilon \sim N(0,1)$, where $f(x)=\langle\mathbf{x},\mathbf{w}\rangle$ and $\mathbf{x} = [1,x,x^2]$). The left plot shows the ground-truth target function, with the same $\mathbf{w}$ as the example in the Introduction, and the corresponding noisy observations. The right plot compares the performance of UAL and random sampling when learning a quadratic predictor with Bayesian polynomial regression.
  • Figure 4: Bias-variance decomposition for the motivating example from the Introduction, in which the prediction model has lower complexity than the target function. Because the model class cannot represent the target, the estimated variance fails to capture the learning objective, MSE.
  • Figure 5: Performance of model classes of polynomial order one to five, with their corresponding bias-variance decompositions, on a BP regression task over the third-order polynomial family. The first row shows the performance of each model and the second row shows its bias-variance decomposition. As the prediction model becomes more expressive, the variance captures the learning objective (MSE) more closely; hence, UAL performance improves.
  • ...and 7 more figures
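
The comparison described in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the weights `W`, the input range $[-1, 1]$, the prior variance, the pool size, and the query budget are all hypothetical, since the source does not give them. The predictor is a conjugate Bayesian quadratic regression, the "uncertainty" strategy queries the pool point with the largest posterior predictive variance, and the error is measured against the noiseless target on a test grid.

```python
import numpy as np

def features(x, degree=2):
    """Polynomial features [1, x, ..., x^degree] for a 1-D input array."""
    return np.vander(x, degree + 1, increasing=True)

def posterior(Phi, y, noise_var=1.0, prior_var=10.0):
    """Gaussian weight posterior under a N(0, prior_var * I) prior (assumed)."""
    precision = np.eye(Phi.shape[1]) / prior_var + Phi.T @ Phi / noise_var
    cov = np.linalg.inv(precision)
    mean = cov @ Phi.T @ y / noise_var
    return mean, cov

# Hypothetical weights; the caption does not state the value of w.
W = np.array([1.0, -2.0, 3.0])

def mismatched_target(x):
    """Quadratic plus cosine, following the Figure 1 caption."""
    return features(x) @ W + np.cos(2 * np.pi * x)

def run(strategy, f, pool_x, test_x, n_init=5, n_queries=30, seed=0):
    """Return the test MSE recorded before each of n_queries acquisitions."""
    rng = np.random.default_rng(seed)
    pool_y = f(pool_x) + rng.normal(0.0, 1.0, size=pool_x.shape)  # epsilon ~ N(0, 1)
    labeled = list(rng.choice(len(pool_x), n_init, replace=False))
    Phi_pool, Phi_test = features(pool_x), features(test_x)
    mses = []
    for _ in range(n_queries):
        mean, cov = posterior(Phi_pool[labeled], pool_y[labeled])
        mses.append(np.mean((Phi_test @ mean - f(test_x)) ** 2))
        unlabeled = np.setdiff1d(np.arange(len(pool_x)), labeled)
        if strategy == "uncertainty":
            # Posterior variance of the mean prediction: phi^T cov phi per candidate.
            cand = Phi_pool[unlabeled]
            var = np.einsum("ij,jk,ik->i", cand, cov, cand)
            pick = unlabeled[np.argmax(var)]
        else:
            pick = rng.choice(unlabeled)  # random sampling baseline
        labeled.append(pick)
    return np.array(mses)

if __name__ == "__main__":
    pool_x = np.linspace(-1.0, 1.0, 200)
    test_x = np.linspace(-1.0, 1.0, 101)
    for strategy in ("uncertainty", "random"):
        curve = run(strategy, mismatched_target, pool_x, test_x)
        print(strategy, curve[-1])
```

Because a quadratic model has homoscedastic noise and a variance that grows toward the edges of the input range, the uncertainty strategy tends to concentrate its queries there. Whether this reproduces the gap reported in the figure depends on the assumed settings above; averaging over many seeds would be needed before drawing any conclusion.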

Theorems & Definitions (4)

  • Theorem 3.1: Bernstein-von Mises (Schervish, 2012)
  • Proposition 3.2
  • Proposition 1
  • proof