Calibrated Uncertainty Sampling for Active Learning

Ha Manh Bui, Iliana Maifeld-Carucci, Anqi Liu

TL;DR

The paper addresses calibration gaps in uncertainty-based pool-based active learning by introducing Calibrated Uncertainty Sampling for AL (CUSAL), which first estimates per-sample calibration error on the unlabeled pool under covariate shift using a kernel Dirichlet estimator and then selects samples in a lexicographic order that prioritizes reducing calibration error before pursuing uncertainty. The authors establish a pointwise-consistency bound for the calibration estimator and derive bounds on both unlabeled-pool and unseen-data calibration errors, showing improved reliability as more labeled and unlabeled data accrue. Empirically, CUSAL consistently yields lower Expected Calibration Error and higher accuracy across MNIST, Fashion-MNIST, SVHN, CIFAR-10, CIFAR-10-LT, and ImageNet, with ablations underscoring the value of the lexicographic strategy and showing potential gains from hybrid diversity-uncertainty extensions. The work advances trustworthy active learning by enabling better uncertainty quantification without requiring hold-out recalibration, with practical impact for safety-critical deployments and scalable learning scenarios.

Abstract

We study the problem of actively learning a classifier with a low calibration error. One of the most popular Acquisition Functions (AFs) in pool-based Active Learning (AL) is querying by the model's uncertainty. However, we recognize that an uncalibrated uncertainty model on the unlabeled pool may significantly affect the AF effectiveness, leading to sub-optimal generalization and high calibration error on unseen data. Deep Neural Networks (DNNs) make this even worse, as the model uncertainty from DNNs is usually uncalibrated. Therefore, we propose a new AF that estimates calibration errors and queries the samples with the highest calibration error before leveraging DNN uncertainty. Specifically, we utilize a kernel calibration error estimator under covariate shift and formally show that AL with this AF eventually leads to a bounded calibration error on the unlabeled pool and unseen test data. Empirically, our proposed method surpasses other AF baselines by achieving lower calibration and generalization errors across pool-based AL settings.
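
The query rule described above (highest calibration error first, then model uncertainty) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `lexicographic_query` and `least_confidence` are hypothetical, the per-sample calibration errors are assumed to be already estimated by some other routine, and breaking exact ties by uncertainty is an assumption about how the lexicographic order is applied.

```python
import numpy as np


def least_confidence(probs):
    """Uncertainty score: one minus the largest predicted class probability."""
    probs = np.asarray(probs, dtype=float)
    return 1.0 - probs.max(axis=1)


def lexicographic_query(calib_err, uncertainty, k):
    """Pick k pool indices, ranked by calibration error, ties broken by uncertainty.

    calib_err   : estimated per-sample calibration error (assumed given)
    uncertainty : per-sample uncertainty score, e.g. least_confidence(probs)
    k           : number of samples to query in this round
    """
    calib_err = np.asarray(calib_err, dtype=float)
    uncertainty = np.asarray(uncertainty, dtype=float)
    # np.lexsort sorts ascending and treats the LAST key as the primary key,
    # so both keys are negated to obtain a descending order.
    order = np.lexsort((-uncertainty, -calib_err))
    return order[:k]


# Toy pool of four samples: samples 0 and 2 share the highest calibration error,
# so the more uncertain of the two (sample 2) is ranked first.
picked = lexicographic_query([0.3, 0.1, 0.3, 0.2], [0.2, 0.9, 0.5, 0.4], k=2)
```

With real-valued estimates, exact ties in calibration error are rare, so a practical variant would likely need some coarsening or thresholding of the first key; the sketch leaves that choice open.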

Paper Structure

This paper contains 26 sections, 2 theorems, 39 equations, 17 figures, 5 tables, 1 algorithm.

Key Result

Theorem 4.1

Given a sample $x$ on the unlabeled pool with $m_t$ data points and $n_t$ samples in the cumulative labeled data, our estimator in Eq. \ref{eq:CE} is a point-wise consistent estimator under active learning with covariate shift, i.e., where $\pi_U$ denotes the sample distribution on the unlabeled pool. Moreover, the mean squared error of our estimator in Eq. \ref{eq:acc_estimator} is bounded by

Figures (17)

  • Figure 1: Accuracy and calibration comparison on MNIST with $T=50$ and $k=10$. Uncalibrated-least-conf is the least-conf method, but is additionally made to be uncalibrated by randomly scaling the logit vectors for every sample. We can see that when the model is uncalibrated, the least-confident sampling not only has a worse ECE than our method but also has a lower accuracy because of querying non-informative samples. A short demo is available at https://colab.research.google.com/drive/1QRLmzHyET-heDZ4IUuhU1v1Zd2gIrIXF?usp=sharing.
  • Figure 2: Overview of our calibrated uncertainty sampling framework for Active Learning.
  • Figure 3: ECE estimation quality between known labels in Eq. \ref{eq:acc_estimator} and unknown labels (ours) in Eq. \ref{eq:CE} on MNIST in Sec. \ref{subsec:ablation}.
  • Figure 4: 2-D visualizations of AL performance regarding ECE (x-axis) and Accuracy (y-axis) on CIFAR-10 from Tab. \ref{tab:main_tab}. Our method is closest to the Best performance point (i.e., $100\%$ accuracy and $0.0$ ECE).
  • Figure 5: Calibration quality on the unlabeled pool with MNIST. More results are in Tab. \ref{tab:main_tab_uece} and Fig. \ref{fig:main_fig_uece}.
  • ...and 12 more figures
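
The Figure 1 setup (least-confidence sampling on a model made uncalibrated by randomly scaling each sample's logit vector) and the ECE metric used in these figures can be sketched as follows. This is a hedged illustration, not the paper's code: the equal-width binning with `n_bins=10`, the positive scalar drawn per sample, and the scale range `low=0.1, high=10.0` are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def expected_calibration_error(probs, labels, n_bins=10):
    """Standard binned ECE: bin-weighted |accuracy - confidence| over equal-width bins."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            # in_bin.mean() is the fraction of samples falling in this bin.
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return ece


def randomly_scale_logits(logits, rng, low=0.1, high=10.0):
    """Multiply each sample's logit vector by its own positive random scalar (assumed scheme)."""
    scale = rng.uniform(low, high, size=(logits.shape[0], 1))
    return logits * scale


rng = np.random.default_rng(0)
logits = rng.normal(size=(100, 10))
scaled = randomly_scale_logits(logits, rng)
# Positive scaling leaves each predicted class unchanged but alters the confidences,
# so the least-confidence ranking over the pool can change.
same_predictions = np.array_equal(logits.argmax(axis=1), scaled.argmax(axis=1))
```

Under this assumed scheme the predictions themselves are untouched, so any change in which samples least-confidence sampling queries comes from the distorted confidences alone.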

Theorems & Definitions (5)

  • Definition 2.1
  • Theorem 4.1
  • Theorem 4.2
  • proof
  • proof