CALICO: Confident Active Learning with Integrated Calibration

Lorenzo S. Querol, Hajime Nagahara, Hideaki Hayashi

TL;DR

CALICO addresses the labeled-data bottleneck in safety-critical deep learning by calibrating confidence during active learning. It jointly trains a classifier and an energy-based model so that class posteriors and the input distribution are estimated simultaneously. The calibrated confidence then drives a least-confident query strategy, enabling informative sample selection without extra labeled data. On five MedMNIST medical-imaging datasets, CALICO improves classification accuracy and reduces expected calibration error (ECE) compared with softmax-based baselines, particularly when few labels are available. The study also examines class-distribution balancing as a factor in calibration stability and discusses scalability as a limitation, with future work directed at Bayesian approaches and larger-scale datasets.
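The least-confident query strategy ranks unlabeled samples by the probability of their top class and sends the lowest-ranked ones to an annotator. A minimal sketch, assuming each pool sample already has a class-probability vector from the (calibrated) classifier; `least_confident_query` and `al_round` are hypothetical names, not the authors' code.

```python
from typing import List, Sequence, Tuple


def least_confident_query(probs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Return pool positions of the `budget` samples with the lowest top-class probability."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Confidence of a sample = probability assigned to its most likely class.
    confidences = [max(p) for p in probs]
    # Rank from least to most confident; sorted() is stable, so ties keep pool order.
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return ranked[:budget]


def al_round(labeled: List[int], pool: List[int],
             pool_probs: Sequence[Sequence[float]],
             budget: int) -> Tuple[List[int], List[int]]:
    """Hypothetical single AL round: query the least-confident pool samples and
    move their IDs into the labeled set (an annotator would supply the labels)."""
    picked = least_confident_query(pool_probs, budget)
    picked_set = set(picked)
    chosen = [pool[i] for i in picked]
    remaining = [idx for i, idx in enumerate(pool) if i not in picked_set]
    return labeled + chosen, remaining


if __name__ == "__main__":
    probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.99, 0.01]]
    print(least_confident_query(probs, 2))            # [1, 2]
    print(al_round([5], [10, 11, 12, 13], probs, 2))  # ([5, 11, 12], [10, 13])
```

The ranking is only as informative as the probabilities behind it: an overconfident model can assign high scores to ambiguous samples and skip them, which is why CALICO calibrates the confidence before querying.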

Abstract

The growing use of deep learning in safety-critical applications, such as medical imaging, has raised concerns about the scarcity of labeled data. The demand for labels grows with model complexity, making annotation a heavy burden for domain experts. Active learning (AL) addresses this by training models efficiently under limited annotation costs. For deep neural networks (DNNs), AL often uses confidence or probability outputs as the score for selecting the most informative samples. However, modern DNNs produce unreliable confidence outputs, which makes calibration essential. We propose an AL framework, Confident Active Learning with Integrated CalibratiOn (CALICO), that self-calibrates the confidence used for sample selection during training. Instead of a standard softmax-based classifier, CALICO jointly trains a classifier and an energy-based model. This allows the input data distribution and the class probabilities to be estimated simultaneously during training, improving calibration without an additional labeled dataset. Experimental results show improved classification performance over a softmax-based classifier while using fewer labeled samples. Furthermore, the calibration stability of the model is observed to depend on the prior class distribution of the data.
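The joint classifier/energy-based model can be illustrated with the JEM-style (Joint Energy-based Model) reading of classifier logits, in which one logit vector f(x) yields both the softmax posterior p(y|x) and an energy E(x) = -logsumexp_y f(x)[y] that defines an unnormalized input density. We assume, without confirmation from this summary, that CALICO follows this formulation; the training of the density term (in JEM, typically via sampling such as SGLD) is omitted, and `joint_view` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def joint_view(logits: Sequence[float]) -> Tuple[List[float], float]:
    """Read one logit vector f(x) as both a classifier and an energy-based model.

    Assumed JEM-style formulation (not confirmed for CALICO):
      p(y|x)   = exp(f(x)[y]) / sum_y' exp(f(x)[y'])   (softmax posterior)
      E(x)     = -logsumexp_y f(x)[y]                  (energy)
      log p(x) = -E(x) - log Z                         (Z is intractable, omitted)
    """
    lse = logsumexp(logits)
    posterior = [math.exp(v - lse) for v in logits]
    energy = -lse
    return posterior, energy


if __name__ == "__main__":
    # Same logit differences give identical softmax posteriors ...
    p_a, e_a = joint_view([2.0, 0.0])
    p_b, e_b = joint_view([5.0, 3.0])
    print([round(v, 4) for v in p_a], [round(v, 4) for v in p_b])
    # ... but input b has energy lower by 3, i.e. an unnormalized density e^3 times higher.
    print(round(e_a - e_b, 6))  # 3.0
```

A softmax-only classifier sees the two inputs as equally confident; the energy term separates them by how likely each input is, which is the intuition behind Figure 1(b) and (c).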

Paper Structure (19 sections, 5 equations, 5 figures, 3 tables, 1 algorithm)

Figures (5)

  • Figure 1: Joint learning of a classifier and a generative model, and its advantage. (a) Training a classifier alone can yield overly high posterior probabilities near the class decision boundary. (b) Estimating the input distribution with a generative model accounts for how frequently data occur. (c) Jointly learning the classifier and the generative model helps calibrate the confidence scores.
  • Figure 2: An overview of the typical active learning (AL) cycle.
  • Figure 3: Final reliability diagrams for the baseline, active, and CALICO methods. CALICO shows substantially better calibration across confidence levels than the other evaluated methods, which use softmax-based classifiers.
  • Figure 4: Test accuracy (top) and ECE (bottom) plotted against the number of labeled samples per AL iteration. A lower or comparable ECE can be achieved with fewer labeled samples.
  • Figure 5: Test accuracy (top) and ECE (bottom) of CALICO with an equal class distribution. The Equal method strictly enforces an equal distribution by capping each class at the sample count of the dataset's least frequent class. The number of labels per class varies by dataset so that enough iterations are run to analyze the learning curve properly.
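Figures 3 to 5 measure calibration with reliability diagrams and ECE. The sketch below computes both, assuming equal-width confidence bins and 10 bins, since the summary does not state the paper's binning; `reliability_bins` and `expected_calibration_error` are illustrative names, not the authors' code.

```python
from typing import List, Sequence, Tuple


def reliability_bins(confidences: Sequence[float], correct: Sequence[bool],
                     n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width bins [b/n, (b+1)/n); confidence 1.0 goes
    into the last bin. Returns (avg_confidence, accuracy, count) per non-empty bin."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need non-empty inputs of equal length")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        b = min(int(c * n_bins), n_bins - 1)
        bins[b].append((c, ok))
    stats = []
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            acc = sum(1 for _, ok in members if ok) / len(members)
            stats.append((avg_conf, acc, len(members)))
    return stats


def expected_calibration_error(confidences: Sequence[float], correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """ECE = sum_b (|B_b| / N) * |acc(B_b) - conf(B_b)| over equal-width bins."""
    n = len(confidences)
    return sum(count / n * abs(acc - avg_conf)
               for avg_conf, acc, count in reliability_bins(confidences, correct, n_bins))


if __name__ == "__main__":
    conf = [0.95, 0.95, 0.65, 0.65]
    ok = [True, False, True, True]
    print(reliability_bins(conf, ok))
    print(round(expected_calibration_error(conf, ok), 4))  # 0.4
```

Each (avg_confidence, accuracy) pair is one bar of a reliability diagram; a perfectly calibrated model places every bar on the diagonal and has an ECE of 0.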