CALICO: Confident Active Learning with Integrated Calibration
Lorenzo S. Querol, Hajime Nagahara, Hideaki Hayashi
TL;DR
CALICO addresses the data-label bottleneck in safety-critical deep learning by calibrating confidence during active learning through the joint training of a classifier and an energy-based model to simultaneously estimate class posteriors and input distributions. The framework uses calibrated confidence in a least-confident query strategy, enabling informative sample selection without requiring extra labeled data. Empirical evaluation on five MedMNIST medical-imaging datasets shows that CALICO improves classification accuracy and reduces calibration error (ECE) compared with softmax-based baselines, particularly under limited labeling. The study also explores class-distribution balancing as a factor in calibration stability and discusses scalability as a limitation with directions for future work, including Bayesian approaches and larger-scale datasets.
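The calibration error (ECE) reported above can be illustrated with a minimal sketch. It assumes the common equal-width binning estimator with 10 bins; the function name `expected_calibration_error` and the toy inputs are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin.

    Assumption: bins are (lo, hi] intervals over [0, 1]; the paper may use
    a different binning scheme or bin count.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # Gap between empirical accuracy and mean confidence in this bin.
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            # Weight the gap by the fraction of samples falling in the bin.
            ece += in_bin.mean() * gap
    return ece

# Toy example: top-class confidences and whether each prediction was right.
toy_confidences = [0.95, 0.95, 0.55, 0.55]
toy_correct = [1, 1, 1, 0]
ece = expected_calibration_error(toy_confidences, toy_correct)
print(f"ECE = {ece:.3f}")
```

A perfectly calibrated model would have accuracy equal to mean confidence in every bin, giving an ECE of zero; larger values indicate over- or under-confidence.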
Abstract
The growing use of deep learning in safety-critical applications, such as medical imaging, has raised concerns about limited labeled data: the demand for labels grows with model complexity, while annotation remains a burden for domain experts. In response, active learning (AL) is used to train models efficiently at limited annotation cost. In the context of deep neural networks (DNNs), AL often uses confidence or probability outputs as a score for selecting the most informative samples. However, modern DNNs produce unreliable confidence outputs, making calibration essential. We propose an AL framework that self-calibrates the confidence used for sample selection during training, referred to as Confident Active Learning with Integrated CalibratiOn (CALICO). Instead of a standard softmax-based classifier, CALICO jointly trains a classifier and an energy-based model. This approach estimates the input data distribution and the class probabilities simultaneously during training, improving calibration without requiring an additional labeled dataset. Experimental results show that CALICO achieves better classification performance than a softmax-based classifier with fewer labeled samples. Furthermore, the calibration stability of the model is observed to depend on the prior class distribution of the data.
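The abstract does not spell out how the classifier and the energy-based model share outputs. As a rough sketch, one common formulation (assumed here, not confirmed as CALICO's) reads the classifier logits as negative energies, so that the softmax gives the class posterior and the log-sum-exp of the logits gives an unnormalized log-density of the input. The helper names `posterior_and_energy` and `least_confident_query` and the toy logits are hypothetical.

```python
import numpy as np

def posterior_and_energy(logits):
    """Return softmax posteriors p(y|x) and energies E(x) = -logsumexp(logits).

    Assumption: logits act as negative joint energies, a common joint
    classifier / energy-based reading; the paper may differ in detail.
    """
    logits = np.asarray(logits, dtype=float)
    m = logits.max(axis=1, keepdims=True)  # subtract the max for stability
    lse = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    probs = np.exp(logits - lse[:, None])
    return probs, -lse

def least_confident_query(probs, k):
    """Pick the k pool samples whose top-class probability is lowest."""
    confidence = probs.max(axis=1)
    return np.argsort(confidence, kind="stable")[:k]

# Toy logits for three unlabeled samples over three classes.
pool_logits = np.array([
    [2.0, 0.0, 0.0],   # fairly confident
    [0.1, 0.0, 0.0],   # nearly uniform, least confident
    [5.0, 5.0, 0.0],   # torn between two classes
])
probs, energy = posterior_and_energy(pool_logits)
query = least_confident_query(probs, k=2)
print(query)
```

In an AL loop, the queried indices would be sent to an annotator, added to the labeled set, and the model retrained before the next round.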
