
Deep Active Learning with Noise Stability

Xingjian Li, Pengkun Yang, Yangcheng Gu, Xueying Zhan, Tianyang Wang, Min Xu, Chengzhong Xu

TL;DR

The paper tackles uncertainty estimation in deep active learning under label scarcity by introducing NoiseStability, a simple, model-internal criterion that measures output deviations when model parameters are perturbed by small noise. It shows that the expected deviation magnitude aligns with the Jacobian norm and that the output deviations collected across perturbations act as randomly projected gradients, which preserve key geometric relationships and enable diverse batch selection via a k-center approach. The method requires no auxiliary models or data-dependent designs and achieves competitive or superior performance across image classification, regression, semantic segmentation, and NLP tasks, backed by theory on gradient equivalence, diversity, and variance reduction. The approach is practical, scalable, and broadly applicable, offering a robust alternative to existing gradient-based and Bayesian active learning methods.
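The two geometric claims above can be checked numerically on a toy model. The sketch below assumes a model that is linear in its parameters, f(x; θ) = θ·φ(x), so that perturbing θ by Gaussian noise ξ ~ N(0, σ²I) changes the output by exactly ξ·φ(x) = ξ·∇θf. The sizes `n`, `d`, `K` and the noise level `sigma` are illustrative assumptions, not the paper's settings; for a deep network the same relationships hold only to first order in σ.

```python
import numpy as np

def deviation_features(grads, sigma, K, rng):
    """Stack K output deviations per sample for a linear-in-parameters model.

    For f(x; theta) = theta @ phi(x), perturbing theta by xi ~ N(0, sigma^2 I)
    changes the output by exactly xi @ phi(x), and phi(x) is the gradient of f
    with respect to theta. All samples share the same K noise draws, so the
    rows form a shared Gaussian random projection of the per-sample gradients.
    """
    d = grads.shape[1]
    noise = rng.normal(0.0, sigma, size=(K, d))   # K parameter perturbations
    return grads @ noise.T                        # shape (n_samples, K)

rng = np.random.default_rng(0)
n, d, K, sigma = 5, 200, 4000, 1e-2
# Toy gradients with deliberately different norms per sample.
grads = rng.normal(size=(n, d)) * rng.uniform(0.5, 3.0, size=(n, 1))

feats = deviation_features(grads, sigma, K, rng)

# Claim 1: E[deviation^2] = sigma^2 * ||grad||^2, so RMS deviation tracks the gradient norm.
rms_dev = np.sqrt(np.mean(feats ** 2, axis=1))
grad_norm = sigma * np.linalg.norm(grads, axis=1)
print(np.round(rms_dev / grad_norm, 2))           # expected to be close to 1

# Claim 2: rescaled inner products of deviation vectors approximate <g_i, g_j>.
approx_gram = feats @ feats.T / (K * sigma ** 2)
true_gram = grads @ grads.T
rel_err = np.linalg.norm(approx_gram - true_gram) / np.linalg.norm(true_gram)
print(round(float(rel_err), 3))                   # expected to be small
```

Increasing `K` tightens both approximations, in line with the usual behavior of Gaussian random projections (Johnson-Lindenstrauss), which is one way to read the role of the sampling count K.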

Abstract

Uncertainty estimation for unlabeled data is crucial to active learning. With a deep neural network as the backbone model, data selection is highly challenging because the model's inferences can be overconfident. Existing methods address this challenge with specialized learning paradigms (e.g., adversarial learning) or auxiliary models, which tends to produce complex and inefficient pipelines that can make the methods impractical. In this work, we propose a novel algorithm that leverages noise stability to estimate data uncertainty. The key idea is to measure the output deviation from the original observation when the model parameters are randomly perturbed by noise. We provide theoretical analyses based on small Gaussian noise theory and show that our method favors a subset with large and diverse gradients. Our method is generally applicable to various tasks, including computer vision, natural language processing, and structural data analysis. It achieves competitive performance compared with state-of-the-art active learning baselines.
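The selection procedure described in the abstract can be sketched end to end: perturb the parameters K times, record how far the outputs move, and run a greedy k-center pass over the resulting deviation vectors. This is a minimal illustration, not the authors' implementation. The tiny MLP (`mlp_forward`), the per-tensor noise scaling by `lam` (noise norm equal to `lam` times the tensor norm), measuring deviations on softmax outputs, and starting the k-center pass from the largest-deviation sample are all assumptions made here for concreteness.

```python
import numpy as np

def mlp_forward(params, x):
    """Tiny two-layer MLP returning softmax probabilities (illustrative model)."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    logits = h @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def noise_stability_features(params, x, lam=1e-3, K=30, rng=None):
    """Concatenate K output deviations f(x; theta + xi_k) - f(x; theta).

    Assumption: each parameter tensor gets Gaussian noise rescaled so that its
    norm equals lam times that tensor's norm; zero-norm tensors (the biases
    below) therefore receive no noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    base = mlp_forward(params, x)
    deviations = []
    for _ in range(K):
        noisy = {}
        for name, w in params.items():
            xi = rng.normal(size=w.shape)
            xi *= lam * np.linalg.norm(w) / np.linalg.norm(xi)
            noisy[name] = w + xi
        deviations.append(mlp_forward(noisy, x) - base)
    return np.concatenate(deviations, axis=1)  # shape (n, K * n_classes)

def k_center_greedy(feats, budget):
    """Greedy k-center selection on deviation features.

    Assumption: distances start as each feature's norm (distance to the zero
    vector), so the first pick is the sample with the largest deviation.
    """
    min_dist = np.linalg.norm(feats, axis=1)
    chosen = []
    for _ in range(budget):
        idx = int(np.argmax(min_dist))
        chosen.append(idx)
        dist_to_new = np.linalg.norm(feats - feats[idx], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return chosen

rng = np.random.default_rng(0)
params = {
    "W1": rng.normal(scale=0.5, size=(8, 16)), "b1": np.zeros(16),
    "W2": rng.normal(scale=0.5, size=(16, 3)), "b2": np.zeros(3),
}
x_pool = rng.normal(size=(100, 8))                    # unlabeled pool
feats = noise_stability_features(params, x_pool, lam=1e-2, K=20, rng=rng)
scores = np.linalg.norm(feats, axis=1)                # per-sample uncertainty
batch = k_center_greedy(feats, budget=10)
print(batch)
```

Seeding the k-center distances with the feature norms lets one greedy loop favor both large deviations (uncertainty) and mutual distance (diversity), which mirrors the abstract's claim that the method favors large and diverse gradients.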

Paper Structure (46 sections, 25 equations, 5 figures, 6 tables, 1 algorithm)

Figures (5)

  • Figure 1: Classification performance of different AL methods on MNIST (left), Cifar10 (middle), and SVHN (right). AUBC (mean and variance) is reported in Appendix A.4.3.
  • Figure 2: Active learning performance of different methods on regression (Ames, left), semantic segmentation (Cityscapes, middle), and natural language processing (MRPC, right). AUBC results are reported in Appendix A.5.2.
  • Figure 3: Classification performance of different active learning methods with ResNet-18 on Cifar10 (left), Cifar100 (middle), and Caltech101 (right).
  • Figure 4: Evaluation of different hyperparameters on MNIST: noise magnitude $\lambda$ (left), sampling times $K$ (middle), and subset selection methods (right). In the left and middle plots, dots of different colors denote different cycles; e.g., the blue dots at the bottom refer to cycle 2 and the grey dots at the top refer to cycle 9.
  • Figure 5: Comparison on MNIST with degenerate implementations (left) and with Bayesian active learning methods using a selection size of 1 (right).

Theorems & Definitions (3)

  • proof
  • proof
  • proof