Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds
Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, Alekh Agarwal
TL;DR
The paper tackles label-efficient batch active learning for deep neural networks by proposing BADGE, a gradient-embedding-based sampler. BADGE computes a last-layer gradient embedding for each unlabeled point, using the model's own predicted label in place of the unknown true label, and then applies k-means++ seeding to those embeddings to select batches that are both diverse and informative. In experiments across architectures, batch sizes, and datasets, BADGE consistently matches or outperforms the baselines. The method is a practical approach that blends uncertainty and diversity without hand-tuned hyperparameters, and it scales and runs more favorably than alternatives such as k-DPP sampling.
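The two steps named above (gradient embeddings, then k-means++ seeding) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes a softmax output layer trained with cross-entropy, ignores bias terms, and picks the first center uniformly at random as in standard k-means++. The function names `gradient_embeddings` and `kmeanspp_select`, and the random stand-in data, are hypothetical.

```python
import numpy as np

def gradient_embeddings(probs, feats):
    """Last-layer gradient embeddings under hallucinated labels.

    probs: (n, K) softmax outputs of the current model.
    feats: (n, d) penultimate-layer activations.
    Returns an (n, K*d) array. Assumes cross-entropy loss and no bias term.
    """
    n, _ = probs.shape
    y_hat = probs.argmax(axis=1)            # hallucinated (predicted) labels
    coeff = probs.copy()
    coeff[np.arange(n), y_hat] -= 1.0       # p - one_hot(y_hat)
    # Per-example outer product of coeff and feats, flattened to one vector.
    return (coeff[:, :, None] * feats[:, None, :]).reshape(n, -1)

def kmeanspp_select(emb, batch_size, rng):
    """Standard k-means++ seeding; returns indices of the selected points."""
    n = emb.shape[0]
    if batch_size > n:
        raise ValueError("batch_size cannot exceed the number of points")
    chosen = [int(rng.integers(n))]         # first center: uniform at random
    d2 = np.sum((emb - emb[chosen[0]]) ** 2, axis=1)
    while len(chosen) < batch_size:
        total = d2.sum()
        if total <= 0:
            # Every point coincides with a chosen center: pick any unchosen one.
            remaining = np.setdiff1d(np.arange(n), chosen)
            nxt = int(rng.choice(remaining))
        else:
            # Sample proportionally to squared distance to the nearest center.
            nxt = int(rng.choice(n, p=d2 / total))
        chosen.append(nxt)
        d2 = np.minimum(d2, np.sum((emb - emb[nxt]) ** 2, axis=1))
    return chosen

# Usage with random stand-in data (200 unlabeled points, 10 classes, 32 features).
rng = np.random.default_rng(0)
logits = rng.normal(size=(200, 10))
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
feats = rng.normal(size=(200, 32))

emb = gradient_embeddings(probs, feats)
batch = kmeanspp_select(emb, batch_size=20, rng=rng)
```

Because selection probability is proportional to squared distance from the centers already chosen, points with large embeddings (uncertain predictions) that lie far from earlier picks (diversity) are favored, and no trade-off weight needs to be set.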
Abstract
We design a new algorithm for batch active learning with deep neural network models. Our algorithm, Batch Active learning by Diverse Gradient Embeddings (BADGE), samples groups of points that are disparate and high-magnitude when represented in a hallucinated gradient space, a strategy designed to incorporate both predictive uncertainty and sample diversity into every selected batch. Crucially, BADGE trades off between diversity and uncertainty without requiring any hand-tuned hyperparameters. We show that while other approaches sometimes succeed for particular batch sizes or architectures, BADGE consistently performs as well as or better, making it a versatile option for practical active learning problems.
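A short derivation shows why magnitude in the hallucinated gradient space tracks predictive uncertainty. The notation here is introduced for illustration and assumes a softmax output layer with cross-entropy loss: $z(x)$ is the penultimate-layer activation, $p_i$ is the predicted probability of class $i$, $W_i$ is the last-layer weight row for class $i$, and $\hat{y} = \arg\max_i p_i$ is the hallucinated label.

```latex
% Block i of the gradient embedding for input x under the hallucinated label:
(g_x)_i \;=\; \nabla_{W_i}\, \ell_{\mathrm{CE}}\bigl(f(x;\theta), \hat{y}\bigr)
        \;=\; \bigl(p_i - \mathbb{1}[\hat{y} = i]\bigr)\, z(x)

% Its squared norm factors into a feature term and an uncertainty term:
\|g_x\|^2 \;=\; \|z(x)\|^2 \Bigl( (1 - p_{\hat{y}})^2 + \sum_{i \neq \hat{y}} p_i^2 \Bigr)
```

When the model is confident ($p_{\hat{y}} \approx 1$), the bracketed term is close to zero and the embedding is short. When probability mass is spread across classes, the embedding is longer. Replacing $\hat{y}$ with any other label $y$ gives $\|z(x)\|^2 \bigl(\sum_i p_i^2 + 1 - 2p_y\bigr)$, which is smallest at $y = \hat{y}$, so the hallucinated gradient norm is a lower bound on the norm under any possible true label, as the title's "gradient lower bounds" suggests.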
