Deep Active Learning in the Open World
Tian Xie, Jifan Zhang, Haoyue Bai, Robert Nowak
TL;DR
This work tackles open-world learning under long-tail distributions and limited annotation budgets by introducing ALOE, a two-stage active learning algorithm that first enforces diversity via clustering and then prioritizes potential unknowns using GradNorm-based OOD scoring. In empirical evaluations on CIFAR100-LT, ImageNet-LT, and Places365-LT, ALOE consistently outperforms standard baselines in balanced accuracy and accelerates novel class discovery, notably achieving up to $70\%$ annotation savings on ImageNet-LT. A key insight is the tradeoff between improving known-class accuracy and discovering new classes, which motivates future research into dynamic exploration-exploitation strategies. The approach offers practical benefits for open-world systems where the total class set is unknown and labeling is costly, enabling more robust and scalable model adaptation.
Abstract
Machine learning models deployed in open-world scenarios often encounter unfamiliar conditions and perform poorly in unanticipated situations. As AI systems advance and find application in safety-critical domains, effectively handling out-of-distribution (OOD) data is crucial to building open-world learning systems. In this work, we introduce ALOE, a novel active learning algorithm for open-world environments designed to enhance model adaptation by incorporating new OOD classes via a two-stage approach. First, diversity sampling selects a representative set of examples; second, energy-based OOD detection prioritizes likely unknown classes for annotation. This strategy accelerates class discovery and learning, even under constrained annotation budgets. Evaluations on three long-tailed image classification benchmarks demonstrate that ALOE outperforms traditional active learning baselines, effectively expanding known categories while balancing annotation cost. Our findings reveal a crucial tradeoff between enhancing known-class performance and discovering new classes, setting the stage for future advancements in open-world machine learning.
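
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the k-means variant, the choice of one representative per cluster, and every name used here (`energy_score`, `kmeans`, `two_stage_select`, `features`, `logits`) are assumptions made for illustration. The only part taken as given is the standard energy score, $E(x) = -T \log \sum_i \exp(f_i(x)/T)$, where a higher value suggests an out-of-distribution input.

```python
import math

def energy_score(logits, temperature=1.0):
    """Energy -T * logsumexp(logits / T); higher values suggest OOD."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10):
    """Lloyd's k-means with deterministic farthest-first initialization
    (an assumption chosen to keep this sketch reproducible)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers

def two_stage_select(features, logits, num_clusters, budget):
    """Stage 1: keep the example nearest each cluster center (diversity).
    Stage 2: rank those representatives by energy and return the top
    `budget` indices as annotation candidates."""
    assign, centers = kmeans(features, num_clusters)
    reps = []
    for c in range(num_clusters):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            reps.append(min(members,
                            key=lambda i: sq_dist(features[i], centers[c])))
    reps.sort(key=lambda i: energy_score(logits[i]), reverse=True)
    return reps[:budget]

# Toy data (hypothetical): three well-separated groups of 2-D features.
features = [(0, 0), (0.1, 0), (0, 0.1),
            (5, 5), (5.1, 5), (5, 5.1),
            (10, 0), (10.1, 0), (10, 0.1)]
# Confident logits for the first and third groups, flat logits for the second.
logits = [[8, 0, 0]] * 3 + [[0.1, 0, 0.1]] * 3 + [[0, 6, 0]] * 3

picked = two_stage_select(features, logits, num_clusters=3, budget=2)
print(picked)  # [3, 6]
```

In this toy run the representative of the flat-logit group is ranked first because its energy is highest, which mirrors the intent of prioritizing likely unknown classes among a diverse candidate set. In practice, the features and logits would come from the current model rather than hand-written lists.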
