Neural Active Learning Beyond Bandits
Yikun Ban, Ishika Agarwal, Ziwei Wu, Yada Zhu, Kommy Weldemariam, Hanghang Tong, Jingrui He
TL;DR
This work tackles the challenge of scaling neural active learning to $K$-class problems without incurring the prohibitive $K$-dependent costs of bandit-based reductions. It introduces two neural architectures, an exploitation network $f_1$ and an exploration network $f_2$, that operate on the original $d$-dimensional input and produce $K$-class scores. This is enabled by an end-to-end embedding $\phi(\cdot)$ that preserves class information while reducing input dimensionality. The authors establish non-parametric regret guarantees for stream-based and pool-based active learning and show that the error grows more slowly in $K$ than for prior methods. Extensive experiments on six datasets support these results, showing superior accuracy and efficiency. The approach combines principled exploration with neural networks without transforming active learning into a traditional bandit problem, offering practical gains and theoretical insight for scalable neural active learning.
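The division of labor between an exploitation network and an exploration network can be illustrated with a minimal NumPy sketch. Everything below is an assumption for illustration, not the authors' implementation. Both networks are taken to be two-layer ReLU MLPs. The hypothetical embedding `phi` is simply $f_1$'s hidden activations. The two score vectors are added together, and a label is queried when the margin between the two largest combined scores falls below a hypothetical threshold `gamma`. The helper names `init_mlp`, `hidden`, `mlp`, and `decide` are likewise hypothetical.

```python
import numpy as np

# Hypothetical sketch; not the authors' implementation.

def init_mlp(in_dim, hidden_dim, out_dim, rng):
    """Two-layer MLP weights with 1/sqrt(fan_in) Gaussian initialization."""
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(hidden_dim, in_dim)),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden_dim), size=(out_dim, hidden_dim)),
    }

def hidden(params, x):
    """ReLU hidden layer of a two-layer MLP."""
    return np.maximum(params["W1"] @ x, 0.0)

def mlp(params, x):
    """Map one input vector to a vector of K class scores."""
    return params["W2"] @ hidden(params, x)

def phi(f1_params, x):
    """Assumed embedding: f_1's hidden activations (size m, independent of K)."""
    return hidden(f1_params, x)

def decide(f1_params, f2_params, x, gamma):
    """Add exploitation and exploration scores; query if the top-2 margin is below gamma."""
    scores = mlp(f1_params, x) + mlp(f2_params, phi(f1_params, x))
    top2 = np.sort(scores)[-2:]          # two largest combined scores, ascending
    margin = top2[1] - top2[0]
    return int(np.argmax(scores)), bool(margin < gamma)

rng = np.random.default_rng(0)
d, K, m = 8, 4, 16                       # input dim, classes, hidden width (illustrative)
f1 = init_mlp(d, m, K, rng)              # exploitation network: R^d -> R^K
f2 = init_mlp(m, m, K, rng)              # exploration network: R^m -> R^K
x = rng.normal(size=d)
pred, query = decide(f1, f2, x, gamma=0.1)
```

Each network runs once per input, and $K$ enters only through the width of the output layer. This is the property the sketch is meant to highlight. It makes no claim about the actual embedding, combination, or query rule used in the paper.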
Abstract
We study both stream-based and pool-based active learning with neural network approximations. A recent line of work proposed bandit-based approaches that transform active learning into a bandit problem, achieving both theoretical and empirical success. However, because of this transformation, the performance and computational costs of these methods may be sensitive to the number of classes, denoted as $K$. Therefore, this paper seeks to answer the question: "How can we mitigate the adverse impacts of $K$ while retaining the advantages of principled exploration and provable performance guarantees in active learning?" To tackle this challenge, we propose two algorithms based on newly designed exploitation and exploration neural networks for stream-based and pool-based active learning. Subsequently, we provide theoretical performance guarantees for both algorithms in a non-parametric setting, demonstrating a slower error-growth rate with respect to $K$ for the proposed approaches. We evaluate the proposed algorithms through extensive experiments, in which they consistently outperform state-of-the-art baselines.
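To see how a bandit transformation can make costs depend on $K$, consider one construction that is commonly used when $K$-class classification is cast as a $K$-armed bandit. Each input $x \in \mathbb{R}^d$ is expanded into $K$ arm vectors of length $dK$, with $x$ placed in the $k$-th block and zeros elsewhere. The sketch below assumes this construction for illustration. It compares the number of input entries per example against a network that reads $x$ once and emits $K$ scores. The helper names `bandit_arms` and `input_cost` are hypothetical.

```python
import numpy as np

# Illustrative sketch of an assumed bandit-style reduction; helper names are hypothetical.

def bandit_arms(x, K):
    """Expand x in R^d into K arm vectors in R^{dK}; arm k holds x in block k."""
    d = x.shape[0]
    arms = np.zeros((K, d * K))
    for k in range(K):
        arms[k, k * d:(k + 1) * d] = x
    return arms

def input_cost(d, K):
    """Input entries per example: bandit reduction vs. direct K-class input."""
    bandit = K * (d * K)   # K forward passes, each on a dK-dimensional vector
    direct = d             # one forward pass on the original d-dimensional vector
    return bandit, direct

x = np.arange(3.0)         # d = 3
arms = bandit_arms(x, K=4) # shape (4, 12)
```

The same arm matrix can be built in one call as `np.kron(np.eye(K), x)`. Under this construction the bandit-side input count grows quadratically in $K$ while the direct input stays at $d$. The count covers input size only, not network width or training cost.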
