Neural Active Learning Beyond Bandits

Yikun Ban, Ishika Agarwal, Ziwei Wu, Yada Zhu, Kommy Weldemariam, Hanghang Tong, Jingrui He

TL;DR

This work tackles the challenge of scaling neural active learning to $K$-class problems without incurring the $K$-dependent costs of bandit-based reductions. It introduces two neural networks, an exploitation network $f_1$ and an exploration network $f_2$, that operate on the original $d$-dimensional input and produce $K$-class scores, enabled by an end-to-end embedding $\phi(\cdot)$ that preserves class information while reducing input dimensionality. The authors establish non-parametric regret guarantees for stream-based and pool-based active learning whose error grows more slowly in $K$ than that of prior methods, and they report experiments on six datasets showing gains in accuracy and efficiency over the baselines. The approach combines principled exploration with neural networks without transforming active learning into a traditional bandit problem, offering practical gains and theoretical insight for scalable neural active learning.

Abstract

We study both stream-based and pool-based active learning with neural network approximations. A recent line of work proposed bandit-based approaches that transform active learning into a bandit problem, achieving both theoretical and empirical success. However, because of this transformation, the performance and computational cost of these methods may be sensitive to the number of classes, denoted $K$. This paper therefore seeks to answer the question: "How can we mitigate the adverse impacts of $K$ while retaining the advantages of principled exploration and provable performance guarantees in active learning?" To tackle this challenge, we propose two algorithms, one for stream-based and one for pool-based active learning, built on newly designed exploitation and exploration neural networks. We then provide theoretical performance guarantees for both algorithms in a non-parametric setting, demonstrating a slower error-growth rate with respect to $K$ for the proposed approaches. Extensive experiments show that the proposed algorithms consistently outperform state-of-the-art baselines.
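To make the two-network idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes two small ReLU networks standing in for $f_1$ and $f_2$, uses the normalized hidden activations of $f_1$ as a hypothetical stand-in for the embedding $\phi(\cdot)$, and applies a simple top-two margin rule to decide whether to request a label. The layer sizes, the `margin` value, and all function names are illustrative assumptions. The point it illustrates is that both networks take a fixed-size input and emit $K$ scores in one forward pass, so the first-layer width does not grow with $K$.

```python
import numpy as np


def init_mlp(rng, in_dim, hidden, out_dim):
    """Parameters of a two-layer ReLU network (sizes are illustrative)."""
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / in_dim), size=(hidden, in_dim)),
        "W2": rng.normal(0.0, np.sqrt(1.0 / hidden), size=(out_dim, hidden)),
    }


def forward(params, x):
    """Return the output scores and the hidden activations for one input."""
    h = np.maximum(params["W1"] @ x, 0.0)
    return params["W2"] @ h, h


def combined_scores(f1, f2, x):
    """Sum exploitation and exploration scores for all K classes at once."""
    s1, h1 = forward(f1, x)  # exploitation scores, shape (K,)
    # Assumption: normalized hidden activations stand in for phi(x).
    phi = h1 / (np.linalg.norm(h1) + 1e-12)
    s2, _ = forward(f2, phi)  # exploration scores, shape (K,)
    return s1 + s2


def should_query(scores, margin):
    """Hypothetical rule: ask for a label when the top two scores are close."""
    top2 = np.sort(scores)[-2:]
    return bool(top2[1] - top2[0] < margin)


d, K, hidden = 20, 5, 32
rng = np.random.default_rng(0)
f1 = init_mlp(rng, d, hidden, K)       # input is the original d-dim vector
f2 = init_mlp(rng, hidden, hidden, K)  # input is the embedding, not d*K

x = rng.normal(size=d)
x /= np.linalg.norm(x)  # unit-norm input

scores = combined_scores(f1, f2, x)
prediction = int(np.argmax(scores))
query = should_query(scores, margin=0.1)
print(prediction, query)
```

In a stream-based loop, a rule like `should_query` would be evaluated once per arriving instance; a pool-based variant would instead rank a batch of candidates by the same gap and label the smallest ones. Both usages are sketches under the assumptions stated above.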

Paper Structure (17 sections, 30 theorems, 144 equations, 2 figures, 8 tables, 2 algorithms)

Key Result

Theorem 5.1

[Binary Classification] Given $T$, for any $\delta \in (0, 1)$, $\lambda_0 > 0$, suppose $K=2$, $\|\mathbf{x}_t\|_2 = 1, t \in [T]$, $\mathbf{H} \succeq \lambda_0 \mathbf{I}$, $m \geq \widetilde{\Omega}(\text{poly}(T, L, S) \cdot \log(1/\delta))$, $\eta_1 = \eta_2 = \Theta(\frac{S}{m\sqrt{2T}})$.

Figures (2)

  • Figure 1: Regret comparison on six datasets in the stream-based setting. NeurOnAl-S outperforms baselines on most datasets.
  • Figure 2: Test accuracy versus the number of query rounds in the pool-based setting on six datasets. NeurOnAl-P outperforms baselines on all datasets.

Theorems & Definitions (57)

  • Definition 4.1: End-to-end Embedding
  • Theorem 5.1
  • Theorem 5.2
  • Theorem 5.3
  • Theorem 5.4: Pool-based
  • Definition C.1: NTK [ntk2018neural, wang2021neural]
  • Lemma D.1 [allen2019convergence]
  • Lemma D.2
  • proof
  • Lemma D.3
  • ...and 47 more