Querying Easily Flip-flopped Samples for Deep Active Learning

Seong Jin Cho, Gwangsu Kim, Junghyun Lee, Jinwoo Shin, Chang D. Yoo

TL;DR

This work introduces the Least Disagree Metric (LDM) as a theoretically grounded, perturbation-based measure of a sample's proximity to the decision boundary in multiclass deep models, along with an asymptotically consistent estimator $L_{N,M}$. Building on LDM, the authors propose LDM-S, an active learning method that combines small-LDM sampling with diversity via LDM-Seeding (a k-means++-style seeding using last-layer cosine distance). Empirical evaluations across six OpenML and several image datasets show that LDM-S achieves state-of-the-art performance with competitive runtime, and analyses highlight the importance of batch diversity for robust performance. The work suggests promising future directions for rigorous sample complexity guarantees and scalable posterior-based sampling frameworks.
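The seeding idea mentioned in the TL;DR can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `cosine_distance` and `ldm_seeding`, the exponential weighting of LDM values, and the choice of the smallest-LDM sample as the first seed are all assumptions made for this example. Only the k-means++-style selection and the cosine distance on last-layer features come from the summary above.

```python
import numpy as np

def cosine_distance(F, f):
    """Cosine distance (1 - cosine similarity) between each row of F and vector f."""
    num = F @ f
    den = np.linalg.norm(F, axis=1) * np.linalg.norm(f) + 1e-12
    return np.clip(1.0 - num / den, 0.0, 2.0)

def ldm_seeding(features, ldm, q, rng=None):
    """Pick q pool indices, favoring small LDM and mutual diversity.

    features: (n, d) last-layer features of the pool samples.
    ldm:      (n,) estimated LDM values of the pool samples.
    """
    rng = np.random.default_rng(rng)
    ldm = np.asarray(ldm, dtype=float)
    n = len(ldm)
    assert q <= n, "cannot select more samples than the pool holds"
    # Hypothetical weighting: smaller LDM -> larger weight (an assumption).
    weights = np.exp(-ldm / (ldm.mean() + 1e-12))
    # Assumed first seed: the sample with the smallest LDM.
    chosen = [int(np.argmin(ldm))]
    min_dist = cosine_distance(features, features[chosen[0]])
    while len(chosen) < q:
        # k-means++-style score: weight times squared distance to nearest seed.
        scores = weights * min_dist ** 2
        scores[chosen] = 0.0
        if scores.sum() <= 0.0:
            # Degenerate case: all remaining points coincide with a seed.
            scores = weights.copy()
            scores[chosen] = 0.0
        idx = int(rng.choice(n, p=scores / scores.sum()))
        chosen.append(idx)
        min_dist = np.minimum(min_dist, cosine_distance(features, features[idx]))
    return chosen
```

Because already selected indices get probability zero, the returned batch contains no repeats; how strongly small LDM values should be favored over diversity is a design choice that this sketch does not settle.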

Abstract

Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data. One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is. The sample's distance to the decision boundary is a natural measure of predictive uncertainty, but it is often intractable to compute, especially for complex decision boundaries formed in multiclass classification tasks. To address this issue, this paper proposes the least disagree metric (LDM), defined as the smallest probability of disagreement of the predicted label, and an estimator for LDM proven to be asymptotically consistent under mild assumptions. The estimator is computationally efficient and can be easily implemented for deep learning models using parameter perturbation. The LDM-based active learning is performed by querying unlabeled data with the smallest LDM. Experimental results show that our LDM-based active learning algorithm obtains state-of-the-art overall performance on all considered datasets and deep architectures.
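A perturbation-based estimate of this kind might look like the sketch below. It is a simplified illustration under several assumptions that are not taken from the paper: a linear softmax classifier stands in for a deep model, perturbations are Gaussian with a single fixed scale `sigma`, and the estimate defaults to 1 when no sampled hypothesis changes the prediction at `x0`. The names `predict` and `estimate_ldm` are hypothetical.

```python
import numpy as np

def predict(W, X):
    """Predicted labels of a linear classifier with weights W of shape (d, classes)."""
    return np.argmax(X @ W, axis=1)

def estimate_ldm(W, x0, X_pool, n_hyp=200, sigma=0.1, rng=None):
    """Monte Carlo sketch of the LDM of x0 with respect to the classifier W.

    Samples n_hyp perturbed hypotheses and, among those whose prediction at x0
    differs from that of W, returns the smallest empirical disagreement rate
    with W measured on X_pool.
    """
    rng = np.random.default_rng(rng)
    base_pool = predict(W, X_pool)
    base_x0 = predict(W, x0[None, :])[0]
    ldm = 1.0  # assumed default when no sampled hypothesis disagrees at x0
    for _ in range(n_hyp):
        # Parameter perturbation: Gaussian noise with fixed scale (an assumption).
        W_pert = W + sigma * rng.standard_normal(W.shape)
        if predict(W_pert, x0[None, :])[0] != base_x0:
            # Empirical probability that the perturbed hypothesis disagrees with W.
            rho = np.mean(predict(W_pert, X_pool) != base_pool)
            ldm = min(ldm, rho)
    return ldm
```

With estimates `ldm_values` computed for every pool sample, querying the `q` samples with the smallest LDM would then amount to something like `np.argsort(ldm_values)[:q]`.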

Paper Structure (47 sections, 7 theorems, 40 equations, 18 figures, 4 tables, 2 algorithms)

Key Result

Theorem 1

Let $g \in {\mathcal{H}}$, ${\bm{x}}_0 \in {\mathcal{X}}$, and $\delta > 0$ be arbitrary. Under Assumptions (H), (Lipschitz), and (coverage), with $M > \frac{8}{\delta^2}\log(C N)$, we have that for any $\varepsilon \in (0, 1)$, [...] Furthermore, as $\min(M, N) \rightarrow \infty$ with [...]

Footnote: For the asymptotic analyses, we write $f(n) = \omega(g(n))$ if $\lim_{n \rightarrow \infty}$ [...]

Figures (18)

  • Figure 1: An example of the LDM of ${\bm{x}}_0$ for a given $g$ in binary classification with a linear classifier. Here ${\bm{x}}$ is uniformly distributed on $\mathcal{X} \subset \mathbb{R}^2$. The hypothesis $h_{\theta}$ disagrees with $g$ at ${\bm{x}}_0$ when $\theta < -\pi + \theta_0$ or $\theta_0 < \theta$, thus $L(g, {\bm{x}}_0) = \inf_{h_{\theta} \in \mathcal{H}^{g, {\bm{x}}_0}} \rho(h_{\theta}, g) = \frac{|\theta_0|}{\pi}$.
  • Figure 2: The comparison of selecting sample(s). The black crosses and circles are labeled, and the gray dots are unlabeled samples. (a) Selected samples by LDM-based, entropy-based, and random sampling in binary classification with the linear classifier. (b) The test accuracy with respect to the number of labeled samples. (c) The t-SNE plot of selected batch samples in 3-class classification with a deep network on MNIST dataset.
  • Figure 3: The improved test accuracy by labeling the $k$th batch of size $q$ from pool data sorted in ascending order of LDM when the number of labeled samples is $100$ (a) or $300$ (d), and t-SNE plots of the first and eighth batches for each case (b-c, e-f) on MNIST.
  • Figure 4: The performance comparison across datasets. (a) Dolan-Moré plot among the algorithms across all experiments. AUC is the area under the curve. (b) The pairwise penalty matrix over all experiments. Element $P_{i, j}$ corresponds roughly to the number of times algorithm $i$ outperforms algorithm $j$. Column-wise averages at the bottom show overall performance (lower is better).
  • Figure 5: Examples of negative Spearman's rank correlation between LDM order and uncertainty order on MNIST (a), CIFAR10 (b), SVHN (c), CIFAR100 (d), Tiny ImageNet (e), and FOOD101 (f).
  • ...and 13 more figures
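
The closed form in the Figure 1 caption can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: hypotheses are lines through the origin parametrized by an angle $\theta$, $g$ is taken to be $h_0$, ${\bm{x}}_0$ is assumed to have polar angle $\theta_0 \in (0, \pi/2)$, and the uniform distribution on a disk is represented by a uniform polar angle. The names `h` and `ldm_linear` are hypothetical.

```python
import numpy as np

def h(theta, phi):
    """Label assigned by the linear classifier with angle theta to a point with polar angle phi."""
    return np.sin(phi - theta) > 0

def ldm_linear(theta0, n_theta=721, m=20_000, seed=0):
    """Grid-search estimate of the LDM of a point at polar angle theta0, with g = h_0."""
    rng = np.random.default_rng(seed)
    # A uniform point on a disk centered at the origin has a uniform polar angle.
    phi = rng.uniform(-np.pi, np.pi, size=m)
    g_pool = h(0.0, phi)
    g_x0 = h(0.0, theta0)
    best = 1.0
    for theta in np.linspace(-np.pi, np.pi, n_theta):
        if h(theta, theta0) != g_x0:
            # Empirical disagreement probability between h_theta and g.
            rho = np.mean(h(theta, phi) != g_pool)
            best = min(best, rho)
    return best

theta0 = np.pi / 6
print(ldm_linear(theta0), theta0 / np.pi)
```

Under these assumptions the two printed values should be close, with the gap governed by the angular grid spacing and the Monte Carlo sample size.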

Theorems & Definitions (14)

  • Definition 1
  • Theorem 1
  • Corollary 1
  • Remark 1
  • Remark 2
  • Lemma 1: Theorem 4.2 of Wainwright (2019)
  • Lemma 2
  • proof
  • Proposition 1
  • proof: Proof of Proposition \ref{prop:sampling}
  • ...and 4 more