
A Scalable Algorithm for Active Learning

Youguang Chen, Zheyu Wen, George Biros

TL;DR

An approximate algorithm with storage requirements reduced to $\mathcal{O}(n(d+c) + cd^2)$ and computational complexity $\mathcal{O}(bncd^2)$ is proposed; experiments on MNIST, CIFAR-10, Caltech101, and ImageNet demonstrate its accuracy and scalability.

Abstract

FIRAL is a recently proposed deterministic active learning algorithm for multiclass classification using logistic regression. It was shown to outperform state-of-the-art methods in accuracy and robustness, and it comes with theoretical performance guarantees. However, its scalability suffers on datasets with a large number of points $n$, dimensions $d$, and classes $c$, due to its $\mathcal{O}(c^2d^2+nc^2d)$ storage and $\mathcal{O}(c^3(nd^2 + bd^3 + bn))$ computational complexity, where $b$ is the number of points to select in active learning. To address these challenges, we propose an approximate algorithm with storage requirements reduced to $\mathcal{O}(n(d+c) + cd^2)$ and a computational complexity of $\mathcal{O}(bncd^2)$. Additionally, we present a parallel implementation on GPUs. We demonstrate the accuracy and scalability of our approach using MNIST, CIFAR-10, Caltech101, and ImageNet. The accuracy tests show no loss of accuracy relative to FIRAL. We report strong and weak scaling tests on up to 12 GPUs for a synthetic dataset of three million points.
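To make the asymptotic bounds above concrete, the sketch below evaluates the leading-order terms of the quoted storage and compute bounds for given $n$, $d$, $c$, $b$. Constants and lower-order terms are dropped, so the outputs are only relative entry and operation counts, not predictions of memory use or runtime. The helper name `firal_costs` is hypothetical; the example sizes $n$, $d$, $c$ come from the ImageNet-1K $c$-scaling run in Figure 5, while $b = 1000$ is an arbitrary, hypothetical budget.

```python
def firal_costs(n: int, d: int, c: int, b: int) -> dict[str, int]:
    """Leading-order storage (entries) and compute (operations) terms of the
    Exact-FIRAL and Approx-FIRAL bounds quoted in the abstract.
    Constants and lower-order terms are dropped."""
    return {
        # O(c^2 d^2 + n c^2 d)
        "exact_storage": c**2 * d**2 + n * c**2 * d,
        # O(n(d + c) + c d^2)
        "approx_storage": n * (d + c) + c * d**2,
        # O(c^3 (n d^2 + b d^3 + b n))
        "exact_compute": c**3 * (n * d**2 + b * d**3 + b * n),
        # O(b n c d^2)
        "approx_compute": b * n * c * d**2,
    }


if __name__ == "__main__":
    # n, d, c from the ImageNet-1K c-scaling run (Figure 5); b = 1000 is hypothetical.
    costs = firal_costs(n=1_300_000, d=383, c=1000, b=1000)
    for name, value in costs.items():
        print(f"{name:>14}: {value:.2e}")
    print(f"storage ratio (exact/approx): {costs['exact_storage'] / costs['approx_storage']:.1e}")
    print(f"compute ratio (exact/approx): {costs['exact_compute'] / costs['approx_compute']:.1e}")
```

Because the exact bounds carry factors of $c^2$ (storage) and $c^3$ (compute) where the approximate ones are at most linear in $c$, the gap widens quickly as the number of classes grows.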

Paper Structure (15 sections, 4 theorems, 24 equations, 7 figures, 6 tables, 3 algorithms)

Key Result

Theorem 1

[Theorem 10 in firal-neurips] Given $\epsilon \in (0,1)$, let $\eta = 8 \sqrt{\widetilde{d}}/\epsilon$. Whenever $b\geq 32 \widetilde{d}/\epsilon^2 + 16 \sqrt{\widetilde{d}}/\epsilon^2$, let $z$ denote the solution corresponding to the points selected by the Exact-FIRAL algorithm; then the algorithm is near-optimal …
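The budget condition in Theorem 1 is easy to evaluate numerically. The sketch below, using the hypothetical helper name `firal_budget_threshold`, returns $\eta$ and the smallest integer $b$ that satisfies the stated bound. The effective dimension $\widetilde{d}$ is defined in the full paper rather than in this summary, so it is treated here as a given input.

```python
import math


def firal_budget_threshold(d_tilde: float, eps: float) -> tuple[float, int]:
    """Return (eta, b_min) for Theorem 1: eta = 8 sqrt(d~)/eps and the smallest
    integer b with b >= 32 d~/eps^2 + 16 sqrt(d~)/eps^2.
    d_tilde is taken as an input; its definition is in the full paper."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    eta = 8.0 * math.sqrt(d_tilde) / eps
    b_bound = 32.0 * d_tilde / eps**2 + 16.0 * math.sqrt(d_tilde) / eps**2
    return eta, math.ceil(b_bound)


if __name__ == "__main__":
    print(firal_budget_threshold(d_tilde=4.0, eps=0.5))  # → (32.0, 640)
```

The $1/\epsilon^2$ dependence means halving $\epsilon$ roughly quadruples the number of points that must be selected before the guarantee applies.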

Figures (7)

  • Figure 1: The impact of the preconditioner on the number of CG iterations. The experimental setup is detailed in the section on accuracy results. We show the convergence of CG in the first mirror descent iteration (i.e., Line 6 of the fast Relax algorithm).
  • Figure 2: Classification accuracy for active learning experiments on MNIST, CIFAR-10, imb-CIFAR-10, ImageNet-50, and imb-ImageNet-50. The upper row ((A)-(E)) shows pool accuracy on the unlabeled pool ${\bf X}_u$; the lower row ((F)-(J)) shows evaluation accuracy on the evaluation data.
  • Figure 3: Classification accuracy for active learning experiments on Caltech-101 and ImageNet-1k. Both (A) and (B) represent the accuracy on evaluation data for Caltech-101. In (A), the accuracy is averaged with each point having the same weight, while in (B), the accuracy is averaged with each class having the same weight. (C) presents the pool accuracy for ImageNet-1k, and (D) presents the evaluation accuracy for ImageNet-1k.
  • Figure 4: Effect of the number of Rademacher random vectors (top) and the CG termination criterion (bottom) on the Relax step (i.e., the fast Relax algorithm). "Exact" refers to the exact Relax solver used in Exact-FIRAL, while "Approx" denotes the fast Relax solver used in Approx-FIRAL. Here, $s$ denotes the number of Rademacher random vectors, and $cg_{tol}$ denotes the relative-residual tolerance used to terminate the CG solves.
  • Figure 5: Dependence of the wall-clock time of the Relax and Round solves on the number of features $d$ and the number of classes $c$, using ImageNet-1K. For the $d$-scaling runs, we fix the number of data points $n=100{,}000$ and the number of classes $c=1000$, set the number of random vectors to $s=10$, and for each value of $d$ run one gradient step with the number of CG iterations fixed to $n_{CG}=50$; the left column shows theoretical time and the right column shows measured time. For the $c$-scaling runs, we fix $n=1{,}300{,}000$ and $d=383$ and vary $c$ over $\left[100, 200, 400, 800, 1000\right]$; the remaining parameters of the algorithm are fixed. Panels: (A) Relax solve, $d$ scaling. (B) Relax solve, $c$ scaling. (C) Round solve, $d$ scaling. (D) Round solve, $c$ scaling.
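Figures 1 and 4 concern two ingredients of the fast Relax solver that this summary names: preconditioned CG solves, and trace estimation with $s$ Rademacher random vectors under a CG tolerance $cg_{tol}$. The NumPy sketch below combines them under stated assumptions. The paper's actual preconditioner and the exact operator inside the Relax objective are not given here, so a Jacobi (diagonal) preconditioner and a generic target $\mathrm{tr}(A^{-1}B)$ for symmetric positive definite $A$ are stand-ins, and the names `pcg` and `hutchinson_trace_inv` are hypothetical. The estimator relies on $\mathbb{E}[v v^\top] = I$ for Rademacher $v$, so $\mathbb{E}[v^\top M v] = \mathrm{tr}(M)$.

```python
import numpy as np


def pcg(matvec, rhs, apply_prec, rtol=1e-6, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite operator.

    Stops once ||r|| <= rtol * ||rhs||; returns (solution, iterations used)."""
    x = np.zeros(rhs.shape, dtype=float)
    r = np.array(rhs, dtype=float)          # residual for the zero initial guess
    rhs_norm = np.linalg.norm(r)
    if rhs_norm == 0.0:
        return x, 0
    z = apply_prec(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= rtol * rhs_norm:
            return x, k
        z = apply_prec(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter


def hutchinson_trace_inv(A, B, s, rtol, rng):
    """Estimate tr(A^{-1} B) from s Rademacher probes, one Jacobi-PCG solve each."""
    n = A.shape[0]
    diag = np.diag(A).copy()
    total = 0.0
    for _ in range(s):
        v = rng.choice([-1.0, 1.0], size=n)                  # Rademacher probe
        w, _ = pcg(lambda x: A @ x, B @ v, lambda r: r / diag, rtol=rtol)  # w ~ A^{-1} B v
        total += v @ w                                       # v^T A^{-1} B v
    return total / s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k = 200, 5
    G = rng.standard_normal((n, k)) / np.sqrt(n)
    A = np.diag(np.logspace(0, 4, n)) + G @ G.T             # SPD, badly scaled diagonal
    rhs = rng.standard_normal(n)
    _, plain = pcg(lambda x: A @ x, rhs, lambda r: r, rtol=1e-8)
    _, jacobi = pcg(lambda x: A @ x, rhs, lambda r: r / np.diag(A), rtol=1e-8)
    print("CG iterations, no preconditioner vs Jacobi:", plain, jacobi)

    H = rng.standard_normal((n, k))
    B = H @ H.T + np.eye(n)
    print("estimate:", hutchinson_trace_inv(A, B, s=10, rtol=1e-6, rng=rng))
    print("exact:   ", np.trace(np.linalg.solve(A, B)))
```

Raising `s` reduces the variance of the estimate at the cost of more CG solves, while loosening `rtol` (the analogue of $cg_{tol}$) cuts the iterations per solve at the cost of solve accuracy, which is the trade-off Figure 4 probes.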

Theorems & Definitions (7)

  • Theorem 1
  • Lemma 2: Matrix-free Hessian matvec
  • Proof
  • Definition 1: Block diagonal operation $\mathcal{B}(\cdot)$
  • Lemma 3
  • Proposition 4
  • Proof
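Lemma 2 and Definition 1 are listed here without their statements. Assuming the Hessian in question is that of the multinomial logistic (softmax cross-entropy) loss weighted by the selection variable $z$, i.e. $H(z)=\sum_i z_i\,(\mathrm{diag}(p_i)-p_ip_i^\top)\otimes x_ix_i^\top$ with $p_i$ the class probabilities of point $x_i$, a matrix-free matvec needs only $\mathcal{O}(ndc)$ work, and the $c$ diagonal $d\times d$ blocks (our reading of $\mathcal{B}(\cdot)$) need $\mathcal{O}(cd^2)$ storage. The NumPy sketch below illustrates that standard identity; it is not the paper's exact formulation, which may, for example, use $c-1$ classes or a different vectorization order. The names `hessian_matvec` and `block_diagonal` are hypothetical.

```python
import numpy as np


def hessian_matvec(X, P, z, V):
    """Apply H(z) = sum_i z_i (diag(p_i) - p_i p_i^T) kron x_i x_i^T to vec(V)
    (column-major vec) without forming the dc x dc matrix.
    Shapes: X (n, d), P (n, c), z (n,), V (d, c). Returns (d, c); cost O(ndc)."""
    U = X @ V                                   # row i is u_i = V^T x_i, shape (n, c)
    PU = P * U
    G = PU - P * PU.sum(axis=1, keepdims=True)  # row i is (diag(p_i) - p_i p_i^T) u_i
    return X.T @ (z[:, None] * G)               # sum_i z_i x_i g_i^T


def block_diagonal(X, P, z):
    """Return the c diagonal d x d blocks of H(z), shape (c, d, d):
    block k = sum_i z_i p_ik (1 - p_ik) x_i x_i^T. Storage O(c d^2)."""
    W = z[:, None] * P * (1.0 - P)              # per-point, per-class weights, (n, c)
    return np.einsum("ik,id,ie->kde", W, X, X)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, c = 50, 4, 3
    X = rng.standard_normal((n, d))
    logits = rng.standard_normal((n, c))
    P = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax rows
    z = rng.uniform(size=n)
    V = rng.standard_normal((d, c))
    # Dense reference, only feasible for tiny d and c.
    H = sum(z[i] * np.kron(np.diag(P[i]) - np.outer(P[i], P[i]), np.outer(X[i], X[i]))
            for i in range(n))
    vec = lambda M: M.reshape(-1, order="F")    # column-major vec(.)
    print(np.allclose(H @ vec(V), vec(hessian_matvec(X, P, z, V))))
    blocks = block_diagonal(X, P, z)
    print(all(np.allclose(H[k * d:(k + 1) * d, k * d:(k + 1) * d], blocks[k])
              for k in range(c)))
```

Only $X$, $P$, and $z$ are kept, about $n(d+c)$ entries, plus $cd^2$ entries for the blocks, which is consistent with the reduced storage bound quoted in the abstract; the dense $H$ in the demo exists purely for verification.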