A Scalable Algorithm for Active Learning

Youguang Chen; Zheyu Wen; George Biros

TL;DR

An approximate active learning algorithm is proposed that reduces storage requirements to $\mathcal{O}(n(d+c) + cd^2)$ and computational complexity to $\mathcal{O}(bncd^2)$. Its accuracy and scalability are demonstrated on MNIST, CIFAR-10, Caltech101, and ImageNet.

Abstract

FIRAL is a recently proposed deterministic active learning algorithm for multiclass classification using logistic regression. It was shown to outperform the state of the art in accuracy and robustness, and it comes with theoretical performance guarantees. However, it scales poorly to datasets with a large number of points $n$, dimensions $d$, and classes $c$, because it requires $\mathcal{O}(c^2d^2+nc^2d)$ storage and has $\mathcal{O}(c^3(nd^2 + bd^3 + bn))$ computational complexity, where $b$ is the number of points to select in active learning. To address these challenges, we propose an approximate algorithm with storage requirements reduced to $\mathcal{O}(n(d+c) + cd^2)$ and a computational complexity of $\mathcal{O}(bncd^2)$. Additionally, we present a parallel implementation on GPUs. We demonstrate the accuracy and scalability of our approach using MNIST, CIFAR-10, Caltech101, and ImageNet. The accuracy tests show no loss of accuracy relative to FIRAL. We report strong and weak scaling tests on up to 12 GPUs for a synthetic dataset with three million points.
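
To give a feel for the gap between the two storage bounds quoted in the abstract, the following minimal sketch evaluates their leading-order terms with all hidden constants set to one. That choice is an assumption, since big-O notation does not fix constants. The function names are hypothetical, and the sizes $n = 1.3 \times 10^6$, $d = 383$, $c = 1000$ are taken from the ImageNet-1K scaling run described in the Figure 5 caption.

```python
def exact_firal_storage(n, d, c):
    """Leading-order term count c^2 d^2 + n c^2 d (constants assumed to be 1)."""
    return c**2 * d**2 + n * c**2 * d


def approx_firal_storage(n, d, c):
    """Leading-order term count n (d + c) + c d^2 (constants assumed to be 1)."""
    return n * (d + c) + c * d**2


# Sizes from the ImageNet-1K run in the Figure 5 caption.
n, d, c = 1_300_000, 383, 1000

exact_entries = exact_firal_storage(n, d, c)
approx_entries = approx_firal_storage(n, d, c)

print(f"exact  bound: {exact_entries:.3e} entries")
print(f"approx bound: {approx_entries:.3e} entries")
print(f"ratio       : {exact_entries / approx_entries:.1e}")
```

Under these assumptions the exact bound is roughly five orders of magnitude larger than the approximate one, which is consistent with the abstract's claim that the exact algorithm's storage is the obstacle at this scale.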

Paper Structure (15 sections, 4 theorems, 24 equations, 7 figures, 6 tables, 3 algorithms)

Introduction
The exact FIRAL algorithm
Formulation
FIRAL: Relax step
FIRAL: Round step
Complexity and scalability of FIRAL
The Approx-FIRAL algorithm
The Hessian structure and a fast Relax step
The new Round step
HPC implementation and complexity analysis
Numerical Experiments
Active learning performance
Single-GPU performance
Parallel scalability
Conclusions

Key Result

Theorem 1

[Theorem 10 in firal-neurips] Given $\epsilon \in (0,1)$, let $\eta = 8 \sqrt{\widetilde{d}}/\epsilon$. Whenever $b\geq 32 \widetilde{d}/\epsilon^2 + 16 \sqrt{\widetilde{d}}/\epsilon^2$, let $z$ denote the solution corresponding to the points selected by \ref{algo:exact-firal}; then the algorithm is near-o

Figures (7)

Figure 1: The impact of the preconditioner on CG iterations. The experimental setup is detailed in \ref{s:result-acc}. We show the convergence of CG in the initial mirror descent iteration (i.e., Line 6 of \ref{algo:new_relax}).
Figure 2: Classification accuracy for active learning experiments on MNIST, CIFAR-10, imb-CIFAR-10, ImageNet-50, and imb-ImageNet-50. The upper row ((A)-(E)) shows pool accuracy on the unlabeled pool ${\bf X}_u$; the lower row ((F)-(J)) shows evaluation accuracy on the evaluation data.
Figure 3: Classification accuracy for active learning experiments on Caltech-101 and ImageNet-1k. Both (A) and (B) represent the accuracy on evaluation data for Caltech-101. In (A), the accuracy is averaged with each point having the same weight, while in (B), the accuracy is averaged with each class having the same weight. (C) presents the pool accuracy for ImageNet-1k, and (D) presents the evaluation accuracy for ImageNet-1k.
Figure 4: Effect of the number of Rademacher random vectors (top) and the CG termination criterion (bottom) on the Relax step (i.e., \ref{algo:new_relax}). "Exact" refers to the exact Relax solver used in Exact-FIRAL, while "Approx" denotes the fast Relax solver used in Approx-FIRAL. Here, $s$ denotes the number of Rademacher random vectors, and $cg_{tol}$ is the relative residual termination tolerance used in the CG solves.
Figure 5: Wall-clock time of the Relax and Round solves as a function of the number of features $d$ and the number of classes $c$, using ImageNet-1K. For the $d$ scaling runs, we fix the number of data points to $n = 10^5$, the number of classes to $c = 1000$, and the number of random vectors to $s = 10$. For each value of $d$, we compute one gradient and fix the number of CG iterations to $n_{CG} = 50$. To test the algorithmic scalability in $c$, we fix $n = 1.3 \times 10^6$ and $d = 383$ and vary $c$ over $\left[100, 200, 400, 800, 1000\right]$; the remaining parameters of the algorithm are fixed. The left column shows theoretical time and the right column shows experimental time. Panels: (A) Relax run for $d$ scaling. (B) Relax run for $c$ scaling. (C) Round solve for $d$ scaling. (D) Round solve for $c$ scaling.
...and 2 more figures
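
The Figure 4 caption refers to two tuning knobs of the fast Relax step: the number $s$ of Rademacher random vectors and the relative residual tolerance $cg_{tol}$ of the CG solves. The sketch below shows how these two ingredients typically combine, using a Hutchinson-style estimator of $\mathrm{tr}(A^{-1})$ for a symmetric positive definite operator $A$. It is illustrative only: this summary does not state which trace quantity the paper estimates, the CG here is unpreconditioned, and the names `cg` and `hutchinson_trace_inv` are hypothetical.

```python
import numpy as np


def cg(matvec, b, rel_tol=1e-8, max_iter=200):
    """Plain conjugate gradients for A x = b, with A SPD and given as a matvec.

    Stops when ||r|| <= rel_tol * ||b|| (relative residual criterion).
    """
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    b_norm = np.sqrt(rs)
    if b_norm == 0.0:
        return x
    for _ in range(max_iter):
        if np.sqrt(rs) <= rel_tol * b_norm:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def hutchinson_trace_inv(matvec, dim, s, rel_tol, rng):
    """Estimate tr(A^{-1}) as the mean of z^T A^{-1} z over s Rademacher probes z."""
    total = 0.0
    for _ in range(s):
        z = rng.choice([-1.0, 1.0], size=dim)  # Rademacher vector
        total += z @ cg(matvec, z, rel_tol=rel_tol)
    return total / s


# Small SPD test operator (assumed for illustration only).
rng = np.random.default_rng(0)
M = rng.standard_normal((30, 30))
A = M @ M.T + 30.0 * np.eye(30)

estimate = hutchinson_trace_inv(lambda v: A @ v, 30, s=10, rel_tol=1e-6, rng=rng)
exact = np.trace(np.linalg.inv(A))
print(f"estimate = {estimate:.4f}, exact = {exact:.4f}")
```

Increasing `s` reduces the variance of the estimate, while tightening `rel_tol` reduces the error of each inner solve. These are the two trade-offs the figure explores.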

Theorems & Definitions (7)

Theorem 1
Lemma 2: Matrix-free Hessian matvec
proof
Definition 1: Block diagonal operation $\mathcal{B}(\cdot)$
Lemma 3
Proposition 4
proof
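
Lemma 2 is listed above only by its title, "Matrix-free Hessian matvec", so its exact statement is not reproduced here. As a hedged illustration of the general idea, the sketch below applies the standard Hessian of multinomial logistic regression, $\sum_i (\mathrm{diag}(p_i) - p_i p_i^\top) \otimes x_i x_i^\top$, to a vector without ever forming the $cd \times cd$ matrix. It assumes a full $c$-class parameterization and unit weights on all points; the paper's version may differ in both respects. The function names are hypothetical.

```python
import numpy as np


def softmax(Z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def hessian_matvec(X, P, V):
    """Apply sum_i (diag(p_i) - p_i p_i^T) kron (x_i x_i^T) to V of shape (c, d).

    Only X (n, d), P (n, c), and V are touched, so the extra storage is O(n c).
    """
    U = X @ V.T                                 # (n, c): u_i = V x_i
    PU = P * U
    W = PU - P * PU.sum(axis=1, keepdims=True)  # w_i = (diag(p_i) - p_i p_i^T) u_i
    return W.T @ X                              # (c, d): sum_i w_i x_i^T


def dense_hessian(X, P):
    """Reference: form the (c d) x (c d) Hessian explicitly. Small sizes only."""
    n, d = X.shape
    c = P.shape[1]
    H = np.zeros((c * d, c * d))
    for x, p in zip(X, P):
        H += np.kron(np.diag(p) - np.outer(p, p), np.outer(x, x))
    return H


rng = np.random.default_rng(0)
n, d, c = 50, 4, 3
X = rng.standard_normal((n, d))
P = softmax(X @ rng.standard_normal((d, c)))
V = rng.standard_normal((c, d))

fast = hessian_matvec(X, P, V)
slow = (dense_hessian(X, P) @ V.ravel()).reshape(c, d)
print("max abs difference:", np.abs(fast - slow).max())
```

The matrix-free product costs $\mathcal{O}(ncd)$ operations and needs only the $n \times d$ features and $n \times c$ probabilities, which matches the $n(d+c)$ term in the storage bound quoted in the abstract.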
