On the Convergence of Loss and Uncertainty-based Active Learning Algorithms

Daniel Haimovich, Dima Karamshuk, Fridolin Linder, Niek Tax, Milan Vojnovic

TL;DR

The paper proposes Adaptive-Weight Sampling (AWS), a novel algorithm that uses SGD with an adaptive step size that achieves stochastic Polyak's step size in expectation, and establishes convergence rate results for AWS for smooth convex training loss functions.

Abstract

We investigate the convergence rates and data sample sizes required for training a machine learning model using a stochastic gradient descent (SGD) algorithm, where data points are sampled based on either their loss value or uncertainty value. These training methods are particularly relevant for active learning and data subset selection problems. For SGD with a constant step size update, we present convergence results for linear classifiers and linearly separable datasets using squared hinge loss and similar training loss functions. Additionally, we extend our analysis to more general classifiers and datasets, considering a wide range of loss-based sampling strategies and smooth convex training loss functions. We propose a novel algorithm called Adaptive-Weight Sampling (AWS) that utilizes SGD with an adaptive step size that achieves stochastic Polyak's step size in expectation. We establish convergence rate results for AWS for smooth convex training loss functions. Our numerical experiments demonstrate the efficiency of AWS on various datasets by using either exact or estimated loss values.
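
To make the constant step size setting concrete, the following is a minimal sketch of SGD with loss-based sampling for a linear classifier trained with the squared hinge loss on a linearly separable dataset. It is an illustration under stated assumptions, not the authors' implementation: the sampling rule `min(1, beta * loss)`, the parameter names `step_size` and `beta`, and the synthetic data generator `make_separable` are all hypothetical choices made for this example.

```python
import numpy as np


def squared_hinge(theta, x, y):
    """Return the squared hinge loss max(0, 1 - y * <theta, x>)^2 and its gradient."""
    margin = 1.0 - y * float(theta @ x)
    if margin <= 0.0:
        return 0.0, np.zeros_like(theta)
    return margin ** 2, -2.0 * margin * y * x


def loss_based_sgd(X, Y, step_size=0.05, beta=1.0, seed=0):
    """One pass of SGD in which a point is used only if it is sampled.

    The sampling probability min(1, beta * loss) is an assumed,
    illustrative rule: it increases with the loss and is capped at 1.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    n_sampled = 0
    for x, y in zip(X, Y):
        loss, grad = squared_hinge(theta, x, y)
        p = min(1.0, beta * loss)
        if rng.random() < p:
            # Constant step size update on the sampled point.
            theta = theta - step_size * grad
            n_sampled += 1
    return theta, n_sampled


def make_separable(n=2000, seed=1):
    """Synthetic 2-D data separated by a hypothetical direction w with a margin."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    w = np.array([1.0, -1.0])
    keep = np.abs(X @ w) > 0.5  # drop points too close to the boundary
    X = X[keep]
    Y = np.sign(X @ w)
    return X, Y


X, Y = make_separable()
theta, n_sampled = loss_based_sgd(X, Y)
accuracy = float(np.mean(np.sign(X @ theta) == Y))
print(f"sampled {n_sampled} of {len(X)} points, training accuracy {accuracy:.3f}")
```

In this sketch, `n_sampled` plays the role of the number of points whose labels would be used, which is the quantity that active learning and data subset selection aim to keep small relative to the number of points seen.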

Paper Structure

This paper contains 51 sections, 19 theorems, 161 equations, 11 figures, 2 tables.

Key Result

Theorem 3.1

Assume that $\rho^* > 1$, the loss function is the squared hinge loss function, and the sampling probability function $\pi$ is such that for all $u \leq 1$, $\pi(u) \leq \beta/2$ and for some constants $0 < \beta \leq 2$ and $\mu \geq \sqrt{2}/(\rho^*-1)$. Then, for any initial value $\theta_1$ such that $\|\theta_1-\theta^*\|\leq S$ and $\{\theta_t\}_{t>1}$ according to algorithm (\ref{equ:sgd}) with

Figures (11)

  • Figure 1: Convergence in terms of average cross-entropy progressive loss of random sampling, loss-based sampling based on the absolute error loss, and our proposed algorithm (loss-based sampling with stochastic Polyak’s step size). Our proposed algorithm outperforms the baselines in most cases.
  • Figure 2: Active learning sampling based on an estimator of the absolute error loss performs on par with the sampling based on the ground truth value of absolute error loss.
  • Figure 3: Sampling probability function for the family of generalized smooth hinge loss functions.
  • Figure 4: Upper bounds for function $h$ defined in (\ref{equ:hfunc}): (left) bound of Lemma \ref{lem:hub1}, (right) bounds of Lemma \ref{lem:hub2}.
  • Figure 5: Average cross entropy loss as a function of labeling cost for different sampling methods.
  • ...and 6 more figures

Theorems & Definitions (24)

  • Theorem 3.1
  • Lemma 3.2
  • Theorem 3.3
  • Corollary 3.4
  • Lemma 3.5
  • Theorem 3.6
  • Corollary 3.7
  • Corollary 3.8
  • Lemma A.2
  • proof
  • ...and 14 more