The Speed-up Factor: A Quantitative Multi-Iteration Active Learning Performance Metric

Hannes Kath, Thiago S. Gouvêa, Daniel Sonntag

TL;DR

This work formally introduces the speed-up factor, a quantitative multi-iteration performance metric for active learning query methods (QMs) that indicates the fraction of samples needed to match random sampling performance.

Abstract

Machine learning models excel with abundant annotated data, but annotation is often costly and time-intensive. Active learning (AL) aims to improve the performance-to-annotation ratio by using query methods (QMs) to iteratively select the most informative samples. While AL research focuses mainly on QM development, the evaluation of this iterative process lacks appropriate performance metrics. This work reviews eight years of AL evaluation literature and formally introduces the speed-up factor, a quantitative multi-iteration QM performance metric that indicates the fraction of samples needed to match random sampling performance. Using four datasets from diverse domains and seven QMs of various types, we empirically evaluate the speed-up factor and compare it with state-of-the-art AL performance metrics. The results confirm the assumptions underlying the speed-up factor, demonstrate its accuracy in capturing the described fraction, and reveal its superior stability across iterations.
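
To make the "fraction of samples needed to match random sampling performance" concrete, the following is a minimal sketch, not the paper's implementation. It assumes two learning curves measured at the same budgets, connects the points linearly, and inverts them at a target performance. The function names and the toy numbers are hypothetical, and the formal definition of the speed-up factor is not given in this excerpt.

```python
import numpy as np


def samples_needed(budgets, performance, target):
    """Budget at which a piecewise-linear learning curve reaches `target`.

    Assumes the curve is strictly increasing, so it can be inverted by
    swapping the roles of the x and y coordinates in np.interp.
    """
    budgets = np.asarray(budgets, dtype=float)
    performance = np.asarray(performance, dtype=float)
    if np.any(np.diff(performance) <= 0):
        raise ValueError("performance must be strictly increasing")
    if not performance[0] <= target <= performance[-1]:
        raise ValueError("target lies outside the measured performance range")
    return float(np.interp(target, performance, budgets))


def sample_ratio(budgets, perf_qm, perf_rand, target):
    """Ratio x_qm(p) / x_rand(p) at performance p = target."""
    x_qm = samples_needed(budgets, perf_qm, target)
    x_rand = samples_needed(budgets, perf_rand, target)
    return x_qm / x_rand


# Hypothetical toy curves (not data from the paper).
budgets = [100, 200, 300, 400]
perf_rand = [0.40, 0.50, 0.60, 0.70]
perf_qm = [0.50, 0.70, 0.80, 0.85]

ratio = sample_ratio(budgets, perf_qm, perf_rand, target=0.60)
print(f"fraction of samples needed by the QM: {ratio:.2f}")
```

In this toy case random sampling needs 300 samples to reach a performance of 0.60, while the QM reaches it at about 150, so the ratio is about 0.5. A ratio below 1 means the QM needs fewer samples than random sampling.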

Paper Structure

This paper contains 32 sections, 5 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Active learning (AL): The dataset $\mathcal{D}$ is split into the evaluation set $\mathcal{D}_E$ and the set $\mathcal{D}\!_{AL}$. (1) A QM uses the current unlabelled dataset $\mathcal{D}_U^{t-1}$ and labelled dataset $\mathcal{D}_L^{t-1}$ to query samples $\left(\mathcal{Q}^t_U\right)$ from $\mathcal{D}_U^{t-1}$. (2) $\mathcal{Q}^t_U$ is labelled $\left(\mathcal{Q}^t_L\right)$ by a human expert. (3) Update datasets: $\mathcal{D}_U^{t} = \mathcal{D}_U^{t-1} \setminus \mathcal{Q}^t_U$ and $\mathcal{D}_L^{t} = \mathcal{D}_L^{t-1} \cup \mathcal{Q}^t_L$.
  • Figure 2: Learning curve schematic for random sampling and active learning (AL) showing scatter points, approximation by direct connection, and approximation by using $\hat{p}_{\text{qm}}(x) = a_{\infty} \left(1 - e^{a_0-\frac{x}{b_{\text{qm}}}}\right)$ with $a_{\infty}=0.9$, $a_0=-0.2$, $b_{\text{rand}}=600$ and $b_{\text{AL}}=150$.
  • Figure 3: Synthetic illustration of $\mathcal{A}4$. Left: Two-dimensional dataset with 4 000 samples and 90/10 class imbalance. Center: Connected learning curves for random sampling and the query method ratio max using logistic regression. Right: Ratio $\frac{x_\text{qm}(p)}{x_\text{rand}(p)}$ obtained by inverting the connected learning curves.
  • Figure 4: Empirical AL results. 'Performance' refers to macro F1 score. QM color and marker coding is shown in the legend below the figure; only panels with deviations include individual legends. Using multiple random seeds, $\mu$ denotes the mean, $\sigma$ the standard deviation and SEM the standard error of the mean. A--B: MS COCO$_{\text{2k}}$, all classes, QM: ratio max, 5 random seeds. A) Learning curve (connected), $\mu \pm \text{SEM}$. B) Speed-up factor (connected), $\mu$. C--F: 2k datasets, all classes, 30 random seeds. C) Learning curve (connected), all QMs, $\mu \pm \sigma$. D) Speed-up factor (connected), all QMs, $\mu$. E) Learning curve (scatter + $\hat{p}$ fit), single QM, $\mu \pm \sigma$. F) Performance metric stability, single QM, $\mu$. G--I: Complete datasets, 5 random seeds. G) Learning curve (connected), all classes, all QMs, $\mu \pm \text{SEM}$. H) Learning curve (scatter + $\hat{p}$ fit), single class, single QM, $\mu \pm \text{SEM}$. I) Performance metric stability, single class (same as in H), single QM, $\mu$.
  • Figure 5: Metric stability comparing the stop budget method and the proposed speed-up factor. Left: Learning curves for random sampling and k-means on the single-label subset of Reuters-21578 (also shown in Figure 4H). Center: Performance decisions over the stop budget using the stop budget method. Right: Performance decisions over the stop budget using the speed-up factor.
  • ...and 2 more figures
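
The saturating-exponential approximation in the Figure 2 caption can be inverted in closed form. The sketch below uses the caption's parameter values and assumes, as in that schematic, that random sampling and AL share $a_{\infty}$ and $a_0$ and differ only in $b$. The helper names `p_hat` and `x_for` are hypothetical. Solving $\hat{p}(x) = p$ for $x$ gives $x(p) = b\,\left(a_0 - \ln\left(1 - p / a_{\infty}\right)\right)$, so under this shared-parameter assumption the ratio $\frac{x_\text{qm}(p)}{x_\text{rand}(p)}$ reduces to $\frac{b_\text{qm}}{b_\text{rand}}$ and does not depend on $p$.

```python
import math

# Parameter values taken from the Figure 2 caption.
A_INF = 0.9
A_0 = -0.2
B_RAND = 600.0
B_AL = 150.0


def p_hat(x, b, a_inf=A_INF, a0=A_0):
    """Approximated performance after x labelled samples."""
    return a_inf * (1.0 - math.exp(a0 - x / b))


def x_for(p, b, a_inf=A_INF, a0=A_0):
    """Inverse of p_hat: number of samples needed to reach performance p."""
    if not 0.0 <= p < a_inf:
        raise ValueError("p must lie in [0, a_inf)")
    return b * (a0 - math.log(1.0 - p / a_inf))


target = 0.5
x_rand = x_for(target, B_RAND)
x_al = x_for(target, B_AL)
ratio = x_al / x_rand

print(f"samples needed (random): {x_rand:.1f}")
print(f"samples needed (AL):     {x_al:.1f}")
print(f"ratio x_AL / x_rand:     {ratio:.2f}")
```

With the caption's values the ratio is $150 / 600 = 0.25$ for every admissible target. Whether this constant coincides with the paper's formal speed-up factor is not stated in this excerpt, so treat the link as an assumption.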