Direct Acquisition Optimization for Low-Budget Active Learning

Zhuokai Zhao, Yibo Jiang, Yuxin Chen

TL;DR

This work tackles the problem of active learning under ultra-low labeling budgets by introducing Direct Acquisition Optimization (DAO), which prioritizes samples based on expected true loss reduction while avoiding costly retraining and large validation sets. DAO combines influence-function-based model parameter updates, surrogate-based label approximation, and bias-corrected loss estimation (LURE) to efficiently estimate the impact of acquiring new data. Across seven benchmarks, DAO consistently outperforms state-of-the-art AL methods, with particularly strong gains in extreme low-budget scenarios such as SVHN with $B=10$. The approach offers a practical pathway to data-efficient learning in domains where labeling is expensive, and it opens avenues for integration with unsupervised and semi-supervised techniques to further reduce labeling needs.
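The bias-corrected loss estimation mentioned above (LURE) can be illustrated with a minimal sketch. It assumes the levelled unbiased risk estimator as usually stated in the active-learning bias literature: each of the $M$ actively sampled points gets a weight $v_m = 1 + \frac{N-M}{N-m}\left(\frac{1}{(N-m+1)\,q_m} - 1\right)$, where $N$ is the pool size and $q_m$ is the probability the acquisition distribution assigned to the $m$-th pick. The function names `lure_weights` and `lure_risk` are hypothetical, and this is not the authors' implementation.

```python
def lure_weights(q_probs, pool_size):
    """Per-sample LURE weights (illustrative sketch, not the DAO code).

    q_probs[m-1] is the probability with which the m-th point was drawn
    from the N - m + 1 points still in the pool at that step.
    """
    n, m_total = pool_size, len(q_probs)
    if m_total > n:
        raise ValueError("cannot acquire more points than the pool holds")
    weights = []
    for m, q in enumerate(q_probs, start=1):
        # When the whole pool has been acquired the correction vanishes.
        scale = 0.0 if m_total == n else (n - m_total) / (n - m)
        weights.append(1.0 + scale * (1.0 / ((n - m + 1) * q) - 1.0))
    return weights


def lure_risk(losses, q_probs, pool_size):
    """Weighted average of the observed losses using the LURE weights."""
    weights = lure_weights(q_probs, pool_size)
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


# Sanity check: under uniform sampling without replacement every weight
# is 1, so the estimator reduces to the plain average of the losses.
uniform_q = [1 / 10, 1 / 9, 1 / 8]
uniform_weights = lure_weights(uniform_q, pool_size=10)
uniform_risk = lure_risk([0.2, 0.4, 0.6], uniform_q, pool_size=10)
```

When the acquisition distribution favours some points over others, the weights move away from 1 to compensate for the non-uniform sampling.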

Abstract

Active Learning (AL) has gained prominence in integrating data-intensive machine learning (ML) models into domains with limited labeled data. However, its effectiveness diminishes significantly when the labeling budget is low. In this paper, we first empirically observe the performance degradation of existing AL algorithms in low-budget settings, and then introduce Direct Acquisition Optimization (DAO), a novel AL algorithm that optimizes sample selection based on expected true loss reduction. Specifically, DAO utilizes influence functions to update model parameters and incorporates an additional acquisition strategy to mitigate bias in loss estimation. This approach facilitates a more accurate estimation of the overall error reduction, without extensive computation or reliance on labeled data. Experiments demonstrate DAO's effectiveness in low-budget settings, where it outperforms state-of-the-art approaches across seven benchmarks.
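To make the idea of updating parameters with influence functions concrete, here is a minimal NumPy sketch. It is an assumption-laden stand-in rather than DAO itself: it uses ridge regression instead of a neural network, and the helper names `fit_ridge` and `influence_update` are hypothetical. It applies the standard first-order influence approximation $\theta_{\text{new}} \approx \hat\theta - \frac{1}{n} H^{-1} \nabla_\theta \ell(z, \hat\theta)$ for a candidate point $z$ and compares the result with exact refitting.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Minimise (1/n) * sum 0.5*(x_i @ theta - y_i)^2 + (lam/2)*||theta||^2."""
    n, d = X.shape
    H = X.T @ X / n + lam * np.eye(d)  # Hessian of the objective
    theta = np.linalg.solve(H, X.T @ y / n)
    return theta, H


def influence_update(theta, H, x_new, y_new, n):
    """First-order estimate of the parameters after upweighting one new
    point by 1/n, without refitting the model."""
    grad = x_new * (x_new @ theta - y_new)  # gradient of the new point's loss
    return theta - np.linalg.solve(H, grad) / n


rng = np.random.default_rng(0)
n, d, lam = 200, 3, 0.1
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
theta, H = fit_ridge(X, y, lam)

# Hypothetical candidate sample with an assumed label.
x_new, y_new = rng.normal(size=d), 3.0
theta_inf = influence_update(theta, H, x_new, y_new, n)

# Reference: exact minimiser of the same objective with the candidate
# added at weight 1/n (the normalisation by n is kept for comparability).
H_new = H + np.outer(x_new, x_new) / n
theta_exact = np.linalg.solve(H_new, X.T @ y / n + x_new * y_new / n)

err_influence = np.linalg.norm(theta_inf - theta_exact)
err_no_update = np.linalg.norm(theta - theta_exact)
```

In this quadratic setting the influence estimate lands much closer to the refit parameters than the unchanged model does, which is why such updates can stand in for retraining when many candidates must be scored. For non-convex models the Hessian solve is typically approximated, and the quality of the estimate is no longer guaranteed.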

Paper Structure (31 sections, 18 equations, 6 figures, 1 algorithm)

This paper contains 31 sections, 18 equations, 6 figures, 1 algorithm.

Figures (6)

  • Figure 1: Existing methods fail to outperform random sampling with small budgets. This figure shows the performance of multiple methods relative to random acquisition. Within each subplot, the $x$-axis represents the cumulative acquisition size, while the $y$-axis indicates runs initiated with different random seeds. White indicates performance on par with random, blue indicates worse, and red indicates better.
  • Figure 2: Schematic of the algorithmic framework of DAO.
  • Figure 3: Experimental results comparing DAO with existing AL algorithms across seven benchmarks. In all subplots, the horizontal axis represents the cumulative size of the labeled set, while the vertical axis indicates classification accuracy.
  • Figure 4: Left: MSE of the prediction accuracy on the CIFAR-10 test split between models updated by a single backpropagation step, models updated by influence functions, and the fully trained model. Right: Ablation results where the proposed loss estimation is replaced by the random sampling estimation defined in §\ref{subsec:bias_vs_random}.
  • Figure 5: Relative performance between existing popular AL methods and random acquisition. The horizontal axis represents the cumulative size of the labeled set, while the vertical axis indicates relative performance in percent.
  • ...and 1 more figure