Feature Alignment: Rethinking Efficient Active Learning via Proxy in the Context of Pre-trained Models

Ziting Wen; Oscar Pizarro; Stefan Williams

TL;DR

A novel method, aligned selection via proxy (ASVP), is proposed. It improves proxy-based active learning performance by updating pre-computed features and selecting a proper training method, which reduces the total cost of efficient active learning while maintaining computational efficiency.

Abstract

Fine-tuning a pre-trained model with active learning holds promise for reducing annotation costs. However, this combination introduces significant computational costs, particularly with the growing scale of pre-trained models. Recent research has proposed proxy-based active learning, which pre-computes features to reduce computational costs. Yet this approach often incurs a significant loss in active learning performance, sometimes outweighing the computational cost savings. This paper demonstrates that not all sample selection differences result in performance degradation. Furthermore, we show that suitable training methods can mitigate the decline in active learning performance caused by certain selection discrepancies. Building upon detailed analysis, we propose a novel method, aligned selection via proxy, which improves proxy-based active learning performance by updating pre-computed features and selecting a proper training method. Extensive experiments validate that our method reduces the total cost of efficient active learning while maintaining computational efficiency. The code is available at https://github.com/ZiTingW/asvp.
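
As a rough illustration of the proxy idea described in the abstract, the sketch below applies margin sampling with a linear classifier on features that were computed once. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the linear proxy with `weights` and `bias`, and the toy inputs are all hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def margin_scores(probs):
    # Margin = top-1 probability minus top-2 probability.
    # A small margin means the classifier is uncertain about the sample.
    top2 = np.partition(probs, -2, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

def select_by_margin(features, weights, bias, budget):
    # features: (n, d) array pre-computed once by the frozen backbone (assumed).
    # weights: (d, c) and bias: (c,) define a hypothetical linear proxy classifier.
    probs = softmax(features @ weights + bias)
    margins = margin_scores(probs)
    # Return the indices of the `budget` samples with the smallest margins.
    return np.argsort(margins, kind="stable")[:budget]

# Toy usage: three samples, two classes, identity proxy weights.
features = np.array([[5.0, 0.0], [0.1, 0.0], [0.0, 3.0]])
selected = select_by_margin(features, np.eye(2), np.zeros(2), budget=2)
```

Because the features are fixed, each selection round only needs a matrix product and a sort, which is where the computational saving of a proxy comes from. The abstract notes that this saving can cost active learning performance when the features are never updated.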

Paper Structure (29 sections, 1 equation, 28 figures, 12 tables)

Introduction
Related Work
Preliminary
Selection via Proxy based on Pre-computed Features
LogME-PED
Observation and Analysis
Impact of Sample Selection Differences on SVPp Performance
Empirical Evidence
Role of Region A2 and B2 in Improving SVPp Performance
Empirical Evidence
Empirical Evidence
Discussions
Method
Results
Experiment Setup
...and 14 more sections

Figures (28)

Figure 1: Comparing sample selection time and accuracy in active learning (margin sampling) with models of different scales. The active learning included a total of 16 iterations, selecting a total of 400k samples on the ImageNet dataset. All models are pre-trained using BYOL-EMAN (Cai et al., 2021).
Figure 2: Comparison of the labeling and training costs across random sampling, the standard active learning pipeline, the efficient active learning method (SVPp), and our method (ASVP), using margin sampling to achieve the same accuracy as randomly selecting 400k samples on the ImageNet dataset. A ResNet-50 model was employed, with training costs calculated based on AWS EC2 P3 instances (following Zhang et al., 2024), and annotation costs estimated using AWS Mechanical Turk with triple reviews.
Figure 3: The Selection via Proxy based on pre-trained features (SVPp) Framework. Stage 1: pre-computing features. Stage 2: sample selection based on the proxy model (a simple classifier) with pre-computed features as input. Stage 3: Fine-tuning the pre-trained model using labeled samples.
Figure 4: Sample Selection Discrepancy: Proxy vs. Fine-tuned Models. For instance, in uncertainty-based active learning strategies, the two axes represent the confidence of predictions for the proxy model and the fine-tuned model, with the positive half-axis indicating correct predictions and the negative half-axis indicating incorrect predictions. The AL strategy selects samples with the lowest prediction confidence, meaning the proxy model chooses samples from regions O, C, and D, while the fine-tuned model selects samples from regions O, A, and B. Depending on whether the proxy model correctly predicts the samples, regions A and B are further divided into subregions A1, A2, B1, and B2. Both the proxy model and the fine-tuned model confidently predict the remaining white region, hence they do not select samples from it.
Figure 5: The impact of different regions on SVPp active learning performance. Samples selected by the proxy model are replaced with those from regions A1, A2, B1, and B2. The active learning gain refers to the difference in accuracy between active learning and the random baseline.
...and 23 more figures
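
The selection discrepancy described in the Figure 4 caption can be sketched as set operations over the samples each model picks. This is only an illustration under assumptions: `proxy_conf` and `finetuned_conf` are hypothetical arrays of unsigned prediction confidences, each model is assumed to select its `budget` least-confident samples, and the finer split into A1, A2, B1, and B2 is not reproduced here.

```python
import numpy as np

def selection_regions(proxy_conf, finetuned_conf, budget):
    # Each model selects the `budget` samples it is least confident about.
    proxy_sel = set(np.argsort(proxy_conf, kind="stable")[:budget].tolist())
    ft_sel = set(np.argsort(finetuned_conf, kind="stable")[:budget].tolist())
    return {
        # Region O: chosen by both models.
        "O": sorted(proxy_sel & ft_sel),
        # Regions A and B together: chosen only by the fine-tuned model.
        "A_or_B": sorted(ft_sel - proxy_sel),
        # Regions C and D together: chosen only by the proxy model.
        "C_or_D": sorted(proxy_sel - ft_sel),
    }

# Toy usage with four samples and a budget of two.
proxy_conf = np.array([0.1, 0.2, 0.9, 0.8])
finetuned_conf = np.array([0.1, 0.9, 0.2, 0.8])
regions = selection_regions(proxy_conf, finetuned_conf, budget=2)
```

In this toy case sample 0 is picked by both models, sample 2 only by the fine-tuned model, and sample 1 only by the proxy. Sample 3 falls in the unselected region that both models predict confidently.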
