Boundary Matters: A Bi-Level Active Finetuning Framework

Han Lu, Yichen Xie, Xiaokang Yang, Junchi Yan

TL;DR

BiLAF addresses the annotation bottleneck in the pretraining-finetuning paradigm with a two-stage, boundary-aware sample selection framework. Core sample selection provides broad coverage of the data pool; unsupervised denoising and an iterative boundary sampling strategy then target uncertain regions near decision boundaries, all without ground-truth labels. Experiments on CIFAR and ImageNet show gains over state-of-the-art baselines, particularly at commonly used limited annotation budgets, supporting the value of boundary information in active finetuning. The approach offers a practical, data-efficient way to deploy pretrained models in real-world vision tasks where labeling is costly.

Abstract

The pretraining-finetuning paradigm has gained widespread adoption in vision tasks and other fields, yet it faces the significant challenge of high sample annotation costs. To mitigate this, the concept of active finetuning has emerged, aiming to select the most appropriate samples for model finetuning within a limited budget. Traditional active learning methods often struggle in this setting due to their inherent bias in batch selection. Furthermore, the recent active finetuning approach has primarily concentrated on aligning the distribution of selected subsets with the overall data pool, focusing solely on diversity. In this paper, we propose a Bi-Level Active Finetuning framework to select the samples for annotation in one shot, which includes two stages: core sample selection for diversity, and boundary sample selection for uncertainty. The process begins with the identification of pseudo-class centers, followed by an innovative denoising method and an iterative strategy for boundary sample selection in the high-dimensional feature space, all without relying on ground-truth labels. Our comprehensive experiments provide both qualitative and quantitative evidence of our method's efficacy, outperforming all the existing baselines.
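
The two-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: k-means as the source of pseudo-class centers, the distance-to-center denoising rule, the ratio-based `score`, and the parameters `drop_frac` and `n_remove` are all assumptions chosen for illustration, and the paper's actual Density Distance and Boundary Score definitions are not reproduced here.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means; returns cluster assignments and sample-to-center distances."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1), d

def select_for_annotation(features, n_core, budget, drop_frac=0.3, n_remove=5, seed=0):
    """One-shot, label-free selection: core samples first, then boundary samples."""
    n = len(features)
    assign, dist = kmeans(features, n_core, seed=seed)

    # Stage 1 (assumed): the real sample nearest each pseudo-class center is a core sample.
    core = np.unique(dist.argmin(axis=0))

    # Denoising (assumed rule): drop the drop_frac of samples farthest from their own center.
    own = dist[np.arange(n), assign]
    keep = own <= np.quantile(own, 1.0 - drop_frac)

    # Illustrative boundary score (NOT the paper's definition): distance to own center
    # divided by distance to the nearest other center; larger means closer to a boundary.
    other = dist.copy()
    other[np.arange(n), assign] = np.inf
    score = own / (other.min(axis=1) + 1e-12)

    # Stage 2: repeatedly pick the highest-scoring candidate, then remove its
    # nearest neighbours from the pool so the picks do not cluster in one spot.
    pool = np.setdiff1d(np.flatnonzero(keep), core)
    selected = [int(i) for i in core]
    while len(selected) < budget and len(pool) > 0:
        best = pool[score[pool].argmax()]
        selected.append(int(best))
        pool = pool[pool != best]
        d = np.linalg.norm(features[pool] - features[best], axis=1)
        pool = pool[np.argsort(d)[n_remove:]]
    return np.array(selected[:budget])
```

Calling `select_for_annotation(features, n_core=10, budget=100)` on an `(N, D)` array of pretrained-model features would return up to 100 indices to send for labeling. Removing neighbours after each pick is one simple way to keep the boundary samples spread across the feature space.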

Paper Structure (29 sections, 11 equations, 7 figures, 4 tables, 1 algorithm)

Figures (7)

  • Figure 1: Design philosophy of our BiLAF framework. In contrast to previous methods, our method ensures the selection of central samples to maintain diversity while also reserving capacity to choose boundary samples to enhance decision boundary learning.
  • Figure 2: Our BiLAF framework in the Active Finetuning task. Within the high-dimensional feature space, the first stage, core sample selection, identifies pseudo-class centers. Building on these centers, a denoising method filters out noisy samples. We then compute the Boundary Score for each sample, which guides the iterative selection of samples and the removal of candidates from the pool. Finally, the selected samples are labeled for supervised finetuning.
  • Figure 3: Denoising visualization using t-SNE embeddings on CIFAR10. The left figure illustrates the initial samples, while the right figure depicts the scenario after denoising 30% of the samples. The red bounding boxes highlight several conspicuous changes.
  • Figure 4: t-SNE embeddings on CIFAR10 with a 1% annotation budget for our BiLAF method. Pentagrams represent the selected core samples, while circles denote the chosen boundary samples.
  • Figure 5: The Hyperparameter Influence on CIFAR100.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Definition 1: Density Distance
  • Definition 2: Intra-class Distance
  • Definition 3: Inter-class Distance
  • Definition 4: Boundary Score