Boundary Matters: A Bi-Level Active Finetuning Framework
Han Lu, Yichen Xie, Xiaokang Yang, Junchi Yan
TL;DR
BiLAF (Bi-Level Active Finetuning) tackles the annotation bottleneck in the pretraining-finetuning paradigm with a two-stage, boundary-aware sampling framework. The first stage selects core samples for broad coverage of the data pool; the second applies unsupervised denoising and an iterative boundary sampling strategy to target uncertain regions near decision boundaries. Neither stage uses ground-truth labels. Experiments on CIFAR and ImageNet show gains over state-of-the-art baselines under limited annotation budgets, supporting the value of boundary information in active finetuning. The approach offers a practical, data-efficient path for deploying pretrained models in real-world vision tasks where labeling is costly.
Abstract
The pretraining-finetuning paradigm has gained widespread adoption in vision tasks and other fields, yet it faces the significant challenge of high sample annotation costs. To mitigate this, the concept of active finetuning has emerged, aiming to select the most appropriate samples for model finetuning within a limited budget. Traditional active learning methods often struggle in this setting due to their inherent bias in batch selection. Furthermore, the recent active finetuning approach has primarily concentrated on aligning the distribution of selected subsets with the overall data pool, focusing solely on diversity. In this paper, we propose a Bi-Level Active Finetuning framework (BiLAF) that selects the samples for annotation in one shot and consists of two stages: core sample selection for diversity, and boundary sample selection for uncertainty. The process begins with the identification of pseudo-class centers, followed by a denoising method and an iterative strategy for boundary sample selection in the high-dimensional feature space, all without relying on ground-truth labels. Our comprehensive experiments provide both qualitative and quantitative evidence of our method's efficacy, outperforming all existing baselines.
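To make the two-stage idea concrete, the following is a minimal sketch of one way such a selection could be organized. It is not the authors' algorithm: the abstract does not specify the core selector, the denoising rule, or the boundary criterion, so every choice here is an assumption made for illustration. Plain k-means stands in for core sample selection, a k-nearest-neighbor density filter stands in for denoising, and the ratio of a sample's distance to its own pseudo-class center over its distance to the nearest other center stands in for a boundary score. The function names (`kmeans_centers`, `bilevel_select`) and parameters (`n_core`, `drop_frac`, `k_nn`) are hypothetical.

```python
import numpy as np


def kmeans_centers(features, k, n_iter=20, seed=0):
    """Plain k-means; an assumed stand-in for pseudo-class center discovery."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = features[assign == j].mean(axis=0)
    return centroids


def bilevel_select(features, n_core, budget, drop_frac=0.1, k_nn=5, seed=0):
    """Return up to `budget` pool indices: core samples first, then boundary samples.

    No labels are used; everything is computed from feature distances.
    """
    n = len(features)
    centroids = kmeans_centers(features, n_core, seed=seed)
    dists = np.linalg.norm(features[:, None] - centroids[None], axis=2)  # (n, n_core)

    # Stage 1 (diversity): the pool sample nearest each center is a core sample.
    core_idx = list(dict.fromkeys(dists.argmin(axis=0).tolist()))
    core_set = set(core_idx)

    # Pseudo-class of each sample, and an assumed boundary score:
    # distance to own center / distance to nearest other center (near 1 = boundary).
    assign = dists.argmin(axis=1)
    d_own = dists[np.arange(n), assign]
    masked = dists.copy()
    masked[np.arange(n), assign] = np.inf
    score = d_own / (masked.min(axis=1) + 1e-12)

    # Assumed denoising: isolated samples have a large mean distance to their
    # k nearest neighbors; drop the most isolated `drop_frac` of each pseudo-class.
    pair = np.linalg.norm(features[:, None] - features[None], axis=2)
    isolation = np.sort(pair, axis=1)[:, 1:k_nn + 1].mean(axis=1)

    queues = []
    for j in range(n_core):
        members = np.flatnonzero(assign == j)
        if len(members) == 0:
            queues.append([])
            continue
        keep_n = max(1, int(np.ceil(len(members) * (1.0 - drop_frac))))
        kept = members[np.argsort(isolation[members])[:keep_n]]
        ranked = kept[np.argsort(-score[kept])]  # most boundary-like first
        queues.append([i for i in ranked.tolist() if i not in core_set])

    # Stage 2 (uncertainty): take boundary candidates round-robin across
    # pseudo-classes until the budget is filled or candidates run out.
    selected = list(core_idx)
    while len(selected) < budget and any(queues):
        for queue in queues:
            if queue and len(selected) < budget:
                selected.append(queue.pop(0))
    return selected[:budget]
```

In this sketch, `features` would be an `(N, D)` array of embeddings from the pretrained model, and the returned indices are the samples sent for annotation in one shot. The pairwise distance matrix makes it suitable only for small pools; a real pipeline would need an approximate nearest-neighbor search.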
