What Makes Good Few-shot Examples for Vision-Language Models?
Zhaojun Guo, Jinghui Lu, Xuejing Liu, Rui Zhao, ZhenXing Qian, Fei Tan
TL;DR
This paper shows that few-shot learning outcomes for vision-language (VL) models are highly sensitive to the chosen training examples, often more than to the prompting strategy. It critically evaluates standard Active Learning (AL) methods (Entropy, Margin), finds them largely ineffective in VL few-shot settings, and proposes two data-selection strategies: Gaussian Monte Carlo and Representativeness (REPRE). Across CoOp, MaPLe, and Linear Probe on 11 diverse datasets, these selectors consistently improve performance over random sampling and AL baselines, with REPRE excelling in several configurations. The work also shows that dataset characteristics, such as generality, influence the effectiveness of Gaussian Monte Carlo, and it provides practical guidance for sample-efficient VL fine-tuning and robust prompt-learning designs.
Abstract
Few-shot tuning of pre-trained vision-language (VL) models has driven notable progress on downstream tasks, yet our detailed empirical study shows that few-shot learning outcomes depend heavily on which training examples are selected, a factor that prior research has overlooked. In this study, we develop more effective strategies for selecting few-shot training examples, rather than relying on random sampling, to improve existing few-shot prompt learning methods. To this end, we assess the effectiveness of several Active Learning (AL) techniques for instance selection, such as Entropy and Margin of Confidence, in the few-shot training setting. We further introduce two selection methods, Representativeness (REPRE) and Gaussian Monte Carlo (Montecarlo), designed to identify informative examples for labeling with respect to pre-trained VL models. Our findings show that REPRE and Montecarlo both significantly outperform random selection and AL-based strategies in few-shot training scenarios. The research also shows that these instance selection methods are model-agnostic, so they can enhance a wide range of few-shot training methods.
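For readers unfamiliar with the two AL baselines named in the abstract, the following is a minimal sketch of the textbook definitions of Entropy and Margin of Confidence scoring. It is not the authors' implementation. The function names, the `probs` array (assumed to hold per-image class probabilities, for example from a zero-shot VL model), and the pool-level top-k selection are all illustrative assumptions.

```python
import numpy as np

def entropy_scores(probs):
    """Shannon entropy of each row of class probabilities (higher = more uncertain)."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=1)

def margin_scores(probs):
    """Gap between the two largest class probabilities (smaller = more uncertain)."""
    top2 = np.sort(probs, axis=1)[:, -2:]  # two largest values per row, ascending
    return top2[:, 1] - top2[:, 0]

def select_top_k(probs, k, method="entropy"):
    """Return indices of the k most uncertain examples under the chosen score.

    Hypothetical helper: selects from the whole pool, not per class.
    """
    if method == "entropy":
        order = np.argsort(-entropy_scores(probs))  # highest entropy first
    elif method == "margin":
        order = np.argsort(margin_scores(probs))    # smallest margin first
    else:
        raise ValueError(f"unknown method: {method}")
    return order[:k]

# Toy pool of three unlabeled examples over three classes (made-up numbers).
probs = np.array([
    [0.90, 0.05, 0.05],  # confident prediction
    [0.34, 0.33, 0.33],  # near-uniform, most uncertain
    [0.60, 0.30, 0.10],  # moderately uncertain
])
picked_entropy = select_top_k(probs, 2, method="entropy")
picked_margin = select_top_k(probs, 2, method="margin")
```

Both scores rank examples purely by model uncertainty, which is the kind of selection the study compares against random sampling and against the proposed REPRE and Montecarlo methods.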
