BayesAdapter: enhanced uncertainty estimation in CLIP few-shot adaptation
Pablo Morales-Álvarez, Stergios Christodoulidis, Maria Vakalopoulou, Pablo Piantanida, Jose Dolz
TL;DR
BayesAdapter reframes CLIP few-shot adaptation as Bayesian inference over the adapter weights, estimating a full posterior rather than a single point estimate. This improves uncertainty estimation while remaining competitive in discriminative performance. By learning a variational Gaussian posterior and making predictions with Monte Carlo (MC) sampling, it achieves better calibration and higher reliability for high-confidence decisions across 11 datasets and two backbones. While its accuracy is typically close to that of state-of-the-art adapters, its strength lies in uncertainty-aware prediction and selective classification, particularly as the number of shots grows. This makes VLM adapters safer and more practical to deploy in real-world tasks that require calibrated confidence and selective abstention.
Abstract
The emergence of large pre-trained vision-language models (VLMs) represents a paradigm shift in machine learning, with unprecedented results in a broad range of visual recognition tasks. CLIP, one of the most popular VLMs, has exhibited remarkable zero-shot and transfer learning capabilities in classification. To transfer CLIP to downstream tasks, adapters constitute a parameter-efficient approach that avoids backpropagation through the large model (unlike related prompt learning methods). However, CLIP adapters have been developed to target discriminative performance, and the quality of their uncertainty estimates has been overlooked. In this work we show that the discriminative performance of state-of-the-art CLIP adapters does not always correlate with their uncertainty estimation capabilities, which are essential for safe deployment in real-world scenarios. We also demonstrate that one such adapter is obtained through MAP inference from a more general probabilistic framework. Based on this observation we introduce BayesAdapter, which leverages Bayesian inference to estimate a full probability distribution instead of a single point, better capturing the variability inherent in the parameter space. In a comprehensive empirical evaluation we show that our approach obtains high-quality uncertainty estimates in the predictions, standing out in calibration and selective classification. Our code will be publicly available upon acceptance of the paper.
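
To make the prediction step concrete, the following is a minimal sketch of Monte Carlo prediction under a diagonal Gaussian posterior over adapter weights, followed by confidence-based abstention. It is not the authors' implementation. It assumes the adapter is a set of per-class weight vectors compared with the image embedding by a scaled dot product; the names `mc_predict` and `selective_predict`, the logit `scale`, and the abstention `threshold` are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mc_predict(feature, mean, std, n_samples=100, scale=10.0, rng=None):
    """Average class probabilities over sampled adapter weights.

    feature: image embedding (list of D floats).
    mean, std: per-class posterior means and standard deviations (K x D).
    scale: assumed logit temperature, not a value taken from the paper.
    """
    rng = rng or random.Random(0)
    avg = [0.0] * len(mean)
    for _ in range(n_samples):
        # Draw one set of weights from the diagonal Gaussian N(mean, std^2).
        sampled = [
            [rng.gauss(m, s) for m, s in zip(mu_k, sd_k)]
            for mu_k, sd_k in zip(mean, std)
        ]
        probs = softmax([scale * dot(w_k, feature) for w_k in sampled])
        # Accumulate the running Monte Carlo average of the probabilities.
        avg = [a + p / n_samples for a, p in zip(avg, probs)]
    return avg


def selective_predict(probs, threshold=0.8):
    """Return the predicted class index, or None to abstain."""
    conf = max(probs)
    return probs.index(conf) if conf >= threshold else None
```

With all standard deviations set to zero, every sample equals the mean and the output reduces to the prediction of a single point estimate. Larger standard deviations spread the sampled predictions, which generally lowers the averaged confidence and makes abstention more likely.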
