COBRA: COmBinatorial Retrieval Augmentation for Few-Shot Adaptation
Arnav M. Das, Gantavya Bhatt, Lilly Kumari, Sahil Verma, Jeff Bilmes
TL;DR
COBRA introduces a diversity-aware retrieval augmentation framework for few-shot vision tasks. It formulates sample selection as the maximization of a combinatorial mutual information (CMI) measure, which is a submodular objective. It combines FLMI (facility-location mutual information), soft class balancing, and an optional quality term into a single tractable objective, and greedy maximization optimizes that objective with an approximation guarantee. Across six datasets and multiple few-shot adapters, COBRA consistently outperforms similarity-based retrieval (Sim-Score, CLIP-score) and synthetic augmentation (SDXL-Aug), often with negligible retrieval overhead. The results highlight the importance of diversity in retrieved data for effective few-shot learning with vision-language models. They also suggest broader applicability, for example to in-context learning.
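The Python sketch below illustrates the greedy selection step. It assumes one FLMI variant from the submodular mutual information literature: I(A; Q) = Σ_{q∈Q} max_{a∈A} s(a, q) + η Σ_{a∈A} max_{q∈Q} s(a, q). Here A is the retrieved set, Q the few-shot target examples, s a nonnegative similarity, and η a trade-off weight. The functions `flmi` and `greedy_select`, the toy similarity matrix, and η are illustrative assumptions, not COBRA's implementation, and COBRA's soft class balancing and quality terms are omitted. With nonnegative similarities this assumed objective is monotone submodular. For such objectives, the standard greedy algorithm with a budget of k items is guaranteed to reach at least a (1 - 1/e) fraction of the optimum.

```python
def flmi(selected, sim, eta=1.0):
    """Facility-location mutual information between selected pool items and the targets.

    Assumed variant: query coverage plus an eta-weighted relevance term.
    sim[i][q] is a nonnegative similarity between pool item i and target example q.
    """
    if not selected:
        return 0.0
    num_targets = len(sim[0])
    # Coverage: each target example is credited with its best-matching selected item,
    # so a duplicate of an already selected item adds nothing to this term.
    coverage = sum(max(sim[i][q] for i in selected) for q in range(num_targets))
    # Relevance: each selected item is credited with its best-matching target (modular in the set).
    relevance = sum(max(sim[i]) for i in selected)
    return coverage + eta * relevance


def greedy_select(sim, k, eta=1.0):
    """Greedily add the pool item with the largest marginal FLMI gain, k times."""
    selected = []
    remaining = list(range(len(sim)))
    for _ in range(min(k, len(sim))):
        current = flmi(selected, sim, eta)
        # Ties go to the lower index because max() returns the first maximal element.
        best = max(remaining, key=lambda i: flmi(selected + [i], sim, eta) - current)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy similarities: rows are pool items, columns are two target examples.
sim = [
    [0.9, 0.1],  # item 0: close to target 0
    [0.9, 0.1],  # item 1: duplicate of item 0
    [0.1, 0.8],  # item 2: close to target 1
]

print(greedy_select(sim, k=2))            # [0, 2]
print(greedy_select(sim, k=2, eta=10.0))  # [0, 1]
```

On this toy matrix, items 0 and 1 are duplicates matched to the first target example. With η = 1 the greedy pass picks item 0 and then item 2, which covers both targets. Raising η to 10 lets the modular relevance term dominate, and the selection falls back to the redundant pair [0, 1], which mirrors similarity-only retrieval.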
Abstract
Retrieval augmentation, the practice of retrieving additional data from large auxiliary pools, has emerged as an effective technique for enhancing model performance in the low-data regime. Prior approaches have relied exclusively on nearest-neighbor-based strategies for data selection, which retrieve auxiliary samples with high similarity to instances in the target task. However, these approaches are prone to selecting highly redundant samples, since they fail to incorporate any notion of diversity. In this work, we first demonstrate that the data selection strategies used in prior retrieval-augmented few-shot adaptation settings can be generalized using a class of functions known as Combinatorial Mutual Information (CMI) measures. We then propose COBRA (COmBinatorial Retrieval Augmentation), which employs an alternative CMI measure that considers both diversity and similarity to the target dataset. COBRA consistently outperforms previous retrieval approaches across image classification tasks and few-shot learning techniques when used to retrieve samples from LAION-2B. COBRA introduces negligible computational overhead to the cost of retrieval while providing significant gains in downstream model performance.
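The Python sketch below connects nearest-neighbor selection to CMI measures. It assumes a graph-cut-style mutual information, I(A; Q) = 2λ Σ_{a∈A} Σ_{q∈Q} s(a, q). This function is modular in A, because each item contributes a fixed amount regardless of what else is selected. Greedy maximization under a budget of k therefore reduces to ranking pool items by their summed similarity to the target examples, which is a nearest-neighbor-style ranking. The functions `gcmi`, `greedy`, `nearest_neighbor_topk`, and `query_coverage`, the value of λ, and the toy data are hypothetical illustrations. The exact scoring rules of prior methods may differ.

```python
def gcmi(selected, sim, lam=0.5):
    """Graph-cut-style mutual information (assumed form): 2 * lam * sum of item-target similarities."""
    return 2 * lam * sum(sum(sim[i]) for i in selected)


def greedy(objective, n, k):
    """Generic greedy maximization of a set function over items 0..n-1 with budget k."""
    selected = []
    for _ in range(min(k, n)):
        base = objective(selected)
        candidates = [i for i in range(n) if i not in selected]
        best = max(candidates, key=lambda i: objective(selected + [i]) - base)
        selected.append(best)
    return selected


def nearest_neighbor_topk(sim, k):
    """Rank pool items by summed similarity to the targets and keep the top k (stable on ties)."""
    scores = [sum(row) for row in sim]
    return sorted(range(len(sim)), key=lambda i: -scores[i])[:k]


def query_coverage(selected, sim):
    """Sum over targets of the similarity to their closest selected item (a simple redundancy check)."""
    return sum(max(sim[i][q] for i in selected) for q in range(len(sim[0])))


# Toy similarities: rows are pool items, columns are two target examples.
sim = [
    [0.9, 0.6],   # item 0
    [0.9, 0.6],   # item 1: duplicate of item 0
    [0.2, 0.95],  # item 2: best match for target 1
]

print(greedy(lambda A: gcmi(A, sim), len(sim), k=2))              # [0, 1]
print(nearest_neighbor_topk(sim, k=2))                            # [0, 1]
print(query_coverage([0, 1], sim) < query_coverage([0, 2], sim))  # True
```

Both routes pick the duplicate pair [0, 1], even though replacing item 1 with item 2 would match the second target example better. This is the kind of redundancy that a selection rule without any diversity term cannot detect.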
