COBRA: COmBinatorial Retrieval Augmentation for Few-Shot Adaptation

Arnav M. Das, Gantavya Bhatt, Lilly Kumari, Sahil Verma, Jeff Bilmes

TL;DR

COBRA introduces a diversity-aware retrieval augmentation framework for few-shot vision tasks by formulating sample selection as a submodular optimization problem under combinatorial mutual information (CMI). It combines FLMI (facility-location-based MI), soft class balancing, and an optional quality term into a tractable objective with a guaranteed approximation via greedy maximization. Across six datasets and multiple few-shot adapters, COBRA consistently outperforms traditional similarity-based retrieval (Sim-Score, CLIP-score) and synthetic augmentation (SDXL-Aug), often with negligible retrieval overhead. The results highlight the importance of diversity in retrieved data for effective few-shot learning with vision-language models and suggest broader applicability to in-context learning and beyond.

Abstract

Retrieval augmentation, the practice of retrieving additional data from large auxiliary pools, has emerged as an effective technique for enhancing model performance in the low-data regime. Prior approaches have employed only nearest-neighbor based strategies for data selection, which retrieve auxiliary samples with high similarity to instances in the target task. However, these approaches are prone to selecting highly redundant samples, since they fail to incorporate any notion of diversity. In our work, we first demonstrate that data selection strategies used in prior retrieval-augmented few-shot adaptation settings can be generalized using a class of functions known as Combinatorial Mutual Information (CMI) measures. We then propose COBRA (COmBinatorial Retrieval Augmentation), which employs an alternative CMI measure that considers both diversity and similarity to a target dataset. COBRA consistently outperforms previous retrieval approaches across image classification tasks and few-shot learning techniques when used to retrieve samples from LAION-2B. COBRA introduces negligible computational overhead to the cost of retrieval while providing significant gains in downstream model performance.
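The contrast between similarity-only retrieval and diversity-aware retrieval can be illustrated with a small sketch. The objective below is a facility-location-style mutual information of a form common in the submodular information literature (a coverage term plus a weighted relevance term). It is an assumption for illustration, not necessarily the paper's exact FLMI definition, and it omits the class-balancing and quality terms. The names `flmi_value`, `greedy_select`, `top_k_by_similarity`, and the weight `eta` are hypothetical.

```python
def flmi_value(selected, sim, eta=1.0):
    """Facility-location-style MI between selected auxiliary items and the target set.

    sim[t][a] is the (non-negative) similarity between target item t and
    auxiliary item a. This form is an assumed illustration, not the paper's
    exact definition.
    """
    if not selected:
        return 0.0
    # Coverage: each target item is credited with its best-matching selected item.
    coverage = sum(max(row[a] for a in selected) for row in sim)
    # Relevance: each selected item is credited with its best-matching target item.
    relevance = sum(max(row[a] for row in sim) for a in selected)
    return coverage + eta * relevance


def greedy_select(sim, k, eta=1.0):
    """Greedily add the auxiliary item with the largest marginal gain, k times."""
    n_aux = len(sim[0])
    selected = []
    for _ in range(min(k, n_aux)):
        current = flmi_value(selected, sim, eta)
        best_gain, best_item = None, None
        for a in range(n_aux):
            if a in selected:
                continue
            gain = flmi_value(selected + [a], sim, eta) - current
            if best_gain is None or gain > best_gain:
                best_gain, best_item = gain, a
        selected.append(best_item)
    return selected


def top_k_by_similarity(sim, k):
    """Similarity-only baseline: rank auxiliary items by summed similarity to the targets."""
    n_aux = len(sim[0])
    scores = [sum(row[a] for row in sim) for a in range(n_aux)]
    return sorted(range(n_aux), key=lambda a: -scores[a])[:k]


# Toy pool: auxiliary items 0 and 1 are near-duplicates close to target 0,
# while auxiliary item 2 is the only one close to target 1.
sim = [
    [0.9, 0.9, 0.1],  # similarities of target 0 to auxiliary items 0, 1, 2
    [0.1, 0.1, 0.8],  # similarities of target 1 to auxiliary items 0, 1, 2
]
diverse = greedy_select(sim, k=2)
redundant = top_k_by_similarity(sim, k=2)
```

In this toy pool the similarity-only baseline returns the two near-duplicates, while the greedy procedure picks one of them and then the item covering the second target. The naive loop re-evaluates the objective for every candidate, so it is meant only to show the logic; a web-scale pool would need incremental gain updates.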

Paper Structure

This paper contains 62 sections, 6 theorems, 16 equations, 17 figures, 13 tables, 1 algorithm.

Key Result

Lemma 2.2

Let $G = (V, E)$ be a graph with edge weights defined with symmetric $\mathbf{W} \in \mathbb{R}^{(m+n) \times (m+n)}$. For any set $A \subseteq V$ of vertices, let $f(A) = \sum_{i \in A} \sum_{j \in V \setminus A} w_{ij}$ be the graph cut function. Given any two sets $A$ and $B$ such that $A \cap B
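The lemma statement is cut off in this excerpt, so the following is only a sketch of the quantities it names. It computes the graph cut function $f$ as defined above and, assuming the standard submodular mutual information $I_f(A; B) = f(A) + f(B) - f(A \cup B)$, checks numerically that for disjoint $A$ and $B$ this reduces to twice the total edge weight between the two sets. The helper names and the weight matrix are hypothetical.

```python
def graph_cut(W, A):
    """f(A): total weight of edges leaving A, for a symmetric weight matrix W."""
    A = set(A)
    n = len(W)
    return sum(W[i][j] for i in A for j in range(n) if j not in A)


def mutual_information(W, A, B):
    """Assumed standard definition: I_f(A; B) = f(A) + f(B) - f(A ∪ B)."""
    return graph_cut(W, A) + graph_cut(W, B) - graph_cut(W, set(A) | set(B))


def cross_weight(W, A, B):
    """Total edge weight between the sets A and B."""
    return sum(W[i][j] for i in A for j in B)


# Hypothetical symmetric weights on 4 vertices (integers keep the check exact).
W = [
    [0, 3, 1, 2],
    [3, 0, 4, 1],
    [1, 4, 0, 5],
    [2, 1, 5, 0],
]
A, B = {0}, {1, 2}  # disjoint vertex sets
mi = mutual_information(W, A, B)
cross = cross_weight(W, A, B)
```

Because the cross weight is a plain sum over the elements of $A$, maximizing this quantity under a size budget amounts to picking the items with the highest summed similarity to $B$, which is consistent with the abstract's claim that nearest-neighbor style retrieval can be expressed through a CMI measure.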

Figures (17)

  • Figure 1: COBRA uses the target pool to select diverse and relevant samples from a large web-scale auxiliary pool. The retrieved data and the target data are then used to train a few-shot learner with a CLIP backbone.
  • Figure 2: COBRA retrieves samples from LAION that are diverse and relevant to the target dataset. In contrast, Sim-Score retrieves redundant samples and sometimes even exact duplicates (shown in red boxes). Additional visualizations are presented in the appendix.
  • Figure 3: 2D Example. We consider a simple example where $|\mathcal{D}^{\text{tar}}| = 64$ and $|\mathcal{D}^{\text{aux}}| = 25000$. From $\mathcal{D}^{\text{aux}}$, we retrieve a subset of size 128 based on $\mathcal{D}^{\text{tar}}$. COBRA (left) effectively covers the target $\mathcal{D}^{\text{tar}}$. Sim-Score (right), on the other hand, selects clumpy examples, as highlighted by the bounding boxes. Refer to the appendix (toy example setup) for more details.
  • Figure 4: Aggregated Ranking ($\downarrow$). Average ranking of each retrieval strategy (lower is better) across different levels of data scarcity, six datasets, and three random seeds. Across more than 90 experimental settings, COBRA generally outperforms every baseline we test.
  • Figure 5: Results Across Target Datasets. We compare various retrieval strategies over LAION-2B as $\mathcal{D}^{\text{aux}}$, using Tip-Adapter-F to adapt CLIP to a small target dataset. We find that COBRA generally outperforms all other retrieval strategies, which is also reflected in the mean-rank analysis in Figure 4. Confidence intervals are based on standard errors computed over three trials.
  • ...and 12 more figures

Theorems & Definitions (14)

  • Definition 2.1: Nearest-Neighbor Based Retrieval
  • Lemma 2.2: Graph Cut Mutual Information (GCMI)
  • proof
  • Corollary 2.3
  • Corollary 2.4
  • Remark 2.5
  • Definition 3.1: FLMI
  • Lemma 3.2: Soft Class Balancing
  • proof
  • Definition 3.3: COBRA Objective
  • ...and 4 more