Efficient and Effective In-context Demonstration Selection with Coreset

Zihua Wang, Jiarui Wang, Haiyang Xu, Ming Yan, Fei Huang, Xu Yang, Xiu-Shen Wei, Siya Mi, Yu Zhang

TL;DR

This work tackles the NP-hard problem of selecting in-context demonstrations for multimodal LVLMs by introducing CoDR, a coreset-based dual retrieval framework. CoDR first constructs a diverse coreset via cluster-pruning to maximize the expected mutual information with the query, then leverages a precomputed C-score to guide global retrieval from the full support set. The approach yields significant improvements over traditional strategies in image captioning, visual question answering, and fine-grained image classification, while maintaining competitive runtimes. These results demonstrate a robust and scalable solution for efficient and effective demonstration selection in multimodal ICL, with potential applications to knowledge-intensive tasks in the future.

Abstract

In-context learning (ICL) has emerged as a powerful paradigm for Large Visual Language Models (LVLMs), enabling them to leverage a few examples directly from input contexts. However, the effectiveness of this approach relies heavily on the selection of demonstrations, a process that is NP-hard. Traditional strategies, including random sampling, similarity-based sampling, and infoscore-based sampling, are often inefficient or yield suboptimal performance, struggling to balance efficiency and effectiveness in demonstration selection. In this paper, we propose a novel demonstration selection framework named Coreset-based Dual Retrieval (CoDR). We show that samples within a diverse subset achieve a higher expected mutual information. To implement this, we introduce a cluster-pruning method to construct a diverse coreset that aligns more effectively with the query while maintaining diversity. Additionally, we develop a dual retrieval mechanism that enhances the selection process by achieving global demonstration selection while preserving efficiency. Experimental results demonstrate that our method significantly improves ICL performance compared to existing strategies, providing a robust solution for effective and efficient demonstration selection.

Paper Structure

This paper contains 38 sections, 7 equations, 5 figures, 10 tables, 1 algorithm.

Figures (5)

  • Figure 1: Similarity-based retrieval, infoscore-based retrieval, and our proposed CoDR in ICL. CoDR introduces a dual retrieval mechanism: it first selects a coreset via cluster-pruning, then uses the similarity score between the query and each sample in the coreset as a weighting coefficient, multiplying it with the precomputed C-score matrix. The weighted scores guide the final demonstration selection from the support set, achieving a more global selection.
  • Figure 2: Architecture of CoDR. Our method constructs a coreset $S^*$ by clustering and pruning the support set $S$. A dual retrieval module then performs global retrieval as follows. We precompute a C-score quantifying each support set sample's quality as a demonstration when queried with coreset samples. For an input query, we evaluate its similarity to each coreset sample, hypothesizing that the more similar the query is to a coreset sample, the more similar their best demonstrations are. The final demonstration score is the product of this similarity and the precomputed C-score, enabling global retrieval across the entire support set.
  • Figure 3: IC, VQA and FIC performance with different coreset scales.
  • Figure 4: Visualization of SITR and CoDR on IC.
  • Figure 5: Visualization of SITR and our method on IC, VQA and FIC tasks.
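
The scoring step described in the captions of Figures 1 and 2 can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, cosine similarity is assumed as the similarity measure, and the weighted C-scores are assumed to be summed over coreset samples (the captions only state that similarity is multiplied with the C-score matrix). The coreset and the C-score matrix are taken as given, since their construction is not detailed here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dual_retrieval_scores(query_emb, coreset_embs, c_score):
    """Score every support-set sample for one query.

    query_emb    : embedding of the query, length d
    coreset_embs : m coreset embeddings, each of length d
    c_score      : m x n matrix; c_score[i][j] is the precomputed quality of
                   support sample j as a demonstration for coreset sample i
    Returns a list of n scores, one per support-set sample.
    """
    # Similarity between the query and each coreset sample acts as a weight.
    weights = [cosine(query_emb, emb) for emb in coreset_embs]
    n_support = len(c_score[0])
    # Assumption: weighted C-scores are aggregated by summing over the coreset.
    return [
        sum(w * row[j] for w, row in zip(weights, c_score))
        for j in range(n_support)
    ]


def select_demonstrations(query_emb, coreset_embs, c_score, k):
    """Return indices of the k highest-scoring support-set samples."""
    scores = dual_retrieval_scores(query_emb, coreset_embs, c_score)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy example: 2 coreset samples, 3 support-set samples.
query = [1.0, 0.0]
coreset = [[1.0, 0.0], [0.0, 1.0]]
c_matrix = [[0.1, 0.9, 0.5],
            [0.8, 0.2, 0.3]]
print(select_demonstrations(query, coreset, c_matrix, k=2))  # prints [1, 2]
```

In this toy case the query coincides with the first coreset sample, so only the first row of the C-score matrix contributes and support samples 1 and 2 are selected. Because the C-score matrix is computed offline, the per-query cost in this sketch is m similarity evaluations plus one weighted sum over an m x n matrix, while candidates still come from the whole support set.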