Towards Reliable and Holistic Visual In-Context Learning Prompt Selection

Wenxiao Wu; Jing-Hao Xue; Chengming Xu; Chen Liu; Xinwei Sun; Changxin Gao; Nong Sang; Yanwei Fu

TL;DR

The paper tackles the selection of reliable in-context prompts for Visual In-Context Learning (VICL), challenging the prevalent similarity-priority heuristic and addressing the coverage gaps left by random sampling. It introduces RH-Partial2Global, which combines reliable candidate selection based on jackknife conformal prediction with holistic sampling based on covering designs to construct robust alternative sets for VICL prompts. Empirical results on segmentation, object detection, and colorization show systematic gains over Partial2Global, with negative KL divergence as the preferred conformity function and further improvements from test-time voting. The work also investigates how universally the conformal-prediction strategy applies to VPR variants and discusses limitations related to dataset size and potential data bias.

Abstract

Visual In-Context Learning (VICL) has emerged as a prominent approach for adapting visual foundation models to novel tasks by exploiting the contextual information embedded in in-context examples. Selecting these examples can be formulated as a global ranking problem over potential candidates. Current VICL methods, such as Partial2Global and VPR, are grounded in the similarity-priority assumption that images more visually similar to a query image serve as better in-context examples. This foundational assumption, while intuitive, lacks sufficient justification for its efficacy in selecting optimal in-context examples. Furthermore, Partial2Global constructs its global ranking from a series of randomly sampled pairwise preference predictions. This reliance on random sampling can leave some comparisons uncovered while sampling others redundantly, which adversely affects the final global ranking. To address these issues, this paper introduces an enhanced variant of Partial2Global designed for reliable and holistic selection of in-context examples in VICL. The proposed method, dubbed RH-Partial2Global, uses a jackknife conformal prediction-guided strategy to construct reliable alternative sets and a covering design-based sampling approach to ensure comprehensive and uniform coverage of pairwise preferences. Extensive experiments demonstrate that RH-Partial2Global outperforms Partial2Global across diverse visual tasks.
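To make the coverage argument concrete, the sketch below contrasts randomly sampled comparison groups with groups chosen so that every pair of candidates is compared at least once. It is an illustration only, not the construction used in RH-Partial2Global: it assumes candidates are compared in groups of size k, it uses a simple greedy heuristic rather than an optimal covering design, and every function name (`random_blocks`, `greedy_pair_cover`, `coverage`) is hypothetical.

```python
import itertools
import random


def pairs_of(block):
    """Return the set of unordered pairs contained in one comparison group."""
    return {frozenset(p) for p in itertools.combinations(block, 2)}


def random_blocks(n, k, num_blocks, seed=0):
    """Baseline: draw num_blocks groups of k candidates uniformly at random."""
    rng = random.Random(seed)
    return [tuple(sorted(rng.sample(range(n), k))) for _ in range(num_blocks)]


def greedy_pair_cover(n, k):
    """Greedy heuristic: build groups of size k until every pair is covered.

    This is not an optimal covering design; it only guarantees that each
    pair of the n candidates appears together in at least one group.
    """
    assert 2 <= k <= n
    uncovered = {frozenset(p) for p in itertools.combinations(range(n), 2)}
    blocks = []
    while uncovered:
        # Start from a pair that no earlier group has covered.
        block = sorted(min(uncovered, key=lambda p: tuple(sorted(p))))
        # Fill the group with the candidates that cover the most new pairs.
        while len(block) < k:
            best = max(
                (c for c in range(n) if c not in block),
                key=lambda c: sum(frozenset((c, m)) in uncovered for m in block),
            )
            block.append(best)
        blocks.append(tuple(sorted(block)))
        uncovered -= pairs_of(block)
    return blocks


def coverage(blocks, n):
    """Fraction of all candidate pairs that appear in at least one group."""
    total = n * (n - 1) // 2
    covered = set().union(*(pairs_of(b) for b in blocks)) if blocks else set()
    return len(covered) / total


if __name__ == "__main__":
    n, k = 10, 4
    designed = greedy_pair_cover(n, k)
    sampled = random_blocks(n, k, len(designed))
    print("greedy cover:", len(designed), "groups, coverage", coverage(designed, n))
    print("random:      ", len(sampled), "groups, coverage", coverage(sampled, n))
```

With the same budget of groups, the random baseline may repeat some pairs and miss others, which is the incompleteness and redundancy described above. The greedy cover reaches full pairwise coverage by construction, although it makes no claim about using the fewest possible groups.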
