Affinity and Diversity: A Unified Metric for Demonstration Selection via Internal Representations

Mariko Kato, Hakaze Cho, Yoshihiro Sakai, Naoya Inoue

TL;DR

This work tackles the instability of In-Context Learning (ICL) arising from demonstration selection by introducing a unified affinity–diversity metric derived from internal model representations. By identifying the most influential induction head and operating in its $W_Q^{\hat{h},\top} W_K^{\hat{h}}$ subspace, the authors define affinity as the mean cosine similarity between query and label representations and diversity as the label representations' covariance-based variance. Empirically, affinity correlates with accuracy while diversity yields strong explanatory power ($R^2$) across tasks, and both metrics align with, yet unify, prior demonstration-selection methods that are often inconsistent. The proposed framework clarifies why existing methods diverge and suggests a joint selection criterion that could improve ICL performance in practical settings.
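The two quantities described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the query representation is projected by $W_Q^{\hat{h}}$ and the label representations by $W_K^{\hat{h}}$, and that "covariance-based variance" means the trace of the sample covariance matrix. The function names `project`, `affinity`, and `diversity`, and all array shapes, are hypothetical.

```python
import numpy as np

def project(query_vec, label_vecs, W_Q, W_K):
    """Map hidden states into the chosen head's query/key space.

    Assumed shapes (hypothetical): query_vec (d_model,), label_vecs (k, d_model),
    W_Q and W_K (d_head, d_model).
    """
    return W_Q @ query_vec, label_vecs @ W_K.T

def affinity(query_vec, label_vecs):
    """Mean cosine similarity between one query vector and k label vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    labels = label_vecs / np.linalg.norm(label_vecs, axis=1, keepdims=True)
    return float(np.mean(labels @ q))

def diversity(label_vecs):
    """Sum of per-dimension sample variances of the k label vectors.

    This equals the trace of the sample covariance matrix (an assumed
    reading of "covariance-based variance").
    """
    return float(np.var(label_vecs, axis=0, ddof=1).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, d_head, k = 8, 4, 16
    W_Q = rng.normal(size=(d_head, d_model))
    W_K = rng.normal(size=(d_head, d_model))
    query = rng.normal(size=d_model)
    labels = rng.normal(size=(k, d_model))
    q_proj, l_proj = project(query, labels, W_Q, W_K)
    print("affinity:", affinity(q_proj, l_proj))
    print("diversity:", diversity(l_proj))
```

Under these assumptions, a demonstration set with high affinity has label representations that point in roughly the same direction as the query, while high diversity means the label representations are spread widely in the head's subspace.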

Abstract

The performance of In-Context Learning (ICL) is highly sensitive to the selected demonstrations. Existing approaches to demonstration selection optimize different objectives, yielding inconsistent results. To address this, we propose a unified metric, affinity and diversity, that leverages the ICL model's internal representations. Our experiments show that both affinity and diversity strongly correlate with test accuracy, indicating their effectiveness for demonstration selection. Moreover, we show that our proposed metrics align well with various previous methods, reconciling the inconsistencies among them.

Paper Structure

This paper contains 33 sections, 3 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: The Affinity and Diversity of the demonstrations in TREC, SST5, TEE on $k=16$. The colors of the circles refer to the accuracy of the classification tasks. The line and background color refer to the decision boundary to predict labels by affinity and diversity. The larger the affinity and diversity, the higher the accuracy tends to be.
  • Figure 2: The correlation coefficient between affinity and accuracy
  • Figure 3: The coefficient of determination between diversity and accuracy
  • Figure 4: Left: The tendency of diversity to accuracy on $k=16$. Right: The tendency of affinity to accuracy on $k=16$.
  • Figure 5: Left: The Spearman's rank correlation coefficient of the similarity scores, affinity, diversity, and accuracy of $k=16$ on SST2. Middle: The affinity of selected demonstrations by each selection method on $k=2$. Right: The diversity of selected demonstrations by each selection method on $k=2$.
  • ...and 5 more figures