Logit Separability-Driven Samples and Multiple Class-Related Words Selection for Advancing In-Context Learning

Zhu Zixiao, Feng Zijian, Zhou Hanzhang, Qian Junlang, Mao Kezhi

TL;DR

This work proposes LICL, a logit separability-based method that jointly organizes samples and integrates multiple class-related words into each sample-label pair, and shows that this approach significantly improves ICL performance by providing clearer instructions and richer label information.

Abstract

Effective organization of in-context learning (ICL) demonstrations is key to improving the quality of large language model (LLM) responses. To create better sample-label pairs that instruct LLM understanding, we introduce logit separability, a criterion to assess the clarity of both samples and class-related words at the logit level. This facilitates the optimization of sample and label selection, enhancing the precision of information provided in ICL demonstrations. Additionally, we find that incorporating multiple class-related words for each sample, rather than relying on a single class name, improves performance by offering a broader range of label information. Building on these insights, we propose LICL, a logit separability-based method that jointly organizes samples and integrates multiple class-related words into each sample-label pair. Evaluations across seven classification datasets show that this approach significantly improves ICL performance by providing clearer instructions and richer label information.
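
As a rough illustration of what "integrating multiple class-related words into each sample-label pair" could look like in a prompt, the sketch below joins several label words with spaces, as the Figure 1(c) caption describes. The template, the helper names, and the example words ("great", "terrible") are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical template; the paper's actual prompt format is not given here.
TEMPLATE = "Review: {text}\nSentiment: {label}"


def build_demonstration(text: str, label_words: List[str]) -> str:
    """Format one sample with several class-related words joined by spaces."""
    return TEMPLATE.format(text=text, label=" ".join(label_words))


def build_prompt(demos: List[Tuple[str, List[str]]], query: str) -> str:
    """Concatenate the demonstrations, then leave the query's label slot open."""
    blocks = [build_demonstration(text, words) for text, words in demos]
    blocks.append(TEMPLATE.format(text=query, label="").rstrip())
    return "\n\n".join(blocks)


# Illustrative samples and label words (assumptions, not data from the paper).
demos = [
    ("a gripping, beautifully shot film", ["positive", "great"]),
    ("dull and far too long", ["negative", "terrible"]),
]
prompt = build_prompt(demos, "an uneven but charming debut")
```

With a single-element list such as ["positive"], the same helper reduces to the usual class-name label, which makes the two settings easy to compare.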

Paper Structure (38 sections, 4 equations, 10 figures, 21 tables)

Figures (10)

  • Figure 1: Exploration of samples and class-related words on LLaMA2-7b. (a) Logit separability of SST-2 samples across class-related words under zero-shot learning (ZSL), showing that the degree of separation varies with the input sample. The 1-shot accuracy is reported using good or bad negative/positive samples, with class names as labels. (b) Logit values of various class-related words for 100 negative and 100 positive SST-2 samples under ZSL, showing the logit separability of class-related words across samples. (c) Accuracy comparison in 1-shot ICL using class names, single class-related words, and multiple class-related words (combining the two sets with spaces) as labels. Multiple class-related words outperform the other two label sets. More experiments and analyses, including those on GPT2-xl, are in Appendix A.
  • Figure 2: Overall architecture of LICL: The top part shows the pool refinement, with sample organization based on logit separability across the refined pool. The bottom part presents multiple class-related word insertion via sequential forward search, starting from an initial sample-label pair to form a sample-multiple-label pair.
  • Figure 3: Validation and test performance as a function of the number of inserted words (N) in sample-multiple-label pairs. The red cross marks the setting used for the reported results. For LLaMA2-7b, N is 2 for SST-2, IMDB, TREC, ISEAR, and AGNews, 4 for CR, and 5 for AMAN. For GPT2-xl, N is 7 for ISEAR and 2 for the others. Results for the remaining datasets are in Appendix D.
  • Figure 4: Logit separability of class-related words over samples.
  • Figure 5: Label effectiveness in ICL (GPT2-xl)
  • ...and 5 more figures
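
The Figure 2 caption mentions inserting class-related words via sequential forward search, starting from an initial sample-label pair. The following is a minimal sketch of generic greedy forward selection under stated assumptions, not the authors' implementation: the function name, the `score` callback (which in practice might be validation accuracy or a logit-separability measure), and the toy gains are all hypothetical.

```python
from typing import Callable, List, Sequence


def sequential_forward_search(
    initial: List[str],
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
    max_words: int,
) -> List[str]:
    """Greedily add the candidate word that most improves `score`."""
    selected = list(initial)
    best = score(selected)
    remaining = [w for w in candidates if w not in selected]
    while remaining and len(selected) < max_words:
        # Score every one-word extension of the current label set.
        trials = [(score(selected + [w]), w) for w in remaining]
        top_score, top_word = max(trials, key=lambda t: t[0])
        if top_score <= best:
            break  # no candidate improves the score, so stop early
        selected.append(top_word)
        remaining.remove(top_word)
        best = top_score
    return selected


# Toy additive score, purely illustrative (an assumption, not a paper result).
TOY_GAINS = {"positive": 0.5, "great": 0.3, "good": 0.2, "fine": -0.1}


def toy_score(words: List[str]) -> float:
    return sum(TOY_GAINS[w] for w in words)


labels = sequential_forward_search(
    ["positive"], ["great", "good", "fine"], toy_score, max_words=10
)
```

In this sketch `max_words` caps the total number of label words, loosely mirroring the word quantity N discussed in the Figure 3 caption; the early stop when no candidate helps is a design choice of the sketch.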