ParaICL: Towards Parallel In-Context Learning

Xingxuan Li, Xuan-Phi Nguyen, Shafiq Joty, Lidong Bing

TL;DR

The paper tackles the sensitivity of few-shot in-context learning (ICL) to demonstration selection and input context length. It introduces ParaICL, which partitions demonstrations into parallel batches by semantic similarity to the test question, computes normalized batch semantic scores, and selects tokens by optimizing a weighted average semantic objective $\mathcal{L}_{\mathrm{WAS}}$ under an adaptive plausibility constraint $\mathcal{V}_{\mathrm{head}}$. The method uses all available demonstrations without lengthening the context of any single batch, and it shows consistent gains across reasoning, natural language inference (NLI), and coding tasks. It is compatible with both open- and closed-source LLMs and can be combined with other techniques such as contrastive decoding. The findings position ParaICL as a modular, practical enhancement to ICL that improves robustness to demonstration choice and scales with model capability, offering a path toward more reliable, retrieval-free augmentation of LLM behavior.
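A minimal Python sketch of the batching and scoring steps follows. It assumes that question embeddings are already computed, that cosine similarity is the semantic measure, that batches are formed by sorting demonstrations by similarity to the test question and chunking the sorted list, and that a batch's score is its mean similarity normalized with a softmax across batches. The function names and the `temperature` parameter are hypothetical; the paper's exact scoring and normalization may differ.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def parallel_batches(
    test_emb: Sequence[float],
    demo_embs: Sequence[Sequence[float]],
    batch_size: int,
) -> Tuple[List[List[int]], List[float]]:
    """Hypothetical helper: sort demonstrations by similarity, then chunk.

    Returns batches of demonstration indices (most similar batch first)
    and each demonstration's similarity to the test question.
    """
    sims = [cosine(test_emb, e) for e in demo_embs]
    order = sorted(range(len(demo_embs)), key=lambda i: sims[i], reverse=True)
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    return batches, sims


def normalized_batch_scores(
    batches: List[List[int]], sims: List[float], temperature: float = 1.0
) -> List[float]:
    """Assumed scoring: mean similarity per batch, softmax-normalized to sum to 1."""
    raw = [sum(sims[i] for i in b) / len(b) for b in batches]
    peak = max(raw)  # subtract the max for numerical stability
    exps = [math.exp((r - peak) / temperature) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    test = [1.0, 0.0]
    demos = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    batches, sims = parallel_batches(test, demos, batch_size=2)
    print(batches)  # [[0, 2], [1, 3]]
    print(normalized_batch_scores(batches, sims))
```

Under these assumptions every demonstration lands in exactly one batch, so no single prompt grows beyond `batch_size` demonstrations, while the most similar batch receives the largest weight.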

Abstract

Large language models (LLMs) have become the norm in natural language processing (NLP), excelling in few-shot in-context learning (ICL) with their remarkable abilities. Nonetheless, the success of ICL largely hinges on the choice of few-shot demonstration examples, making the selection process increasingly crucial. Existing methods have focused on optimizing the quantity and semantic similarity of these examples to improve ICL performance. However, our preliminary experiments indicate that the effectiveness of ICL is limited by the length of the input context. Moreover, different combinations of few-shot demonstration examples can significantly improve accuracy on different test samples. To address this, we propose a novel method named parallel in-context learning (ParaICL) that effectively utilizes all demonstration examples without exceeding the manageable input context length. ParaICL employs parallel batching to distribute demonstration examples into different batches according to the semantic similarities of the questions in the demonstrations to the test question. It then computes normalized batch semantic scores for each batch. A weighted average semantic objective, constrained by adaptive plausibility, is applied to select the most appropriate tokens. Through extensive experiments, we validate the effectiveness of ParaICL and conduct ablation studies to underscore its design rationale. We further demonstrate that ParaICL can seamlessly integrate with existing methods.
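The token-selection step can be sketched in the same way. This sketch assumes greedy decoding and takes the weighted average over per-batch log-probabilities, using the normalized batch scores as weights. By analogy with contrastive decoding, it builds the plausibility head $\mathcal{V}_{\mathrm{head}}$ from the highest-weighted batch: a token stays eligible only if its probability there is at least `alpha` times that batch's top probability. The function name, the `alpha` default, and the choice of reference batch are assumptions, not the paper's specification.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a single logit vector."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def was_next_token(
    batch_logits: Sequence[Sequence[float]],
    weights: Sequence[float],
    alpha: float = 0.1,
) -> int:
    """Hypothetical greedy step: argmax of the weighted average log-probability,
    restricted to an adaptive plausibility head.

    batch_logits[b][v] is the logit for token v given batch b's demonstrations,
    the test question, and the tokens generated so far; weights[b] is batch b's
    normalized semantic score.
    """
    log_probs = [log_softmax(row) for row in batch_logits]
    vocab_size = len(log_probs[0])

    # Assumption: V_head comes from the highest-weighted batch, keeping tokens with
    # p(v) >= alpha * max_w p(w), i.e. log p(v) >= log(alpha) + max_w log p(w).
    ref = log_probs[max(range(len(weights)), key=lambda b: weights[b])]
    threshold = math.log(alpha) + max(ref)
    head = [v for v in range(vocab_size) if ref[v] >= threshold]

    def was_score(v: int) -> float:
        # Weighted average of per-batch log-probabilities for token v.
        return sum(w * lp[v] for w, lp in zip(weights, log_probs))

    return max(head, key=was_score)


if __name__ == "__main__":
    logits = [[2.0, 0.0, -3.0], [0.0, 0.0, 10.0]]
    print(was_next_token(logits, [0.55, 0.45], alpha=0.1))   # 0
    print(was_next_token(logits, [0.55, 0.45], alpha=1e-3))  # 2
```

In this toy example, token 2 has the highest unconstrained weighted score because the second batch strongly prefers it. With `alpha=0.1`, however, it is excluded because the highest-weighted batch assigns it low probability. Loosening `alpha` to `1e-3` lets it back into the head.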

Paper Structure (34 sections, 9 equations, 4 figures, 7 tables)

Figures (4)

  • Figure 1: Results of Mistral-7B-Instruct-v0.2 on 100 test samples from GSM8K and WinoGrande using different numbers of few-shot demonstration examples. Increasing the number of demonstration examples does not consistently improve performance.
  • Figure 2: Results of Llama-2-7B-Chat on 100 WinoGrande test samples using different combinations of 10-shot demonstration examples. Different combinations improve the model's accuracy on various test samples.
  • Figure 3: Our proposed parallel in-context learning (ParaICL) method. Colored squares with black borders denote demonstration samples. Squares filled in grey with matching borders denote the test sample $\hat{x}_i$.
  • Figure 4: Results of Mistral-7B-Instruct-v0.2 on GSM8K using different batches of five-shot demonstration examples.