Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning

Kaiyi Zhang, Ang Lv, Yuhan Chen, Hansen Ha, Tao Xu, Rui Yan

TL;DR

This work treats in-context learning (ICL) as a meta-optimization process and uses that view to explain why LLMs are sensitive to the order of demonstrations. It introduces Batch-ICL, an inference algorithm that runs $N$ separate 1-shot forward passes, aggregates their meta-gradients at a chosen layer $k$, and applies the aggregate to the forward pass of a zero-shot query. Because the demonstrations are processed as a batch rather than as a sequence, the prediction does not depend on their order. Across multiple tasks and models, Batch-ICL outperforms most permutations of standard $N$-shot ICL, in some cases exceeding even the best ordering, while using fewer computational resources. A multi-epoch variant implicitly explores permutations of the demonstrations and improves performance further.

Abstract

In this paper, by treating in-context learning (ICL) as a meta-optimization process, we explain why LLMs are sensitive to the order of ICL examples. This understanding leads us to develop Batch-ICL, an effective, efficient, and order-agnostic inference algorithm for ICL. Unlike the standard $N$-shot learning approach, Batch-ICL employs $N$ separate 1-shot forward computations and aggregates the resulting meta-gradients. These aggregated meta-gradients are then applied to the forward computation of a zero-shot query to generate the final prediction. This batch processing approach renders the LLM agnostic to the order of ICL examples. Through extensive experiments and analysis, we demonstrate that Batch-ICL consistently outperforms most permutations of ICL examples. In some cases, it even exceeds the performance of the best order for standard ICL, all while reducing the computational resources required. Furthermore, we develop a novel variant of Batch-ICL featuring multiple "epochs" of meta-optimization. This variant implicitly explores permutations of ICL examples, further enhancing ICL performance.
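The aggregation step described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It assumes a single layer of softmax-free (linear) attention, a simplification often used when ICL is read as meta-optimization, and it assumes the aggregate is an arithmetic mean. The projection matrices `W_V` and `W_K`, the function names, and the random inputs are all hypothetical.

```python
import numpy as np


def one_shot_update(W_V, W_K, demo, query):
    """Contribution of one demonstration under linear attention (toy assumption).

    demo:  (d, t) token representations of a single demonstration.
    query: (d,)   representation of the query token.
    Returns W_V demo (W_K demo)^T query, the implicit update applied to the query.
    """
    values = W_V @ demo                # (d, t)
    keys = W_K @ demo                  # (d, t)
    return values @ (keys.T @ query)   # (d,)


def batch_icl_output(W_V, W_K, demos, zero_shot_ctx, query):
    """Zero-shot attention output plus the mean of N independent 1-shot updates."""
    zero_shot = one_shot_update(W_V, W_K, zero_shot_ctx, query)
    updates = np.stack([one_shot_update(W_V, W_K, d, query) for d in demos])
    # Assumed aggregation: arithmetic mean over the N one-shot updates.
    return zero_shot + updates.mean(axis=0)


rng = np.random.default_rng(0)
d, t, n = 8, 5, 4                      # hidden size, tokens per demo, number of demos
W_V = rng.normal(size=(d, d))
W_K = rng.normal(size=(d, d))
demos = [rng.normal(size=(d, t)) for _ in range(n)]
zero_shot_ctx = rng.normal(size=(d, t))
query = rng.normal(size=d)

out = batch_icl_output(W_V, W_K, demos, zero_shot_ctx, query)
out_reversed = batch_icl_output(W_V, W_K, demos[::-1], zero_shot_ctx, query)

# The demos are combined by a mean, so their order should not matter
# beyond floating-point rounding.
print(np.allclose(out, out_reversed))
```

Each demonstration is processed in its own forward computation and the results are combined by a symmetric operation, so reordering the demonstrations leaves the output unchanged. This toy model does not reproduce the order sensitivity of standard $N$-shot ICL in a real LLM, which the sketch makes no attempt to simulate.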

Paper Structure (19 sections, 11 equations, 4 figures, 7 tables)

Figures (4)

  • Figure 1: (a) Standard in-context learning. (b) Batch-ICL aggregates the meta-gradients generated during individual 1-shot learning forward computations and applies them to a zero-shot forward process. (c) Multi-epoch Batch-ICL further enhances ICL performance, shown here with a 2-epoch overview.
  • Figure 2: Performance dynamics across different values of $N$ on SST-2 and RTE.
  • Figure 3: Performance dynamics across different aggregation layers $k$.
  • Figure 4: Comparing Batch-ICL and standard ICL with various example orders, including the "Best", "Worst" and "Average" of all permutations.