Iterative Forward Tuning Boosts In-Context Learning in Language Models

Jiaxi Yang, Binyuan Hui, Min Yang, Bailin Wang, Bowen Li, Binhua Li, Fei Huang, Yongbin Li

TL;DR

This work addresses the sensitivity of in-context learning to demonstrations by introducing a two-stage framework that decouples demonstration processing from test-time inference. In the Deep-Thinking stage, it employs Iterative Enhanced Attention to accumulate information across multiple forward passes by updating Key-Value caches with a gating mechanism $\widetilde{K}_{t}^{l} = \eta K_{t}^{l} + (1 - \eta) \widetilde{K}_{t-1}^{l}$ and $\widetilde{V}_{t}^{l} = \eta V_{t}^{l} + (1 - \eta) \widetilde{V}_{t-1}^{l}$, all without updating model parameters; the Test stage then uses these refined memories for inference. Empirically, Deep-Thinking yields consistent improvements over vanilla ICL across conventional benchmarks (SST-2/5, MR, AGNews, TREC) and challenging datasets (MMLU, BBH), across multiple model families (OPT, GPT-2/Neo, LLaMA2, Pythia), with robustness to demonstration quantity, seed, and order. The approach is particularly beneficial when demonstration pools are limited or impractical, suggesting a practical path to enhancing ICL without additional training or prompt engineering. Overall, the method demonstrates that iterative, memory-augmented reasoning can significantly boost in-context learning performance and reliability.
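The gating rule above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names `gated_update` and `accumulate_memory`, the default `eta=0.5`, and the choice to seed the memory with the first pass's matrix are all assumptions made here. It also treats the per-pass matrices $K_{t}^{l}$ (or $V_{t}^{l}$) as given inputs, whereas in the actual method each forward pass attends over the previous memory before producing them.

```python
import numpy as np

def gated_update(current, previous, eta):
    """Blend this pass's K (or V) matrix with the memory from the previous pass.

    Mirrors K~_t = eta * K_t + (1 - eta) * K~_{t-1} for a single layer.
    """
    return eta * current + (1.0 - eta) * previous

def accumulate_memory(per_pass_matrices, eta=0.5):
    """Fold a sequence of per-pass K (or V) matrices into one refined memory.

    Assumption: the memory is initialised with the first pass's matrix;
    the text above does not specify the initial state.
    """
    memory = per_pass_matrices[0]
    for current in per_pass_matrices[1:]:
        memory = gated_update(current, memory, eta)
    return memory

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three hypothetical forward passes, each yielding a (tokens, head_dim) key matrix.
    keys = [rng.normal(size=(4, 8)) for _ in range(3)]
    refined = accumulate_memory(keys, eta=0.5)
    print(refined.shape)  # same shape as each per-pass matrix
```

With `eta = 1` the memory is simply the latest pass, and with `eta = 0` it never moves away from its initial value, so `eta` controls how much weight the newest pass receives relative to the accumulated history.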

Abstract

Despite the advancements in in-context learning (ICL) for large language models (LLMs), current research centers on specific prompt engineering, such as demonstration selection, with the expectation that a single pass over the demonstrations can generalize effectively to a given test sample. However, this perspective overlooks the potential benefits of processing the demonstrations over multiple iterations, a practice that aligns more closely with the iterative decision-making process exhibited by humans, who often learn through analogy. In this study, we introduce a novel two-stage framework to boost ICL in LLMs. Specifically, our framework divides the ICL process into two distinct stages: a Deep-Thinking stage and a test stage. The Deep-Thinking stage incorporates a unique attention mechanism, iterative enhanced attention, which enables multiple rounds of information accumulation. This mechanism operates by manipulating the Key-Value matrices without training, improving the LLM's understanding by "thinking" over the demonstrations multiple times. We evaluate Deep-Thinking across a range of benchmarks and LLMs, showing its superior performance over vanilla ICL methods and its effectiveness in challenging tasks where demonstration selection is infeasible.

Paper Structure

This paper contains 28 sections, 6 equations, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: Illustration of vanilla ICL and our proposed two-stage framework with Deep-Thinking. The vanilla ICL method processes demonstrations only once, while our "Deep-Thinking" method enables multiple rounds of information accumulation during the reasoning process.
  • Figure 2: Overview of the proposed two-stage ICL framework. It divides the ICL process into a Deep-Thinking stage and a test stage, which take the demonstrations and the test query as input, respectively. It replaces the vanilla self-attention mechanism with the proposed Iterative Enhanced Attention (IEA). IEA uses the Key-Value matrices as a bridge for memories, capable of receiving historical memories (from the previous iteration). It mixes these memories with present information to perform attention, and updates the memories for the next iteration. During testing, predictions are made using memories that have been refined through multiple iterations. Notably, throughout this process, the LLM parameters remain frozen and no additional parameters are introduced.
  • Figure 3: Comparison of model performance across four major classes of the MMLU benchmark. Due to space constraints and for clarity of presentation, we report results for only four of the seven models.
  • Figure 4: An illustration of the impact of increasing the number of demonstrations on the effectiveness of vanilla ICL and Deep-Thinking.
  • Figure 5: Performance distributions for vanilla ICL and Deep-Thinking, comparing the effects of random seeds (left) and random orders (right).
  • ...and 1 more figures