Iterative Forward Tuning Boosts In-Context Learning in Language Models
Jiaxi Yang, Binyuan Hui, Min Yang, Bailin Wang, Bowen Li, Binhua Li, Fei Huang, Yongbin Li
TL;DR
This work addresses the sensitivity of in-context learning to demonstrations by introducing a two-stage framework that decouples demonstration processing from test-time inference. In the Deep-Thinking stage, it employs Iterative Enhanced Attention to accumulate information across multiple forward passes by updating Key-Value caches with a gating mechanism $\widetilde{K}_{t}^{l} = \eta K_{t}^{l} + (1 - \eta) \widetilde{K}_{t-1}^{l}$ and $\widetilde{V}_{t}^{l} = \eta V_{t}^{l} + (1 - \eta) \widetilde{V}_{t-1}^{l}$, all without updating model parameters; the Test stage then uses these refined memories for inference. Empirically, Deep-Thinking yields consistent improvements over vanilla ICL across conventional benchmarks (SST-2/5, MR, AGNews, TREC) and challenging datasets (MMLU, BBH), across multiple model families (OPT, GPT-2/Neo, LLaMA2, Pythia), with robustness to demonstration quantity, seed, and order. The approach is particularly beneficial when demonstration pools are limited or impractical, suggesting a practical path to enhancing ICL without additional training or prompt engineering. Overall, the method demonstrates that iterative, memory-augmented reasoning can significantly boost in-context learning performance and reliability.
Abstract
Despite the advancements in in-context learning (ICL) for large language models (LLMs), current research centers on specific prompt engineering, such as demonstration selection, with the expectation that a single pass of demonstration processing can generalize effectively to a given test sample. However, this perspective overlooks the potential benefits of processing the demonstrations over multiple iterations, a practice that aligns more closely with the iterative decision-making of humans, who often learn through analogy. In this study, we introduce a novel two-stage framework to boost ICL in LLMs. Specifically, our framework divides the ICL process into two distinct stages: a Deep-Thinking stage and a test stage. The Deep-Thinking stage incorporates a unique attention mechanism, iterative enhanced attention, which enables multiple rounds of information accumulation. This mechanism operates by manipulating the Key-Value matrices without training, enhancing the LLM's understanding by having it think over the demonstrations multiple times. We evaluated Deep-Thinking across a range of benchmarks and LLMs, showing its superior performance over vanilla ICL methods and its effectiveness in challenging tasks where demonstration selection is infeasible.
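To make the Key-Value accumulation concrete, the following is a minimal sketch of the gated update $\widetilde{K}_{t}^{l} = \eta K_{t}^{l} + (1 - \eta) \widetilde{K}_{t-1}^{l}$ (and likewise for $V$) for a single layer. It is an illustration under stated assumptions, not the authors' implementation: the names `gated_update` and `accumulate` are hypothetical, the per-pass matrices are supplied as fixed toy values rather than produced by a model forward pass that attends to the previous memory, and initializing the memory with the first pass's matrix is an assumption.

```python
from typing import List

Matrix = List[List[float]]


def gated_update(current: Matrix, previous: Matrix, eta: float) -> Matrix:
    """Return eta * current + (1 - eta) * previous, element-wise."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return [
        [eta * c + (1.0 - eta) * p for c, p in zip(c_row, p_row)]
        for c_row, p_row in zip(current, previous)
    ]


def accumulate(per_pass: List[Matrix], eta: float) -> Matrix:
    """Fold the matrices from successive forward passes into one memory.

    Assumption: the memory starts as the first pass's matrix. In the
    actual method each pass would be computed by the model; here the
    matrices are given up front to keep the sketch self-contained.
    """
    memory = per_pass[0]
    for current in per_pass[1:]:
        memory = gated_update(current, memory, eta)
    return memory


# Toy keys from three hypothetical forward passes (one token, two dims).
keys = [[[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 1.0]]]
accumulated_keys = accumulate(keys, eta=0.5)
print(accumulated_keys)
```

The same function would be applied to the Value matrices. No model parameters are touched: only the cached matrices change. With $\eta = 1$ the memory is simply the latest pass, while smaller $\eta$ gives more weight to earlier passes.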
