
Demonstration Augmentation for Zero-shot In-context Learning

Yi Su, Yunpeng Tai, Yixin Ji, Juntao Li, Bowen Yan, Min Zhang

TL;DR

DAIL introduces a memory-based demonstration augmentation framework for zero-shot in-context learning. By maintaining a memory bank of at most $M$ past (query, answer) pairs and selecting $K$ demonstrations using a combined score of semantic relevance and entropy (with optional diversity via DPP), DAIL avoids relying on model-generated demonstrations and incurs no extra inference cost. Empirical results on MMLU and BBH show that DAIL achieves state-of-the-art performance and can outperform few-shot ICL without external information, while remaining efficient for real-world deployment. The method highlights the importance of stable, cost-effective demonstrations and provides practical guidelines for memory management and scoring in deployment scenarios.
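The TL;DR does not say exactly how semantic relevance and entropy are combined. The sketch below makes several assumptions, all of them illustrative and not taken from the paper:

- each memory entry stores a demonstration text, an embedding, and the model's answer distribution;
- the score is linear: cosine similarity minus a hypothetical weight `lam` times the prediction entropy, so confidently predicted entries rank higher;
- the names `select_top_k` and `lam` are invented for this example.

```python
import math
from typing import List, Sequence, Tuple

# Each memory entry: (demonstration text, embedding, model's answer distribution).
MemoryEntry = Tuple[str, Sequence[float], Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted answer distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_top_k(query_emb: Sequence[float], memory: List[MemoryEntry],
                 k: int, lam: float = 0.5) -> List[str]:
    """Return the k demonstration texts with the highest combined score.

    ASSUMPTION: score = cosine(query, entry) - lam * entropy(entry), a hypothetical
    linear combination that favors relevant, confidently predicted entries.
    """
    scored = [
        (cosine(query_emb, emb) - lam * entropy(probs), text)
        for text, emb, probs in memory
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```

With `lam = 0`, this reduces to pure similarity-based retrieval. Raising `lam` increasingly favors entries that the model predicted with low entropy, even when they are less similar to the current query.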

Abstract

Large Language Models (LLMs) have demonstrated an impressive capability known as In-context Learning (ICL), which enables them to acquire knowledge from textual demonstrations without the need for parameter updates. However, many studies have highlighted that the model's performance is sensitive to the choice of demonstrations, presenting a significant challenge for practical applications where we lack prior knowledge of user queries. Consequently, we need to construct an extensive demonstration pool and incorporate external databases to assist the model, leading to considerable time and financial costs. In light of this, some recent research has shifted focus towards zero-shot ICL, aiming to reduce the model's reliance on external information by leveraging its inherent generative capabilities. Despite the effectiveness of these approaches, the content generated by the model may be unreliable, and the generation process is time-consuming. To address these issues, we propose Demonstration Augmentation for In-context Learning (DAIL), which employs the model's previously predicted historical samples as demonstrations for subsequent ones. DAIL incurs no additional inference cost and does not rely on the model's generative capabilities. Our experiments reveal that DAIL can significantly improve the model's performance over direct zero-shot inference and can even outperform few-shot ICL without any external information.
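The abstract describes a core loop: predict an answer, then reuse that prediction as a demonstration for later queries. The sketch below illustrates that loop under three assumptions:

- `predict` is a hypothetical stand-in for an LLM call;
- eviction is first-in-first-out once the bank holds M entries;
- the placeholder `select` returns the most recent entries rather than the paper's relevance/entropy ranking.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

# Hypothetical model interface: (demonstration strings, query) -> predicted answer.
Predictor = Callable[[List[str], str], str]


class DemoMemory:
    """Bounded memory bank of (query, predicted answer) pairs.

    ASSUMPTION: first-in-first-out eviction once `max_size` (M) is reached.
    """

    def __init__(self, max_size: int) -> None:
        # A deque with maxlen silently drops the oldest item when full.
        self.items: Deque[Tuple[str, str]] = deque(maxlen=max_size)

    def add(self, query: str, answer: str) -> None:
        self.items.append((query, answer))

    def select(self, query: str, k: int) -> List[str]:
        # Placeholder selection: the k most recent entries. `query` is unused here
        # but would drive relevance-based ranking in a fuller version.
        if k <= 0:
            return []
        recent = list(self.items)[-k:]
        return [f"Q: {q}\nA: {a}" for q, a in recent]


def run_stream(queries: List[str], predict: Predictor,
               max_size: int, k: int) -> Tuple[List[str], DemoMemory]:
    """Answer queries in order, reusing earlier predictions as demonstrations."""
    memory = DemoMemory(max_size)
    outputs: List[str] = []
    for query in queries:
        demos = memory.select(query, k)   # only samples answered before this step
        answer = predict(demos, query)    # one model call per query
        outputs.append(answer)
        memory.add(query, answer)         # usable as a demonstration from the next step on
    return outputs, memory
```

Demonstrations come from answers the model has already produced. As a result, the loop makes one model call per query and never asks the model to generate synthetic demonstrations.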

Paper Structure

This paper contains 45 sections, 6 equations, 7 figures, and 7 tables.

Figures (7)

  • Figure 1: A failure case for Self-ICL. The generated samples are of poor quality (repeated options, false labels, and overly similar semantics), which degrades the model's performance. For simplicity, the generated labels of the demonstrations are omitted from the figure.
  • Figure 2: Time consumption (in seconds) for different methods and sequence lengths (batch size = 16), using LLaMA-2-7B (Touvron et al., 2023) as the base model. Encode: cost of encoding n tokens. Generate: cost of generating n tokens. 3-shot: ICL with three demonstrations. For simplicity, all model-generated demonstrations are assumed to have the same sequence length as the query.
  • Figure 3: Overview of our method. After each inference, the current query is paired with the model's output and added to the memory bank, so the sample processed at step t can serve as a demonstration at step t+1.
  • Figure 4: Accuracy (%) on MMLU with different selection strategies.
  • Figure 5: Accuracy (%) on MMLU with different deletion strategies.
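The TL;DR mentions optional diversity via a determinantal point process (DPP), and Figure 4 compares selection strategies. The exact DPP formulation is not given here. The sketch below assumes two things:

- the kernel uses the common decomposition into per-candidate quality scores and pairwise similarities, L[i][j] = q_i * S[i][j] * q_j;
- the selected subset comes from a greedy approximation to the most probable subset.

The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def det(m: List[List[float]]) -> float:
    """Determinant via Gaussian elimination with partial pivoting (small matrices only)."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0                      # singular (to numerical tolerance)
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d                          # a row swap flips the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return d


def greedy_dpp(quality: Sequence[float], sim: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily pick up to k indices maximizing det of the kernel submatrix.

    ASSUMPTION: `quality` holds positive per-candidate scores (e.g. a relevance/entropy
    score shifted to be positive) and `sim` is a PSD similarity matrix.
    """
    n = len(quality)
    L = [[quality[i] * sim[i][j] * quality[j] for j in range(n)] for i in range(n)]
    chosen: List[int] = []
    for _ in range(min(k, n)):
        best: Optional[int] = None
        best_det = 0.0
        for i in range(n):
            if i in chosen:
                continue
            idx = chosen + [i]
            d = det([[L[r][c] for c in idx] for r in idx])
            if d > best_det:
                best, best_det = i, d
        if best is None:  # every remaining candidate would make the submatrix singular
            break
        chosen.append(best)
    return chosen
```

Consider two near-duplicate candidates and one dissimilar candidate. The greedy step keeps one of the duplicates, then prefers the dissimilar item, because adding the second duplicate would shrink the determinant toward zero. This version recomputes each determinant from scratch. That is acceptable for a handful of demonstrations but would typically be replaced by incremental updates at larger scale.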