Scaling Laws for Many-Shot In-Context Learning with Self-Generated Annotations

Zhengyao Gu, Henry Peng Zou, Yankai Chen, Aiwei Liu, Weizhi Zhang, Philip S. Yu

TL;DR

This work investigates how in-context learning (ICL) scales when it relies on self-generated annotations, and proposes a three-step Semi-Supervised ICL framework: annotation generation, demonstration selection, and semi-supervised inference. It introduces Naive-SemiICL, a simple single-iteration baseline that consistently outperforms standard ICL in zero-, few-, and many-shot regimes and exhibits a scaling law, with optimal performance reached at more than 1,000 demonstrations. Building on this, IterPSD iteratively refines pseudo-demonstrations via curriculum pseudo-labeling and confirmation-bias mitigation, delivering up to 6.8% additional gains on classification tasks. Across 16 datasets spanning classification, translation, and reasoning, the approach performs strongly under low-resource conditions and highlights the practical potential of pseudo-demonstrations for scalable, cost-efficient ICL.

Abstract

The high cost of obtaining high-quality annotated data for in-context learning (ICL) has motivated the development of methods that use self-generated annotations in place of ground-truth labels. While these approaches have shown promising results in few-shot settings, they generally do not scale to many-shot scenarios. In this work, we study ICL with self-generated examples using a framework analogous to traditional semi-supervised learning, consisting of annotation generation, demonstration selection, and in-context inference. Within this framework, we propose a simple baseline that outperforms ground-truth ICL in zero-shot, few-shot, and many-shot settings. Notably, we observe a scaling law with this baseline, where optimal performance is achieved with more than 1,000 demonstrations. To fully exploit the many-shot capabilities of semi-supervised ICL, we introduce IterPSD, an iterative annotation approach that integrates iterative refinement and curriculum pseudo-labeling techniques from semi-supervised learning, yielding up to 6.8% additional gains on classification tasks.
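The three steps named in the abstract (annotation generation, demonstration selection, in-context inference) can be illustrated with a minimal single-iteration sketch. This is not the authors' implementation: the `Demo` record, the prompt template, the `Annotator` signature that returns a label with a confidence score, the confidence-threshold filter, and the `toy_annotate` stand-in for a real language model are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Demo:
    text: str
    label: str
    confidence: float = 1.0  # assumption: ground-truth demos are fully trusted


# Hypothetical annotator interface: prompt -> (label, confidence in [0, 1]).
Annotator = Callable[[str], Tuple[str, float]]


def build_prompt(demos: List[Demo], query: str) -> str:
    """Concatenate demonstrations, then append the unanswered query."""
    blocks = [f"Input: {d.text}\nLabel: {d.label}" for d in demos]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


def generate_pseudo_demos(annotate: Annotator, gold: List[Demo],
                          unlabeled: List[str]) -> List[Demo]:
    """Step 1: annotate unlabeled inputs using the ground-truth demos."""
    return [Demo(x, *annotate(build_prompt(gold, x))) for x in unlabeled]


def select_demos(pseudo: List[Demo], threshold: float) -> List[Demo]:
    """Step 2: keep pseudo-demos whose confidence meets the threshold."""
    return [d for d in pseudo if d.confidence >= threshold]


def semi_icl_predict(annotate: Annotator, gold: List[Demo],
                     unlabeled: List[str], query: str,
                     threshold: float = 0.9) -> str:
    """Step 3: answer the query with gold plus selected pseudo-demos."""
    kept = select_demos(generate_pseudo_demos(annotate, gold, unlabeled),
                        threshold)
    label, _ = annotate(build_prompt(gold + kept, query))
    return label


def toy_annotate(prompt: str) -> Tuple[str, float]:
    """Keyword-based stand-in for an LLM call (illustration only)."""
    query = prompt.rsplit("Input: ", 1)[1].split("\n", 1)[0]
    confident = "good" in query or "bad" in query
    return ("pos" if "good" in query else "neg"), (0.95 if confident else 0.5)


if __name__ == "__main__":
    gold = [Demo("good film", "pos"), Demo("bad film", "neg")]
    pool = ["good plot", "bad acting", "it exists"]
    print(semi_icl_predict(toy_annotate, gold, pool, "good music"))
```

In practice `toy_annotate` would be replaced by a call to an actual model, and the confidence score could come from whatever signal is available for that model; the threshold value used here is arbitrary.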

Paper Structure

This paper contains 38 sections, 7 equations, 6 figures, 7 tables, 3 algorithms.

Figures (6)

  • Figure 1: Semi-supervised ICL Framework. Ground-truth data are used as demonstrations for generating pseudo-demonstrations from unannotated data. The generated pseudo-demonstrations, together with a small set of ground-truth demonstrations, are selectively used as demonstrations for the final prompting.
  • Figure 2: Scaling trend of Naive-SemiICL on classification and translation tasks with GPT-4o and GPT-4o-mini. The dashed gray line represents the few-shot baseline. Both models exhibit a scaling trend on most tasks. All experiments are performed with a ground-truth budget of $k_l = 16$.
  • Figure 3: Comparison of GPT-4o-mini (top) and GPT-4o (bottom) performance across multiple datasets using three different methods: Few-Shot, Naive-SemiICL (Naive), and Naive-SemiICL without filtering (Unfiltered).
  • Figure 4: Scaling trend of IterPSD on five benchmark tasks. The blue horizontal dashed line represents the best-performing Naive-SemiICL on the same dataset.
  • Figure 5: Many-shot scaling performance of GPT-4o-mini (top) and GPT-4o (bottom) across six selected datasets. The x-axis represents the number of shots (log scale), and the y-axis represents performance. The solid blue lines indicate many-shot in-context learning (ICL), while the dashed vertical lines mark the peak performance of Naive-SemiICL. Both models scale beyond the peak performance of the pseudo-demonstration approach.
  • ...and 1 more figure