Curriculum Demonstration Selection for In-Context Learning

Duc Anh Vu, Nguyen Tran Cong Duy, Xiaobao Wu, Hoang Minh Nhat, Du Mingzhe, Nguyen Thanh Thong, Anh Tuan Luu

TL;DR

This work tackles the challenge of selecting demonstrations for in-context learning by introducing Curriculum Demonstration Selection (CDS), which partitions training data by estimated difficulty and retrieves demonstrations from each difficulty level, ordered from easy to hard. Within each level, CDS can use similarity-based or random retrieval; in both cases it enforces coverage of diverse difficulty levels to give the LLM a richer learning signal. Empirical results on math reasoning, commonsense reasoning, and code generation benchmarks show that CDS consistently outperforms random and similarity-based baselines across nine LLMs, with larger gains on harder problems. The findings highlight CDS as a robust, scalable method for boosting LLM reasoning and problem-solving capabilities. The authors also note limitations, such as a fixed number of shots and reliance on predefined difficulty metadata, as directions for future work.

Abstract

Large Language Models (LLMs) have shown strong in-context learning (ICL) abilities with a few demonstrations. However, one critical challenge is how to select demonstrations to elicit the full potential of LLMs. In this paper, we propose Curriculum Demonstration Selection (CDS), a novel demonstration selection method for ICL. Instead of relying on similarity alone, CDS additionally partitions samples by their measured complexity. Following curriculum learning, CDS then selects demonstrations from easy to difficult. The selected demonstrations thus cover a wide range of difficulty levels, enabling LLMs to learn from varied complexities within the training set. Experiments demonstrate that CDS consistently outperforms baseline methods, achieving notable improvements across nine LLMs on three benchmarks. Moreover, CDS proves especially effective in enhancing LLM performance on challenging problems.
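
The selection procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `partition_by_difficulty` and `select_demonstrations`, the `(text, difficulty)` tuple layout, the equal-size split, and the user-supplied `similarity` callable are all hypothetical choices made for this example.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed layout: each training example is (text, difficulty score).
Example = Tuple[str, float]


def partition_by_difficulty(pool: Sequence[Example], k: int) -> List[List[Example]]:
    """Sort the pool by difficulty and cut it into k contiguous partitions."""
    ordered = sorted(pool, key=lambda ex: ex[1])
    n = len(ordered)
    # Integer boundaries; every partition is non-empty whenever n >= k.
    bounds = [(i * n) // k for i in range(k + 1)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(k)]


def select_demonstrations(
    pool: Sequence[Example],
    k: int,
    similarity: Optional[Callable[[str], float]] = None,
    seed: int = 0,
) -> List[Example]:
    """Pick one demonstration per difficulty partition, returned easy to hard.

    If `similarity` is given, it scores a candidate text against the test
    query and the best-scoring candidate in each partition is chosen;
    otherwise a candidate is drawn uniformly at random.
    """
    if not 1 <= k <= len(pool):
        raise ValueError("k must be between 1 and the pool size")
    rng = random.Random(seed)
    demos = []
    for part in partition_by_difficulty(pool, k):
        if similarity is None:
            demos.append(rng.choice(part))
        else:
            demos.append(max(part, key=lambda ex: similarity(ex[0])))
    # Partitions are visited in ascending difficulty, so the result
    # already follows the easy-to-hard curriculum order.
    return demos
```

Because one demonstration is taken from every partition, the prompt covers the full difficulty range regardless of which retrieval rule is used inside a partition. How difficulty is measured and how many partitions to use are left open here; the sketch simply takes both as inputs.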

Paper Structure

This paper contains 27 sections, 2 figures, 5 tables, 1 algorithm.

Figures (2)

  • Figure 1: Comparison of CDS with random demonstration retrieval and with similarity retrieval on the MATH benchmark across five LLMs.
  • Figure 2: Average improvement across five models in three difficulty levels.