Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning

Dong Shu, Mengnan Du

TL;DR

This paper revisits six previously proposed demonstration selection algorithms and evaluates them on five datasets for both efficiency and effectiveness. It finds that adding more demonstrations does not always improve performance, and that accuracy often trades off against computational efficiency.

Abstract

In-context learning can help Large Language Models (LLMs) adapt to new tasks without additional training. However, its performance depends heavily on the quality of the demonstrations, which has driven research into demonstration selection algorithms. These algorithms help users select the best $k$ input-label pairs (demonstration examples) for a given test input, enabling LLMs to learn in context the relationship between the provided examples and the test input. Despite the many proposed demonstration selection algorithms, their efficiency and effectiveness remain unclear. This lack of clarity makes it difficult to apply these algorithms in real-world scenarios and poses challenges for future research aimed at developing improved methods. This paper revisits six proposed algorithms, evaluating them on five datasets from both efficiency and effectiveness perspectives. Our experiments reveal significant variations in algorithm performance across different tasks, with some methods struggling to outperform random selection in certain scenarios. We also find that increasing the number of demonstrations does not always lead to better performance, and that there are often trade-offs between accuracy and computational efficiency. Our code is available at https://github.com/Tizzzzy/Demonstration_Selection_Overview.
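To make the selection step concrete, the following is a minimal sketch of choosing $k$ demonstrations for a test input and assembling them into a prompt. It is an illustration only and is not claimed to be one of the six algorithms evaluated in the paper: the bag-of-words cosine similarity is a toy stand-in for whatever scoring a real method uses, and the names `select_top_k`, `select_random`, `build_prompt`, and the example pool are all hypothetical.

```python
import math
import random
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts (a toy stand-in for a learned embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two Counter vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_top_k(pool, test_input, k):
    """Return the k (input, label) pairs whose inputs are most similar to test_input."""
    query = bow(test_input)
    ranked = sorted(pool, key=lambda pair: cosine(bow(pair[0]), query), reverse=True)
    return ranked[:k]


def select_random(pool, k, seed=0):
    """Random-selection baseline: k pairs drawn without replacement."""
    return random.Random(seed).sample(pool, k)


def build_prompt(demos, test_input):
    """Concatenate the demonstrations, then the unlabeled test input."""
    blocks = [f"Input: {x}\nLabel: {y}" for x, y in demos]
    blocks.append(f"Input: {test_input}\nLabel:")
    return "\n\n".join(blocks)


# Hypothetical data pool of input-label pairs.
pool = [
    ("the movie was great", "positive"),
    ("the food was terrible", "negative"),
    ("a great movie with a great cast", "positive"),
    ("the weather is cold", "neutral"),
]

test_input = "was the movie great"
demos = select_top_k(pool, test_input, k=2)
baseline = select_random(pool, k=2)
prompt = build_prompt(demos, test_input)
```

The resulting `prompt` would be passed to an LLM, which completes the final `Label:` field. Swapping `select_top_k` for `select_random` gives the kind of random baseline that, per the abstract, some methods struggle to outperform.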

Paper Structure

This paper contains 12 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: An overview of demonstration selection algorithms: These algorithms select demonstrations from the data pool, which the LLMs then use to generate answers.
  • Figure 2: A visual comparison of the direct and channel approaches
  • Figure 3: The effectiveness of the algorithms on the MRPC dataset
  • Figure 4: A visualization of the trend in the algorithms' effectiveness