Ambiguity-Aware In-Context Learning with Large Language Models

Lingyu Gao, Aditi Chaudhary, Krishna Srinivasan, Kazuma Hashimoto, Karthik Raman, Michael Bendersky

TL;DR

The paper tackles the sensitivity of in-context learning to prompt choice by introducing Ambig-ICL, a label-ambiguity-aware demonstration selection strategy. It combines semantic similarity with the LLM’s knowledge of the output label space, identifying an ambiguous label set and focusing on misclassified demonstrations that lie near the test example’s decision boundary. Across three fine-grained text classification tasks (SST, GoEmotions, EDOS) and two model scales of Flan-PaLM 2, Ambig-ICL consistently outperforms retriever-based and traditional baselines, with larger gains on smaller models. This approach reduces model confusion and enhances prompt utility, suggesting practical benefits for real-world ICL deployments and offering avenues to extend label-ambiguity reasoning to other NLP tasks.

Abstract

In-context learning (ICL), i.e., showing LLMs only a few task-specific demonstrations, has led to downstream gains with no task-specific fine-tuning required. However, LLMs are sensitive to the choice of prompts, so a crucial research question is how to select good demonstrations for ICL. One effective strategy is to leverage semantic similarity between the ICL demonstrations and test inputs by using a text retriever; this is sub-optimal, however, because it does not consider the LLM's existing knowledge about the task. From prior work (Lyu et al., 2023), we already know that labels paired with the demonstrations bias the model predictions. This leads us to hypothesize that considering the LLM's existing knowledge about the task, especially with respect to the output label space, can yield a better demonstration selection strategy. Through extensive experimentation on three text classification tasks, we find that it is beneficial not only to choose semantically similar ICL demonstrations but also to choose demonstrations that help resolve the inherent label ambiguity surrounding the test example. Interestingly, we find that including demonstrations that the LLM previously misclassified and that also fall on the test example's decision boundary brings the largest performance gain.

Paper Structure

This paper contains 37 sections, 3 equations, 2 figures, and 15 tables.

Figures (2)

  • Figure 1: Overview of our proposed method for selecting ICL demonstrations: For each test example, we first use a retriever to rank training data by semantic similarity. At the same time, we identify the ambiguous label set for each test example and also obtain the output predictions on the retrieved training data. Next, we apply three constraints on the top-ranked demonstrations which are: 1) select those demonstrations whose gold label is in the ambiguous label set, 2) select those which are also mis-classified by the model, and 3) select those mis-classified examples whose predicted label is in the ambiguous label set. Finally, we construct prompts with selected ICL demonstrations to get the final model predictions.
  • Figure 2: Confusion Matrix of zero-shot experiments on SST with Flan-PaLM 2 (L). Labels: VPos (Very Positive), Pos (Positive), Neu (Neutral), Neg (Negative), VNeg (Very Negative).
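
The selection pipeline described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Example` record, the function names, the prompt template, and the choice to define the ambiguous label set as the two labels the model scores highest are all hypothetical, since this page does not spell out how the set is computed. Retrieval scores and model predictions on the training data are assumed to be precomputed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Example:
    """A retrieved training example (hypothetical record, not from the paper)."""
    text: str
    gold: str          # gold label
    predicted: str     # model's prediction on this training example
    similarity: float  # retriever score with respect to the test input


def ambiguous_label_set(label_scores: Dict[str, float], size: int = 2) -> List[str]:
    # Assumption: the ambiguous set is the `size` labels the model scores highest
    # for the test example; the source only says such a set is identified.
    return sorted(label_scores, key=label_scores.get, reverse=True)[:size]


def select_demonstrations(
    candidates: List[Example],
    ambiguous: List[str],
    k: int = 4,
    require_misclassified: bool = True,
    require_pred_in_set: bool = True,
) -> List[Example]:
    # Rank by semantic similarity first, then apply the caption's constraints.
    ranked = sorted(candidates, key=lambda e: e.similarity, reverse=True)
    selected: List[Example] = []
    for ex in ranked:
        # Constraint 1: gold label must be in the ambiguous label set.
        if ex.gold not in ambiguous:
            continue
        # Constraint 2: the model must have misclassified the example.
        if require_misclassified and ex.predicted == ex.gold:
            continue
        # Constraint 3: the wrong prediction must itself be in the ambiguous set.
        if require_pred_in_set and ex.predicted not in ambiguous:
            continue
        selected.append(ex)
        if len(selected) == k:
            break
    return selected


def build_prompt(demos: List[Example], test_text: str) -> str:
    # Hypothetical template: each demonstration is shown with its gold label.
    blocks = [f"Text: {d.text}\nLabel: {d.gold}" for d in demos]
    blocks.append(f"Text: {test_text}\nLabel:")
    return "\n\n".join(blocks)
```

Turning off `require_pred_in_set`, or both flags, gives the looser variants that apply only the first two constraints or only the first one, which makes it easy to compare how much each constraint narrows the candidate pool.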