PICLe: Pseudo-Annotations for In-Context Learning in Low-Resource Named Entity Detection

Sepideh Mamooler, Syrielle Montariol, Alexander Mathis, Antoine Bosselut

TL;DR

This paper studies in-context learning (ICL) for low-resource named entity detection (NED) by analyzing how demonstrations influence task transfer. It finds that partially correct annotations can be as effective as fully correct ones, provided the demonstrations contain a sufficient number of entities. It then introduces PICLe, a framework that requires no human annotation: it generates pseudo-annotated demonstrations via zero-shot LLM predictions, refines them with self-verification, clusters the pseudo-labeled data, and predicts with cluster-specific demonstrations, again followed by self-verification, to arrive at the final entity mentions. Across five biomedical NED datasets, PICLe outperforms zero-shot ICL and rivals or surpasses ICL with gold demonstrations in low-resource settings while reducing annotation costs; ablations highlight the importance of self-verification and cluster-based demonstration sampling. The work has practical implications for applying ICL in domains where expert labeling is expensive or unavailable, though it notes limitations in scope (a single task), domain specificity (biomedical abstracts), and potential annotation bias from LLM-based pseudo-labels.

Abstract

In-context learning (ICL) enables Large Language Models (LLMs) to perform tasks using a few demonstrations, facilitating task adaptation when labeled examples are hard to obtain. However, ICL is sensitive to the choice of demonstrations, and it remains unclear which demonstration attributes enable in-context generalization. In this work, we conduct a perturbation study of in-context demonstrations for low-resource Named Entity Detection (NED). Our surprising finding is that in-context demonstrations with partially correct entity-mention annotations can be as effective for task transfer as fully correct demonstrations. Based on our findings, we propose Pseudo-annotated In-Context Learning (PICLe), a framework for in-context learning with noisy, pseudo-annotated demonstrations. PICLe leverages LLMs to annotate many demonstrations in a zero-shot first pass. We then cluster these synthetic demonstrations, sample specific sets of in-context demonstrations from each cluster, and predict entity mentions using each set independently. Finally, we use self-verification to select the final set of entity mentions. We evaluate PICLe on five biomedical NED datasets and show that, with zero human annotation, PICLe outperforms ICL in low-resource settings where limited gold examples can be used as in-context demonstrations.
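
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `picle_sketch`, its parameters, and all `toy_*` helpers are hypothetical, the LLM calls are replaced by simple stand-in functions, and the clustering step is reduced to a caller-supplied `cluster_of` function.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Set, Tuple

Demo = Tuple[str, List[str]]  # (text, pseudo-annotated entity mentions)


def picle_sketch(
    unlabeled: Sequence[str],
    query: str,
    zero_shot: Callable[[str], List[str]],          # stand-in for a zero-shot LLM call
    few_shot: Callable[[str, List[Demo]], List[str]],  # stand-in for an ICL call
    verify: Callable[[str, str], bool],             # stand-in for self-verification
    cluster_of: Callable[[str], int],               # stand-in for a clustering step
    demos_per_cluster: int = 10,
    seed: int = 0,
) -> Set[str]:
    rng = random.Random(seed)

    # 1) Pseudo-annotate unlabeled texts: zero-shot prediction, then self-verification.
    pool: List[Demo] = []
    for text in unlabeled:
        mentions = [m for m in zero_shot(text) if verify(text, m)]
        pool.append((text, mentions))

    # 2) Group the pseudo-annotated demonstrations by cluster.
    clusters: Dict[int, List[Demo]] = defaultdict(list)
    for demo in pool:
        clusters[cluster_of(demo[0])].append(demo)

    # 3) Sample one demonstration set per cluster and predict independently with each.
    candidates: Set[str] = set()
    for cluster_id in sorted(clusters):
        members = clusters[cluster_id]
        demos = rng.sample(members, min(demos_per_cluster, len(members)))
        candidates.update(few_shot(query, demos))

    # 4) Aggregate the predictions and keep only those that pass self-verification.
    return {m for m in candidates if verify(query, m)}


# Toy stand-ins (hypothetical) so the sketch runs without any model.
VOCAB = {"aspirin", "ibuprofen", "warfarin"}


def toy_zero_shot(text: str) -> List[str]:
    # Deliberately noisy: also tags the non-entity word "patients".
    return [w for w in text.lower().split() if w in VOCAB or w == "patients"]


def toy_verify(text: str, mention: str) -> bool:
    return mention in VOCAB


def toy_few_shot(query: str, demos: List[Demo]) -> List[str]:
    known = {m for _, mentions in demos for m in mentions}
    return [w for w in query.lower().split() if w in known]


def toy_cluster(text: str) -> int:
    return len(text.split()) % 2


unlabeled = [
    "aspirin helps patients",
    "warfarin and ibuprofen interact",
    "patients took aspirin daily",
]
result = picle_sketch(
    unlabeled,
    "aspirin and warfarin given to patients",
    toy_zero_shot,
    toy_few_shot,
    toy_verify,
    toy_cluster,
)
print(sorted(result))  # ['aspirin', 'warfarin']
```

In this toy run, the noisy zero-shot tag "patients" is filtered out by the verification stand-in before it can enter the demonstration pool, which mirrors the role the abstract assigns to self-verification.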

Paper Structure

This paper contains 39 sections, 10 figures, 9 tables.

Figures (10)

  • Figure 1: 10-shot ICL performance using various demonstration corruption schemes, with Mistral and $k$NN demonstration retrieval. We compare to zero-shot and 10-shot with gold demonstrations, averaging over all datasets.
  • Figure 2: 10-shot ICL performance with demonstrations perturbed under different schemes, using Mistral and $k$NN demonstration retrieval. We report the prediction F1 as a function of the precision, recall, and F1 of the perturbed demonstration label sets (relative to the gold demonstrations), averaged over all datasets. Point size shows the average number of entities in the label sets of the perturbed demonstrations.
  • Figure 3: PICLe pipeline. Unlabeled samples are pseudo-annotated through a zero-shot prediction and self-verification pass. Subsequently, they are clustered, and cluster-specific sets of ICL demonstrations are chosen at random from each group. Each set is independently used to find entity mentions in the query, and the final set of entity mentions is obtained by aggregating these independent sets and asking the model to verify the type of each predicted entity.
  • Figure 4: Performance of PICLe, zero-shot, and 10-shot ICL with gold demonstrations selected from $10$, $50$, or $100$ gold examples using Mistral. The error bars show the variance across $5$ seeds for sampling subsets of gold examples. All methods are followed by self-verification.
  • Figure 5: 10-shot ICL performance using various demonstration corruption schemes, compared with zero-shot and ICL with gold annotations, for each dataset. Experiments performed using Mistral (top) and GPT-3.5-Turbo (bottom) and $k$NN demonstration retrieval. (Best viewed in color.)
  • ...and 5 more figures
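
The Figure 2 caption refers to the precision, recall, and F1 of a perturbed demonstration label set relative to the gold label set. A minimal sketch of how such set-level scores could be computed is shown here. It assumes exact string matching between mentions, which is an assumption for illustration since the matching criterion is not stated in this excerpt; the function name `label_set_prf` and the example mentions are hypothetical.

```python
from typing import Iterable, Tuple


def label_set_prf(perturbed: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 of a perturbed label set against the gold set.

    Assumes exact string match between mentions (an illustrative choice).
    """
    perturbed_set, gold_set = set(perturbed), set(gold)
    true_positives = len(perturbed_set & gold_set)
    precision = true_positives / len(perturbed_set) if perturbed_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


gold = {"aspirin", "warfarin", "heparin"}
perturbed = {"aspirin", "warfarin", "patients", "daily"}  # two correct, two spurious
precision, recall, f1 = label_set_prf(perturbed, gold)
print(precision, recall, f1)
```

Here two of the four perturbed mentions are correct and two of the three gold mentions are recovered, so precision is 1/2, recall is 2/3, and F1 is 4/7.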