LLMs are Better Than You Think: Label-Guided In-Context Learning for Named Entity Recognition

Fan Bai, Hamid Hassanzadeh, Ardavan Saeedi, Mark Dredze

TL;DR

DEER tackles NER in the in-context learning setting by identifying a weakness of task-agnostic demonstration retrieval, which tends to return examples that are less relevant for entity recognition, and proposing a training-free, label-grounded framework. It introduces two core components: label-guided retrieval, which weights tokens using type-specific statistics from training labels so that the selected demonstrations emphasize entity and context cues, and error reflection, which revisits unseen, false-negative, and boundary tokens using span-level demonstrations and prompts. The method formalizes token types and spans, defines a combined similarity for demonstration retrieval, and grounds model decisions in training-label statistics, enabling more accurate recognition of both seen and unseen entities. Across five diverse NER datasets and four LLMs, DEER consistently outperforms strong ICL baselines and approaches or matches supervised fine-tuning in several settings, with added robustness in low-resource scenarios and with scalable demonstration strategies. The work suggests that training-free ICL can approach the effectiveness of supervised methods for token-level tasks when demonstrations are guided by task-specific statistics, potentially reducing labeling costs and deployment complexity.
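
The label-guided retrieval idea can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the paper's formulation: the helper names (`token_type_stats`, `token_weight`, `weighted_overlap`, `retrieve`, `TOY_TRAIN`) are hypothetical; a "context" token is taken to be a non-entity token within `context_length` positions of an entity token (so `context_length=0` leaves only entity and other tokens, as in Figure 3); and a weighted Jaccard overlap stands in for the paper's combined similarity, whose exact form is not given in this summary.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# A labeled sentence: (tokens, BIO tags). Hypothetical data format.
Sentence = Tuple[List[str], List[str]]
Stats = Dict[str, Dict[str, float]]


def token_type_stats(train: List[Sentence], context_length: int = 1) -> Stats:
    """Estimate P(entity), P(context), P(other) for every lowercased token.

    Assumed definitions: 'entity' = tag is not 'O'; 'context' = tag is 'O' but an
    entity token lies within `context_length` positions; 'other' = everything else.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tokens, tags in train:
        is_entity = [tag != "O" for tag in tags]
        for i, tok in enumerate(tokens):
            if is_entity[i]:
                kind = "entity"
            else:
                window = is_entity[max(0, i - context_length): i + context_length + 1]
                kind = "context" if any(window) else "other"
            counts[tok.lower()][kind] += 1
    return {
        tok: {k: c[k] / sum(c.values()) for k in ("entity", "context", "other")}
        for tok, c in counts.items()
    }


def token_weight(tok: str, stats: Stats, unseen_weight: float = 1.0, eps: float = 0.1) -> float:
    """Hypothetical weight: entity/context-prone tokens count more; unseen tokens get a default."""
    s = stats.get(tok.lower())
    if s is None:
        return unseen_weight
    return eps + s["entity"] + s["context"]


def weighted_overlap(query: List[str], cand: List[str], stats: Stats) -> float:
    """Weighted Jaccard over token sets (a stand-in, not the paper's combined similarity)."""
    q = {t.lower() for t in query}
    c = {t.lower() for t in cand}
    union = q | c
    if not union:
        return 0.0
    num = sum(token_weight(t, stats) for t in q & c)
    den = sum(token_weight(t, stats) for t in union)
    return num / den


def retrieve(query: List[str], train: List[Sentence], stats: Stats, k: int = 2) -> List[Sentence]:
    """Return the k training sentences with the highest weighted overlap to the query."""
    return sorted(train, key=lambda ex: weighted_overlap(query, ex[0], stats), reverse=True)[:k]


TOY_TRAIN: List[Sentence] = [
    (["Aspirin", "reduces", "fever", "."], ["B-Drug", "O", "O", "O"]),
    (["Patients", "took", "ibuprofen", "daily", "."], ["O", "O", "B-Drug", "O", "O"]),
    (["The", "weather", "was", "mild", "."], ["O", "O", "O", "O", "O"]),
]

if __name__ == "__main__":
    stats = token_type_stats(TOY_TRAIN, context_length=1)
    best = retrieve(["Patients", "took", "aspirin", "."], TOY_TRAIN, stats, k=1)
    print(best[0][0])  # -> ['Aspirin', 'reduces', 'fever', '.']
```

On this toy data the weighted score ranks the "Aspirin" sentence first because the shared token `aspirin` has a high entity probability; an unweighted Jaccard overlap would instead prefer the second sentence, which shares more tokens that carry less entity information.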

Abstract

In-context learning (ICL) enables large language models (LLMs) to perform new tasks using only a few demonstrations. However, in Named Entity Recognition (NER), existing ICL methods typically rely on task-agnostic semantic similarity for demonstration retrieval, which often yields less relevant examples and leads to inferior results. We introduce DEER, a training-free ICL approach that enables LLMs to make more informed entity predictions through the use of label-grounded statistics. DEER leverages token-level statistics from training labels to identify tokens most informative for entity recognition, enabling entity-focused demonstrations. It further uses these statistics to detect and refine error-prone tokens through a targeted reflection step. Evaluated on five NER datasets across four LLMs, DEER consistently outperforms existing ICL methods and achieves performance comparable to supervised fine-tuning. Further analyses demonstrate that DEER improves example retrieval, remains effective on both seen and unseen entities, and exhibits strong robustness in low-resource settings.
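
The detection half of the reflection step can be sketched in Python as below. The criteria are assumptions made for illustration: "unseen" means the lowercased token never occurs in the training statistics, "false negative" means a token predicted `O` whose training entity probability is at least a hypothetical `fn_threshold`, and "boundary" means an `O` token immediately left or right of a predicted span. The names `bio_spans`, `error_prone_tokens`, and `TOY_STATS` are hypothetical, and the follow-up LLM prompting with span-level demonstrations is not shown.

```python
from typing import Dict, List, Tuple

Stats = Dict[str, Dict[str, float]]  # lowercased token -> {"entity": p, "context": p, "other": p}


def bio_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO tags to (start, end_exclusive, type) spans; a stray I- tag opens a span."""
    spans: List[Tuple[int, int, str]] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and start is not None and tag[2:] == label
        if continues:
            continue
        if start is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag != "O":
            start, label = i, tag[2:]
    return spans


def error_prone_tokens(tokens: List[str], pred_tags: List[str], stats: Stats,
                       fn_threshold: float = 0.5) -> Dict[str, List[int]]:
    """Flag token indices for reflection (hypothetical criteria, see lead-in)."""
    flagged: Dict[str, List[int]] = {"unseen": [], "false_negative": [], "boundary": []}
    for i, (tok, tag) in enumerate(zip(tokens, pred_tags)):
        s = stats.get(tok.lower())
        if s is None:
            flagged["unseen"].append(i)          # never observed in training labels
        elif tag == "O" and s["entity"] >= fn_threshold:
            flagged["false_negative"].append(i)  # predicted O, but usually an entity in training
    boundary = set()
    for start, end, _ in bio_spans(pred_tags):
        for j in (start - 1, end):               # tokens just outside each predicted span
            if 0 <= j < len(tokens) and pred_tags[j] == "O":
                boundary.add(j)
    flagged["boundary"] = sorted(boundary)
    return flagged


TOY_STATS: Stats = {
    "patients": {"entity": 0.0, "context": 0.0, "other": 1.0},
    "took": {"entity": 0.0, "context": 1.0, "other": 0.0},
    "aspirin": {"entity": 1.0, "context": 0.0, "other": 0.0},
    "and": {"entity": 0.0, "context": 0.0, "other": 1.0},
    ".": {"entity": 0.0, "context": 0.0, "other": 1.0},
}

if __name__ == "__main__":
    tokens = ["Patients", "took", "aspirin", "and", "zolpidem", "."]
    preds = ["O", "O", "O", "O", "B-Drug", "O"]
    print(error_prone_tokens(tokens, preds, TOY_STATS))
    # -> {'unseen': [4], 'false_negative': [2], 'boundary': [3, 5]}
```

Each flagged category would then drive its own targeted prompt; keeping the three lists separate mirrors the caption's per-token-type refinement.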

Paper Structure

This paper contains 37 sections, 5 equations, 4 figures, and 21 tables.

Figures (4)

  • Figure 1: Existing ICL methods for NER typically select demonstrations based on task-agnostic input embeddings, which we argue yield suboptimal demonstrations for LLMs and, consequently, inferior performance. In contrast, our proposed method, DEER, leverages label-grounded statistics to identify entity-relevant tokens, enabling more task-focused demonstration selection and targeted error reflection.
  • Figure 2: Overview of DEER. In the preparation stage (Step 0), the method uses the training inputs and labels to compute token frequencies and probabilities for three token types: 1) entity tokens, 2) context tokens, and 3) other tokens, along with their associated spans. In the inference stage (Steps 1 and 2), Step 1 retrieves sentence-level demonstrations by emphasizing likely entity- and context-related tokens, based on the probabilities from Step 0. Step 2 refines the predictions from Step 1 by addressing error-prone tokens identified from label statistics, focusing on three token types: unseen tokens, "false negative" tokens, and boundary tokens. For each token type, the refinement process retrieves span-level demonstrations and prompts the LLM to adjust its predictions. See the Method section for further details.
  • Figure 3: Comparison of different context lengths in DEER across three datasets, where a context length of 0 means every token is classified as either an entity token or an other token (no context tokens). The substantial performance gain of context length 1 over 0 highlights the effectiveness of incorporating context tokens in our approach.
  • Figure 4: NER error ontology. This ontology is based on the standard strict mention-level matching metric for NER (segura-bedmar-etal-2013-semeval), under which a predicted entity is considered correct only if both its span boundaries and its entity type match those of a gold entity. "Span Errors" are predictions with at least a span error (possibly also a type error), while "Type Errors" are cases where only the entity type is incorrect. "Multi-Span Errors" are cases involving multiple spans, such as a single gold entity split into two predicted entities or two gold entities merged into one predicted entity. Similar analyses appear in recent work (ding-etal-2024-rethinking; lu2024largelanguagemodelsstruggle).
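
Strict mention-level matching, together with a rough version of the error categories in Figure 4, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: entities are `(start, end_exclusive, type)` tuples, the function names are hypothetical, and the `"spurious"` bucket for predictions that overlap no gold entity is an added category not named in the caption.

```python
from typing import List, Tuple

Entity = Tuple[int, int, str]  # (start, end_exclusive, entity_type) -- assumed representation


def strict_prf(gold: List[Entity], pred: List[Entity]) -> Tuple[float, float, float]:
    """Strict mention-level P/R/F1: a prediction counts only if span AND type match exactly."""
    g, p = set(gold), set(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def overlaps(a: Entity, b: Entity) -> bool:
    """True if two half-open token spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def classify_prediction(p: Entity, gold: List[Entity], pred: List[Entity]) -> str:
    """Assign one predicted entity to a rough error category."""
    if p in gold:
        return "correct"
    hits = [g for g in gold if overlaps(p, g)]
    if not hits:
        return "spurious"            # hypothetical bucket: overlaps no gold entity
    if len(hits) > 1:
        return "multi_span_error"    # one prediction merges several gold entities
    g = hits[0]
    if sum(overlaps(q, g) for q in pred) > 1:
        return "multi_span_error"    # one gold entity split across several predictions
    if (p[0], p[1]) == (g[0], g[1]):
        return "type_error"          # boundaries right, type wrong
    return "span_error"              # boundaries wrong (type may also be wrong)


GOLD: List[Entity] = [(0, 2, "Drug"), (3, 4, "Disease"), (5, 7, "Drug"), (8, 10, "Disease")]
PRED: List[Entity] = [(0, 2, "Drug"), (3, 4, "Drug"), (5, 6, "Drug"), (6, 7, "Drug"),
                      (8, 9, "Disease"), (11, 12, "Drug")]

if __name__ == "__main__":
    print(strict_prf(GOLD, PRED))
    print([classify_prediction(p, GOLD, PRED) for p in PRED])
    # -> ['correct', 'type_error', 'multi_span_error', 'multi_span_error', 'span_error', 'spurious']
```

Under strict matching, a type error or a span error costs both precision and recall: the prediction is a false positive and the gold entity remains unmatched.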