LLMs are Better Than You Think: Label-Guided In-Context Learning for Named Entity Recognition
Fan Bai, Hamid Hassanzadeh, Ardavan Saeedi, Mark Dredze
TL;DR
DEER addresses NER in the in-context learning (ICL) setting. It argues that task-agnostic demonstration retrieval is poorly suited to the task and proposes a training-free, label-grounded framework with two core components: label-guided retrieval, which uses token-type weighted statistics to select demonstrations that emphasize entity and context cues, and error reflection, which re-examines unseen, false-negative, and boundary tokens with span-level prompts. The method formalizes token types and spans, defines a combined similarity for demonstration retrieval, and grounds model decisions in training-label statistics, which improves recognition of both seen and unseen entities. Across five diverse NER datasets and four LLMs, DEER consistently outperforms strong ICL baselines and approaches or matches supervised fine-tuning in several settings, while remaining robust in low-resource scenarios and supporting scalable demonstration strategies. The results suggest that training-free ICL can approach supervised methods on token-level tasks when demonstrations are guided by task-specific statistics, potentially reducing labeling costs and deployment complexity.
Abstract
In-context learning (ICL) enables large language models (LLMs) to perform new tasks using only a few demonstrations. However, in Named Entity Recognition (NER), existing ICL methods typically rely on task-agnostic semantic similarity for demonstration retrieval, which often yields less relevant examples and leads to inferior results. We introduce DEER, a training-free ICL approach that enables LLMs to make more informed entity predictions through the use of label-grounded statistics. DEER leverages token-level statistics from training labels to identify tokens most informative for entity recognition, enabling entity-focused demonstrations. It further uses these statistics to detect and refine error-prone tokens through a targeted reflection step. Evaluated on five NER datasets across four LLMs, DEER consistently outperforms existing ICL methods and achieves performance comparable to supervised fine-tuning. Further analyses demonstrate that DEER improves example retrieval, remains effective on both seen and unseen entities, and exhibits strong robustness in low-resource settings.
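To make the idea of label-grounded retrieval concrete, the following is a minimal sketch, not DEER's actual formulation. It assumes BIO-style tags, estimates for each token how often it appears inside an entity in the training labels, and ranks candidate demonstrations by a weighted token overlap. The function names, the `floor` parameter, the weighted-Jaccard score, and the toy data are all illustrative assumptions. DEER's own token types and combined similarity are not specified in this summary, and this sketch weights only entity cues, not context cues.

```python
from collections import Counter


def token_entity_rates(train):
    """Estimate, per lowercased token, the fraction of its occurrences
    that carry a non-'O' tag. `train` is a list of (tokens, tags) pairs.
    (Hypothetical statistic, chosen only for illustration.)"""
    total, inside = Counter(), Counter()
    for tokens, tags in train:
        for tok, tag in zip(tokens, tags):
            key = tok.lower()
            total[key] += 1
            if tag != "O":
                inside[key] += 1
    return {tok: inside[tok] / total[tok] for tok in total}


def weighted_overlap(query_tokens, demo_tokens, rates, floor=0.1):
    """Weighted Jaccard overlap between two token sets. Tokens that are
    often entities get a higher weight; `floor` keeps unseen and
    non-entity tokens from being ignored entirely (an assumption)."""
    q = {t.lower() for t in query_tokens}
    d = {t.lower() for t in demo_tokens}

    def weight(tok):
        return max(rates.get(tok, 0.0), floor)

    union = sum(weight(t) for t in q | d)
    if union == 0:
        return 0.0
    return sum(weight(t) for t in q & d) / union


def retrieve(query_tokens, train, rates, k=2):
    """Return the k training examples with the highest weighted overlap."""
    ranked = sorted(
        train,
        key=lambda ex: weighted_overlap(query_tokens, ex[0], rates),
        reverse=True,
    )
    return ranked[:k]


# Toy labeled data, invented for this example.
train = [
    (["Alice", "visited", "Paris"], ["B-PER", "O", "B-LOC"]),
    (["Bob", "visited", "the", "museum"], ["B-PER", "O", "O", "O"]),
    (["Paris", "is", "lovely"], ["B-LOC", "O", "O"]),
]
rates = token_entity_rates(train)
top = retrieve(["Carol", "visited", "Paris"], train, rates, k=2)
```

In this toy setup, the shared entity token "Paris" dominates the score, so sentences mentioning it rank above the one that shares only the non-entity token "visited". A purely semantic or unweighted lexical retriever would not necessarily make that distinction, which is the inefficiency the abstract attributes to task-agnostic retrieval.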
