Large Language Models for Few-Shot Named Entity Recognition

Yufei Zhao, Xiaoshi Zhong, Erik Cambria, Jagath C. Rajapakse

TL;DR

This paper introduces GPT4NER, a prompting-based framework that tackles few-shot named entity recognition by recasting it as sequence generation with LLMs. Its prompts combine three core components (entity definitions, carefully selected few-shot examples with an explicit output format, and chain-of-thought reasoning), with an optional POS cue to guide predictions. On CoNLL2003 and OntoNotes5.0, GPT4NER outperforms representative few-shot baselines and reaches 87.9% and 76.4%, respectively, of the best fully supervised performance, with notable gains under both strict and relaxed evaluation. The work also highlights the value of relaxed-match evaluation and of reporting the named entity extraction (NEE) sub-task for understanding model behavior and limitations in real-world NER tasks.

Abstract

Named entity recognition (NER) is a fundamental task in numerous downstream applications. Recently, researchers have employed pre-trained language models (PLMs) and large language models (LLMs) to address this task. However, fully leveraging the capabilities of PLMs and LLMs with minimal human effort remains challenging. In this paper, we propose GPT4NER, a method that prompts LLMs to solve the few-shot NER task. GPT4NER constructs effective prompts using three key components: entity definitions, few-shot examples, and chain-of-thought reasoning. By prompting LLMs in this way, GPT4NER transforms few-shot NER, which is traditionally treated as a sequence-labeling problem, into a sequence-generation problem. We conduct experiments on two benchmark datasets, CoNLL2003 and OntoNotes5.0, and compare the performance of GPT4NER to representative state-of-the-art models in both few-shot and fully supervised settings. Experimental results demonstrate that GPT4NER achieves an $F_1$ score of 83.15\% on CoNLL2003 and 70.37\% on OntoNotes5.0, significantly outperforming few-shot baselines by an average margin of 7 points. Compared to fully supervised baselines, GPT4NER achieves 87.9\% of their best performance on CoNLL2003 and 76.4\% of their best performance on OntoNotes5.0. We also use a relaxed-match metric for evaluation and report performance on the sub-task of named entity extraction (NEE); experiments show that both help to better understand model behavior in the NER task.
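To make the difference between strict and relaxed evaluation concrete, here is a minimal Python sketch of an entity-level $F_1$ computation. The abstract does not define the relaxed-match metric, so the rule used here (a prediction counts when it has the same type as a gold entity and overlaps it by at least one token) is an assumption, as are the names `Entity` and `f1_score` and the toy spans.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, entity_type); a hypothetical representation.
Entity = Tuple[int, int, str]


def _overlaps(a: Entity, b: Entity) -> bool:
    """True when the two half-open token spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def _matches(pred: Entity, gold: Entity, relaxed: bool) -> bool:
    """Strict: identical boundaries and type. Relaxed (assumed): same type, any overlap."""
    if pred[2] != gold[2]:
        return False
    if relaxed:
        return _overlaps(pred, gold)
    return (pred[0], pred[1]) == (gold[0], gold[1])


def f1_score(gold: List[Entity], pred: List[Entity], relaxed: bool = False) -> float:
    """Entity-level F1 under strict or (assumed) relaxed matching."""
    matched_pred = sum(any(_matches(p, g, relaxed) for g in gold) for p in pred)
    matched_gold = sum(any(_matches(p, g, relaxed) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: the predicted PER span is one token too short.
gold_entities = [(0, 2, "PER"), (5, 6, "LOC")]
pred_entities = [(0, 1, "PER"), (5, 6, "LOC")]
strict_f1 = f1_score(gold_entities, pred_entities, relaxed=False)
relaxed_f1 = f1_score(gold_entities, pred_entities, relaxed=True)
```

In this toy case the boundary error costs the prediction under strict matching but not under the assumed relaxed rule, which is the kind of behavior a relaxed metric is meant to expose.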

Paper Structure

This paper contains 21 sections, 4 equations, 2 figures, 10 tables, 1 algorithm.

Figures (2)

  • Figure 1: Overview of GPT4NER for few-shot NER. The left-hand side illustrates the prompt construction using three kinds of information: (1) entity definition, (2) few-shot examples with chain-of-thought reasoning, and (3) input test text. The right-hand side depicts the procedure of LLMs processing prompts and generating entities.
  • Figure 2: An example of an effective prompt for the CoNLL2003 dataset. The entity definition is in red. Few-shot examples in question-answer format with chain-of-thought reasoning are in blue. The test text is in dark green.
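
The prompt layout described in the two captions (entity definition, then few-shot examples in question-answer format with chain-of-thought reasoning, then the test text) can be sketched in Python as follows. This is only an illustration of the ordering: the field labels ("Question", "Reasoning", "Answer"), the names `FewShotExample` and `build_prompt`, and the sample sentences are all hypothetical and are not taken from the paper's actual prompts.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FewShotExample:
    """One demonstration; all field names are hypothetical."""
    text: str        # example sentence
    reasoning: str   # chain-of-thought explanation
    answer: str      # entities written in an explicit output format


def build_prompt(
    entity_definitions: Dict[str, str],
    examples: List[FewShotExample],
    test_text: str,
) -> str:
    """Concatenate the three kinds of information in the order the captions list them."""
    lines = ["Entity definitions:"]
    for label, definition in entity_definitions.items():
        lines.append(f"- {label}: {definition}")
    lines.append("")
    # Few-shot examples in question-answer form, reasoning before the answer.
    for example in examples:
        lines.append(f"Question: {example.text}")
        lines.append(f"Reasoning: {example.reasoning}")
        lines.append(f"Answer: {example.answer}")
        lines.append("")
    # The test text comes last and leaves the answer open for the LLM to generate.
    lines.append(f"Question: {test_text}")
    lines.append("Answer:")
    return "\n".join(lines)


# Invented sample content, for illustration only.
demo_prompt = build_prompt(
    {"PER": "names of people", "LOC": "names of locations"},
    [
        FewShotExample(
            text="Alice visited Paris.",
            reasoning="Alice is a person's name; Paris is a city.",
            answer="PER: Alice; LOC: Paris",
        )
    ],
    "Bob lives in Rome.",
)
```

The resulting string would be sent to an LLM, whose generated answer is then parsed back into entities; that parsing step depends on the chosen output format and is not shown here.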