CodeNER: Code Prompting for Named Entity Recognition
Sungwoo Han, Jingun Kwon, Hidetaka Kamigaito, Manabu Okumura
TL;DR
This work tackles zero-shot and few-shot NER with large language models by bridging the gap between text-in-text-out prompting and the text-in-span-out BIO labeling requirement. It introduces CodeNER, a code-based prompting framework that embeds an explicit BIO schema and the input sentence context within a programming-language-style prompt to guide LLMs toward accurate entity boundaries. Across ten multilingual benchmarks and multiple model families, CodeNER consistently outperforms traditional text-based prompts, with additional gains when combined with chain-of-thought reasoning. The study demonstrates the potential of structured, code-based prompts for complex sequence labeling tasks and outlines practical considerations and limitations for future refinement.
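To make the "text-in-span-out" requirement concrete, the following is a minimal sketch of how a BIO tag sequence maps to entity spans. The helper name `bio_to_spans` and the example sentence are illustrative assumptions, not part of CodeNER; the decoding rule shown (a stray `I-` tag with no matching open span is dropped) is one common convention among several.

```python
def bio_to_spans(tokens, tags):
    """Convert parallel token/tag lists into (entity_text, label) pairs.

    Hypothetical helper for illustration; not taken from the paper.
    """
    spans = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new span, closing any open one first.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag extends the open span only if the label matches.
            current_tokens.append(token)
        else:
            # "O", or an I- tag without a matching open span: close and reset.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_label))
    return spans


tokens = ["Barack", "Obama", "visited", "Paris", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
spans = bio_to_spans(tokens, tags)
```

Here the boundary of each entity depends on the exact position of its `B-` and `I-` tags, which is why a model that only emits free text must still be guided toward a token-aligned labeling.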
Abstract
Recent studies have explored various approaches for treating candidate named entity spans as both source and target sequences in named entity recognition (NER) by leveraging large language models (LLMs). Although previous approaches have successfully generated candidate named entity spans with suitable labels, they rely solely on input context information when using LLMs, particularly ChatGPT. However, NER inherently requires capturing detailed labeling requirements together with input context information. To address this issue, we propose a novel method that leverages code-based prompting to improve the capabilities of LLMs in understanding and performing NER. By embedding code within prompts, we provide detailed BIO schema instructions for labeling, thereby exploiting the ability of LLMs to comprehend long-range scopes in programming languages. Experimental results demonstrate that the proposed code-based prompting method outperforms conventional text-based prompting on ten benchmarks spanning English, Arabic, Finnish, Danish, and German datasets, indicating the effectiveness of explicitly structuring NER instructions. We also verify that combining the proposed code-based prompting method with chain-of-thought prompting further improves performance.
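As a rough illustration of what embedding a BIO schema in a code-style prompt could look like, the sketch below assembles such a prompt as a string. The exact template used by CodeNER is not given in this section, so the function name `build_code_prompt`, the layout of the prompt, and the label descriptions are all assumptions made for the example.

```python
def build_code_prompt(sentence, labels):
    """Assemble a Python-style prompt with an explicit BIO schema.

    Hypothetical template for illustration; not the authors' prompt.
    """
    # Whitespace tokenization is a simplifying assumption.
    tokens = sentence.split()
    schema_lines = ['    "O": "token outside any entity",']
    for label in labels:
        schema_lines.append(f'    "B-{label}": "first token of a {label} entity",')
        schema_lines.append(f'    "I-{label}": "subsequent token inside a {label} entity",')
    schema = "\n".join(schema_lines)
    return (
        "# Task: assign exactly one BIO tag to every token.\n"
        "bio_schema = {\n"
        f"{schema}\n"
        "}\n"
        f"tokens = {tokens!r}\n"
        "# tags must have the same length as tokens\n"
        "tags = "
    )


prompt = build_code_prompt("Angela Merkel visited Helsinki", ["PER", "LOC"])
print(prompt)
```

The prompt ends on an open assignment so that the model's continuation is a list of tags aligned with `tokens`; the schema dictionary keeps the labeling rules and the input sentence in one scoped, code-like context.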
