Recent Advances in Named Entity Recognition: A Comprehensive Survey and Comparative Study
Imed Keraghel, Stanislas Morbieu, Mohamed Nadif
TL;DR
This survey reviews Named Entity Recognition (NER), from rule-based systems to modern Transformer and Large Language Model (LLM) approaches, with emphasis on low-resource, cross-domain, and graph-based methods. It assesses how LLMs, reinforcement learning (RL), and graph techniques complement traditional models, and provides a cross-dataset comparison on ten diverse corpora. Key findings show that Transformers excel on large, general-domain data, while CRF/LSTM-CRF and domain-tuned models perform better in specialized domains; LLMs offer strong few-shot and zero-shot potential but require hybrid integration for best performance. The study advocates hybrid architectures (e.g., LLM-guided disambiguation with fine-tuned NER heads) and calls for continued evaluation across datasets and annotation schemes to advance robust, scalable NER systems.
Abstract
Named Entity Recognition (NER) seeks to extract substrings within a text that name real-world objects and to determine their type (for example, whether they refer to persons or organizations). In this survey, we first present an overview of recent popular approaches, including advances in Transformer-based methods and Large Language Models (LLMs) that have received little coverage in other surveys. We also discuss reinforcement learning and graph-based approaches, highlighting their role in improving NER performance. Second, we focus on methods designed for datasets with scarce annotations. Third, we evaluate the main NER implementations on a variety of datasets that differ in domain, size, and number of classes. We thus provide an in-depth comparison of algorithms that have never been considered together. Our experiments shed light on how dataset characteristics affect the behavior of the methods we compare.
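
To make the task definition concrete, the sketch below shows one way to represent the output of Named Entity Recognition as typed character spans. It is only an illustration under stated assumptions: the `Entity` type, the `recognize` function, the toy gazetteer, and the `PER`/`ORG` labels are hypothetical and are not taken from the survey. The dictionary lookup stands in for a very simple rule-based recognizer, not for any of the methods compared in the study.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """A typed span: text[start:end] names an object of type `label`."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # entity type, e.g. "PER" (person) or "ORG" (organization)


# Hypothetical toy gazetteer; real systems learn to recognize unseen names.
GAZETTEER = {
    "Marie Curie": "PER",
    "Sorbonne": "ORG",
}


def recognize(text: str, gazetteer: dict[str, str] = GAZETTEER) -> list[Entity]:
    """Return every exact gazetteer match in `text`, ordered by position."""
    entities = []
    for name, label in gazetteer.items():
        start = text.find(name)
        while start != -1:
            entities.append(Entity(start, start + len(name), label))
            # Continue searching after the current match.
            start = text.find(name, start + len(name))
    return sorted(entities)


text = "Marie Curie taught at the Sorbonne."
for ent in recognize(text):
    # Each entity is a substring of the input plus its type.
    print(text[ent.start:ent.end], ent.label)
```

Exact string lookup of this kind cannot handle names missing from the dictionary or ambiguous mentions, which is the gap that learned sequence models aim to close.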
