Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark
Stephen Mayhew, Terra Blevins, Shuheng Liu, Marek Šuppa, Hila Gonen, Joseph Marvin Imperial, Börje F. Karlsson, Peiqin Lin, Nikola Ljubešić, LJ Miranda, Barbara Plank, Arij Riabi, Yuval Pinter
TL;DR
Universal NER (UNER) addresses the lack of gold-standard multilingual NER benchmarks with a community-driven, cross-lingually consistent annotation framework aligned with Universal Dependencies. UNER v1 contains 19 datasets across 13 languages. It uses a simple PER/ORG/LOC tagset and BIO2 tagging, which enables standardized evaluation and cross-lingual transfer analyses. Baseline experiments with XLM-R Large show strong in-language performance and uneven cross-lingual transfer: European languages transfer more readily than Chinese or the North African varieties, which points to script and typology as key challenges. The work illustrates the value of collaboratively built, openly available multilingual NER resources and outlines directions for expansion, quality control, and deeper cross-lingual analysis.
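To make the BIO2 scheme concrete, the sketch below decodes a per-token tag sequence into entity spans. It is an illustration only, not UNER's released tooling. `bio2_to_spans`, `ENTITY_TYPES`, and the example sentence are hypothetical. It assumes tags take the form `B-X`, `I-X`, or `O`, with `X` restricted to the three types named above. A stray `I-` tag that does not continue an open entity of the same type is leniently treated as starting a new entity.

```python
from typing import List, Optional, Tuple

# Hypothetical constant: the coarse PER/ORG/LOC tagset described above.
# Extend it if a data release uses additional types.
ENTITY_TYPES = {"PER", "ORG", "LOC"}

Span = Tuple[int, int, str]  # (start token, exclusive end token, entity type)


def bio2_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO2 tag sequence into (start, end, type) entity spans.

    In BIO2 every entity begins with a B- tag. As an assumption of this
    sketch, an I- tag that does not continue an open entity of the same
    type also opens a new entity.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag == "O":
            # Outside any entity: close the open entity, if there is one.
            if label is not None:
                spans.append((start, i, label))
            start, label = None, None
            continue
        prefix, _, etype = tag.partition("-")
        if prefix not in ("B", "I") or etype not in ENTITY_TYPES:
            raise ValueError(f"unexpected tag {tag!r} at position {i}")
        if prefix == "B" or etype != label:
            # A new entity begins: close the previous one first.
            if label is not None:
                spans.append((start, i, label))
            start, label = i, etype
        # Otherwise this I- tag continues the open entity.
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


# Hypothetical example sentence: "Ana Silva joined Acme Corp in Lisbon"
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "O", "B-LOC"]
print(bio2_to_spans(tags))  # [(0, 2, 'PER'), (3, 5, 'ORG'), (6, 7, 'LOC')]
```

Representing entities as spans rather than per-token tags is what makes span-level scoring and cross-dataset comparison straightforward. Stricter decoders reject an `I-` tag without a preceding `B-` instead of repairing it.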
Abstract
We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages. The overarching goal of UNER is to provide high-quality, cross-lingually consistent annotations to facilitate and standardize multilingual NER research. UNER v1 contains 18 datasets annotated with named entities in a cross-lingually consistent schema across 12 diverse languages. In this paper, we detail the creation and composition of the UNER datasets and provide initial modeling baselines in both in-language and cross-lingual settings. We release the data, code, and fitted models to the public.
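Both in-language and cross-lingual settings are typically scored by comparing predicted entity spans against gold spans. The sketch below computes exact-match, span-level micro F1, the conventional CoNLL-style NER metric. This is an assumption, since the text above does not spell out UNER's scoring procedure. `span_micro_f1` and the toy data are hypothetical. Each sentence's entities are given as `(start, end, type)` tuples.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Span = Tuple[int, int, str]  # (start token, exclusive end token, entity type)


def span_micro_f1(gold: Iterable[List[Span]], pred: Iterable[List[Span]]) -> float:
    """Exact-match span-level micro F1 over a corpus of sentences.

    A predicted span is correct only if its boundaries and its type both
    match a gold span in the same sentence. Counts are pooled over all
    sentences (micro-averaging) before computing precision and recall.
    """
    gold, pred = list(gold), list(pred)
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same sentences")
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        # Multiset intersection: each gold span can be matched at most once.
        tp += sum((Counter(g) & Counter(p)).values())
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: two sentences.
gold = [[(0, 2, "PER"), (3, 5, "ORG")], [(1, 2, "LOC")]]
pred = [[(0, 2, "PER"), (3, 5, "LOC")], [(1, 2, "LOC"), (4, 5, "PER")]]
print(span_micro_f1(gold, pred))  # 2 correct of 4 predicted and 3 gold -> 4/7
```

Under this setup, the in-language setting scores a model on the test split of the language it was trained on. The cross-lingual setting applies the same function to a model trained on one language and tested on another. Because a span with the right boundaries but the wrong type (`ORG` vs. `LOC` above) earns no credit, the metric is strict.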
