RetrieveAll: A Multilingual Named Entity Recognition Framework with Large Language Models
Jin Zhang, Fan Gao, Linyu Li, Yongbin Yu, Xiangxiang Wang, Nyima Tashi, Gadeng Luosang
TL;DR
RetrieveAll tackles multilingual NER by decoupling language-specific features through dynamic LoRA adapters and by injecting cross-granularity knowledge during fine-tuning via hierarchical prompts. It introduces input-aware LoRA retrieval and batched multilingual inference to enable scalable, language-robust NER without external data resources. Empirical results on PAN-X show an average F1 improvement of 12.1% over baselines, with strong per-language gains and robustness to base-model size, while performance on MultiCoNER remains competitive. The framework advances prompt-driven learning and offers a practical, scalable solution for NER across diverse languages, balancing efficiency and accuracy.
Abstract
The rise of large language models has led to significant performance breakthroughs in named entity recognition (NER) for high-resource languages, yet there remains substantial room for improvement in low- and medium-resource languages. Existing multilingual NER methods suffer from severe language interference during multilingual adaptation, which manifests as feature conflicts between languages and as the suppression of low-resource language features by high-resource languages. Although training a dedicated model for each language can mitigate such interference, this approach does not scale and incurs excessive computational costs in real-world applications. To address this issue, we propose RetrieveAll, a universal multilingual NER framework based on dynamic LoRA. The framework decouples task-specific features across languages and adapts dynamically and efficiently to the input. Furthermore, we introduce a cross-granularity knowledge augmentation method that fully exploits the intrinsic potential of the data without relying on external resources. By leveraging a hierarchical prompting mechanism to guide knowledge injection, this approach advances the paradigm from "prompt-guided inference" to "prompt-driven learning." Experimental results show that RetrieveAll outperforms existing baselines; on the PAN-X dataset, it achieves an average F1 improvement of 12.1%.
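The idea of input-aware adapter retrieval with batched multilingual inference can be illustrated with a minimal sketch. The abstract does not specify how adapters are retrieved, so everything below is an assumption made for illustration: each adapter is represented by a key vector, each input by an embedding, and the adapter with the highest cosine similarity is chosen. The names `retrieve_adapter`, `group_by_adapter`, `lora_de`, and `lora_sw`, as well as the toy two-dimensional embeddings, are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_adapter(embedding, adapter_keys):
    """Return the name of the adapter whose key vector is most similar to the input.

    Assumption: retrieval is nearest-neighbour search over per-adapter key vectors.
    """
    return max(adapter_keys, key=lambda name: cosine(embedding, adapter_keys[name]))


def group_by_adapter(batch, adapter_keys):
    """Group a mixed-language batch by retrieved adapter.

    Each group keeps the original batch index so that predictions can be
    restored to input order after every group has been processed with its adapter.
    """
    groups = defaultdict(list)
    for index, (text, embedding) in enumerate(batch):
        groups[retrieve_adapter(embedding, adapter_keys)].append((index, text))
    return dict(groups)


# Hypothetical adapter keys and toy input embeddings.
adapter_keys = {"lora_de": [1.0, 0.0], "lora_sw": [0.0, 1.0]}
batch = [
    ("Angela Merkel besuchte Paris.", [0.9, 0.1]),
    ("Julius Nyerere alizaliwa Butiama.", [0.2, 0.8]),
    ("Berlin ist die Hauptstadt.", [0.8, 0.3]),
]
groups = group_by_adapter(batch, adapter_keys)
```

In this sketch, a single mixed-language batch is split into one sub-batch per retrieved adapter, so no language needs a separately deployed model. A real system would replace the toy embeddings with encoder outputs and apply the selected LoRA weights to the base model for each group.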
