RetrieveAll: A Multilingual Named Entity Recognition Framework with Large Language Models

Jin Zhang, Fan Gao, Linyu Li, Yongbin Yu, Xiangxiang Wang, Nyima Tashi, Gadeng Luosang

TL;DR

RetrieveAll tackles multilingual NER by decoupling language-specific features through dynamic LoRA adapters and by injecting cross-granularity knowledge during fine-tuning via hierarchical prompts. It introduces input-aware LoRA retrieval and batched multilingual inference to enable scalable, language-robust NER without external data resources. On PAN-X, it improves average F1 by 12.1% over baselines, with strong per-language gains and robustness to base-model size, while its results on MultiCoNER remain competitive. The framework advances prompt-driven learning and offers a practical, scalable solution for NER across diverse languages, balancing efficiency and accuracy.

Abstract

The rise of large language models has led to significant performance breakthroughs in named entity recognition (NER) for high-resource languages, yet there remains substantial room for improvement in low- and medium-resource languages. Existing multilingual NER methods face severe language interference during multi-language adaptation, manifested in feature conflicts between different languages and the competitive suppression of low-resource language features by high-resource languages. Although training a dedicated model for each language can mitigate such interference, it lacks scalability and incurs excessive computational costs in real-world applications. To address this issue, we propose RetrieveAll, a universal multilingual NER framework based on dynamic LoRA. The framework decouples task-specific features across languages and demonstrates efficient dynamic adaptability. Furthermore, we introduce a cross-granularity knowledge augmentation method that fully exploits the intrinsic potential of the data without relying on external resources. By leveraging a hierarchical prompting mechanism to guide knowledge injection, this approach advances the paradigm from "prompt-guided inference" to "prompt-driven learning." Experimental results show that RetrieveAll outperforms existing baselines; on the PAN-X dataset, it achieves an average F1 improvement of 12.1 percent.
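
As a rough illustration of the dynamic-LoRA idea described above, the following toy sketch routes each input in a mixed-language batch to a language-specific low-rank adapter on top of a shared base weight. It is not the authors' implementation: the function names (`lora_forward`, `batched_multilingual_forward`), the plain-list matrices, and the assumption that each input already carries a language tag are all hypothetical, and a real system would need to infer the language from the input and operate on model layers rather than on a single matrix.

```python
from collections import defaultdict


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(W, adapter, x):
    """Compute (W + B A) x without materializing the full update B A.

    `adapter` is a hypothetical (B, A) pair of low-rank factors.
    """
    B, A = adapter
    base = matvec(W, x)
    # A projects x down to rank r; B projects the result back up.
    low_rank = matvec(B, matvec(A, x))
    return [b + d for b, d in zip(base, low_rank)]


def batched_multilingual_forward(W, adapters, batch):
    """Apply the matching adapter to each (language, input) pair.

    Inputs are grouped by language so each adapter is looked up once,
    and outputs are returned in the original batch order.
    Assumption: the language tag is given rather than detected.
    """
    groups = defaultdict(list)
    for idx, (lang, _) in enumerate(batch):
        groups[lang].append(idx)

    outputs = [None] * len(batch)
    for lang, indices in groups.items():
        adapter = adapters[lang]  # raises KeyError for an unknown language
        for idx in indices:
            outputs[idx] = lora_forward(W, adapter, batch[idx][1])
    return outputs
```

The shared weight `W` is never modified, so supporting another language in this sketch only means adding one more `(B, A)` pair to the `adapters` dictionary, which is the scalability argument the abstract makes against training a dedicated model per language.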

Paper Structure

This paper contains 22 sections, 12 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Relationship between model size and average F1 score across eight representative languages (English, Spanish, French, Russian, German, Chinese, Japanese, and Korean) on the PAN-X dataset for RetrieveAll. RetrieveAll consistently delivers substantial performance gains across different base models, combining efficiency with strong results.
  • Figure 2: RetrieveAll injects cross-granularity knowledge by retrieving entity-level and context-level examples via hierarchical prompts, while dynamically selecting and mapping the appropriate modules from a multilingual LoRA candidate pool based on the input language, enabling batch multilingual inference.
  • Figure 3: Comparative analysis of the impact of example augmentation at different stages on RetrieveAll's performance.
  • Figure 4: The standardized input and output format of RetrieveAll, illustrated with a specific data example.