MME-RAG: Multi-Manager-Expert Retrieval-Augmented Generation for Fine-Grained Entity Recognition in Task-Oriented Dialogues
Liang Xue, Haoyu Liu, Yajun Tian, Xinyu Zhong, Yang Liu
TL;DR
This work tackles fine-grained entity recognition in task-oriented dialogues, where domain adaptation and controllable retrieval pose major challenges. It introduces MME-RAG, a hierarchical Judge-Solve framework that decouples type-level judgment (Managers) from span-level extraction (Experts) and uses KeyInfo-driven retrieval to inject few-shot, entity-aligned exemplars during inference. The approach adapts to new domains and entity types without retraining. Experiments on public benchmarks (CrossNER, MIT-Movie, MIT-Restaurant) and a new multi-domain customer-service dataset show strong or state-of-the-art performance in most settings. The results position MME-RAG as a scalable, interpretable solution for adaptive dialogue understanding, with notable gains in low-resource domains and robust cross-domain generalization, which the authors attribute to selective retrieval and task decomposition.
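The Judge-Solve decomposition can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `judge_solve`, the keyword-based managers, and the regex and gazetteer experts are hypothetical stand-ins for the LLM-based Managers and Experts, chosen only to show the control flow in which a type-level judgment decides whether the span-level expert for that type runs at all.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-ins for LLM calls: a manager answers
# "does this utterance mention type T?", an expert returns the spans of type T.
Manager = Callable[[str], bool]
Expert = Callable[[str], List[str]]


@dataclass
class Entity:
    entity_type: str
    span: str


def judge_solve(utterance: str,
                managers: Dict[str, Manager],
                experts: Dict[str, Expert]) -> List[Entity]:
    """Judge with managers, then solve with the experts of accepted types."""
    entities: List[Entity] = []
    for entity_type, manager in managers.items():
        # Judge: type-level decision; experts of rejected types never run.
        if not manager(utterance):
            continue
        # Solve: span-level extraction by the expert dedicated to this type.
        for span in experts[entity_type](utterance):
            entities.append(Entity(entity_type, span))
    return entities


def keyword_manager(cues: List[str]) -> Manager:
    """Toy manager (hypothetical): accepts the type if any cue appears."""
    def manager(text: str) -> bool:
        lowered = text.lower()
        return any(cue in lowered for cue in cues)
    return manager


def regex_expert(pattern: str) -> Expert:
    """Toy expert (hypothetical): returns every non-overlapping regex match."""
    compiled = re.compile(pattern)
    return lambda text: compiled.findall(text)


def gazetteer_expert(lexicon: List[str]) -> Expert:
    """Toy expert (hypothetical): returns lexicon entries found in the text."""
    def expert(text: str) -> List[str]:
        lowered = text.lower()
        return [term for term in lexicon if term.lower() in lowered]
    return expert


if __name__ == "__main__":
    managers = {
        "ORDER_ID": keyword_manager(["order"]),
        "PRODUCT": keyword_manager(["return", "refund", "buy"]),
    }
    experts = {
        "ORDER_ID": regex_expert(r"\b[A-Z]{2}\d{6}\b"),
        "PRODUCT": gazetteer_expert(["wireless earbuds", "phone case"]),
    }
    print(judge_solve("I want to return the wireless earbuds from order AB123456.",
                      managers, experts))
    # [Entity(entity_type='ORDER_ID', span='AB123456'), Entity(entity_type='PRODUCT', span='wireless earbuds')]
```

Because every expert sits behind its manager, an utterance such as "Code AB123456 please" yields no ORDER_ID entity here even though the regex would match: the manager rejected the type first. In an LLM setting, this gating is where lightweight managers could save expert calls, assuming managers are cheaper to run than experts.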
Abstract
Fine-grained entity recognition is crucial for reasoning and decision-making in task-oriented dialogues, yet current large language models (LLMs) still struggle with domain adaptation and controllable retrieval. We introduce MME-RAG, a Multi-Manager-Expert Retrieval-Augmented Generation framework that decomposes entity recognition into two coordinated stages: type-level judgment by lightweight managers and span-level extraction by specialized experts. Each expert is supported by a KeyInfo retriever that injects semantically aligned, few-shot exemplars during inference, enabling precise and domain-adaptive extraction without additional training. Experiments on CrossNER, MIT-Movie, MIT-Restaurant, and our newly constructed multi-domain customer-service dataset show that MME-RAG outperforms recent baselines in most domains. Ablation studies further show that both the hierarchical decomposition and KeyInfo-guided retrieval are key drivers of robustness and cross-domain generalization, establishing MME-RAG as a scalable and interpretable solution for adaptive dialogue understanding.
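KeyInfo-guided exemplar injection can likewise be sketched under stated assumptions. The text above does not specify how KeyInfo is extracted or how similarity is scored, so this sketch substitutes a hypothetical `key_info` function (stopword-filtered tokens) and Jaccard overlap for whatever semantic matching the authors actually use. The hypothetical `retrieve_exemplars` keeps only exemplars labeled with the expert's entity type and ranks them by overlap with the query, and `build_expert_prompt` formats the top-k as few-shot demonstrations for that expert.

```python
import re
from dataclasses import dataclass
from typing import List, Set, Tuple

# Hypothetical stopword list used by the toy KeyInfo extractor below.
STOPWORDS = {"i", "me", "my", "the", "a", "an", "to", "from", "is", "was",
             "of", "and", "it", "want"}


@dataclass(frozen=True)
class Exemplar:
    text: str
    entity_type: str
    spans: Tuple[str, ...]


def key_info(text: str) -> Set[str]:
    """Toy KeyInfo (assumption): lowercase alphanumeric tokens minus stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tok for tok in tokens if tok not in STOPWORDS}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_exemplars(query: str, pool: List[Exemplar],
                       entity_type: str, k: int = 2) -> List[Exemplar]:
    """Top-k exemplars of one entity type, ranked by KeyInfo overlap with the query."""
    query_keys = key_info(query)
    candidates = [ex for ex in pool if ex.entity_type == entity_type]
    # sorted() is stable even with reverse=True, so ties keep pool order.
    candidates = sorted(candidates,
                        key=lambda ex: jaccard(query_keys, key_info(ex.text)),
                        reverse=True)
    return candidates[:k]


def build_expert_prompt(query: str, exemplars: List[Exemplar],
                        entity_type: str) -> str:
    """Format retrieved exemplars as few-shot demonstrations for one expert."""
    lines = [f"Extract all {entity_type} spans from the utterance."]
    for ex in exemplars:
        lines.append(f"Utterance: {ex.text}")
        lines.append(f"{entity_type}: {', '.join(ex.spans) or 'none'}")
    lines.append(f"Utterance: {query}")
    lines.append(f"{entity_type}:")
    return "\n".join(lines)


if __name__ == "__main__":
    pool = [
        Exemplar("Please refund the phone case I bought last week", "PRODUCT", ("phone case",)),
        Exemplar("My wireless charger stopped working", "PRODUCT", ("wireless charger",)),
        Exemplar("Where is order CD654321", "ORDER_ID", ("CD654321",)),
        Exemplar("The delivery was late again", "PRODUCT", ()),
    ]
    query = "I want to return the wireless earbuds"
    shots = retrieve_exemplars(query, pool, "PRODUCT", k=2)
    print(build_expert_prompt(query, shots, "PRODUCT"))
```

Filtering the pool by entity type before ranking keeps each expert's demonstrations aligned with the single type it extracts, so the ORDER_ID exemplar never reaches the PRODUCT expert. Replacing `jaccard` with an embedding-based similarity would be a natural variant, but it is outside this sketch.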
