MME-RAG: Multi-Manager-Expert Retrieval-Augmented Generation for Fine-Grained Entity Recognition in Task-Oriented Dialogues

Liang Xue, Haoyu Liu, Yajun Tian, Xinyu Zhong, Yang Liu

TL;DR

This work tackles fine-grained entity recognition in task-oriented dialogues, where domain adaptation and controllable retrieval pose major challenges. It introduces MME-RAG, a hierarchical Judge-Solve framework that decouples type-level judgment (Managers) from span-level extraction (Experts), with KeyInfo-driven retrieval that injects few-shot, entity-aligned exemplars during inference. The approach enables rapid domain and entity adaptation without retraining, validated across public benchmarks (CrossNER, MIT-Movie, MIT-Restaurant) and a new multi-domain customer-service dataset, achieving strong or state-of-the-art performance in most settings. The results establish MME-RAG as a scalable, interpretable solution for adaptive dialogue understanding, with notable improvements in low-resource domains and robust cross-domain generalization grounded in selective retrieval and task decomposition.

Abstract

Fine-grained entity recognition is crucial for reasoning and decision-making in task-oriented dialogues, yet current large language models (LLMs) continue to face challenges in domain adaptation and retrieval controllability. We introduce MME-RAG, a Multi-Manager-Expert Retrieval-Augmented Generation framework that decomposes entity recognition into two coordinated stages: type-level judgment by lightweight managers and span-level extraction by specialized experts. Each expert is supported by a KeyInfo retriever that injects semantically aligned, few-shot exemplars during inference, enabling precise and domain-adaptive extraction without additional training. Experiments on CrossNER, MIT-Movie, MIT-Restaurant, and our newly constructed multi-domain customer-service dataset demonstrate that MME-RAG performs better than recent baselines in most domains. Ablation studies further show that both the hierarchical decomposition and KeyInfo-guided retrieval are key drivers of robustness and cross-domain generalization, establishing MME-RAG as a scalable and interpretable solution for adaptive dialogue understanding.
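The two-stage decomposition described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Exemplar`, `token_overlap`, `retrieve`, `judge`, `build_prompt`, and `recognize` are hypothetical, keyword cues stand in for the LLM-based Manager, token overlap stands in for the KeyInfo retriever's relevance score, and `llm` is any callable that maps a prompt to a string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Exemplar:
    key_info: str  # condensed key mentions of a past dialogue (assumed format)
    dialogue: str  # the past dialogue text
    answer: str    # gold entity value for this Expert's entity type


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercased tokens; a placeholder for a real relevance model."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retrieve(key_info: str, pool: List[Exemplar], k: int = 2) -> List[Exemplar]:
    """Rank stored exemplars by how closely their key info matches the query key info."""
    ranked = sorted(pool, key=lambda e: token_overlap(key_info, e.key_info), reverse=True)
    return ranked[:k]


def judge(dialogue: str, cues: Dict[str, List[str]]) -> List[str]:
    """Manager stage: decide which entity types are present (keyword cues replace an LLM here)."""
    text = dialogue.lower()
    return [etype for etype, words in cues.items() if any(w in text for w in words)]


def build_prompt(etype: str, dialogue: str, shots: List[Exemplar]) -> str:
    """Expert stage input: retrieved few-shot exemplars followed by the target dialogue."""
    parts = [f"Extract the entity '{etype}'."]
    for shot in shots:
        parts.append(f"Dialogue: {shot.dialogue}\nAnswer: {shot.answer}")
    parts.append(f"Dialogue: {dialogue}\nAnswer:")
    return "\n\n".join(parts)


def recognize(
    dialogue: str,
    key_info: str,
    cues: Dict[str, List[str]],
    pools: Dict[str, List[Exemplar]],
    llm: Callable[[str], str],
) -> Dict[str, str]:
    """Judge first, then call one Expert per entity type the Manager selected."""
    results: Dict[str, str] = {}
    for etype in judge(dialogue, cues):
        shots = retrieve(key_info, pools.get(etype, []))
        results[etype] = llm(build_prompt(etype, dialogue, shots))
    return results
```

Because exemplars are pulled from a per-entity pool at inference time, supporting a new entity type in this sketch only requires adding a cue list and an exemplar pool, which mirrors the training-free adaptation the abstract describes.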

Paper Structure

This paper contains 34 sections, 4 figures, 5 tables.

Figures (4)

  • Figure 1: MME-RAG overview. The Orchestrator first routes user input to the appropriate domain Managers (e.g., automotive industry), which then activate the relevant entity-level Experts (e.g., product type, requirement, buyer group). Each Expert employs RAG to extract domain-specific entities, supported by the KeyInfo Retriever, which enhances traditional retrieval by leveraging user- and assistant-level key information for fine-grained relevance ranking. The Database stores domain knowledge (entity definitions, history, and key info), enabling adaptive retrieval.
  • Figure 2: Illustration of MME-RAG’s modular expansion strategy. At the Manager level, new Managers can be added for emerging domains (e.g., Healthcare). At the Expert level, new Experts are introduced only when novel entity types arise (e.g., Litigation). This hierarchical design allows the system to scale across industries while minimizing disruption to existing modules, achieving cost-efficient and rapid domain–entity adaptation.
  • Figure 3: Overview of the Customer-Service Dataset. The schema distinguishes between Cross-Industry Generic Entities, which are shared across all domains (e.g., user name, contact information, time, budget), and Vertical-Domain Entities, which are specific to individual industries. Examples include automotive attributes (engine type, brand), home features (layout, price), real estate properties (type, location), and legal or financial aspects (dispute type, litigation stage). This fine-grained entity hierarchy facilitates precise domain understanding and robust entity-level reasoning.
  • Figure 4: Illustration of a single annotated instance in the multi-domain dialogue dataset. Each record comprises five components: (1) Basic Metadata (dialogue ID, domain), (2) Conversation History (user-assistant turns), (3) Entity Extraction (e.g., layout, location, budget in the Real Estate domain), (4) Chain-of-Thought (CoT) reasoning for entity linking and intent derivation, and (5) Key_Info, summarizing essential mentions from both participants. This unified annotation schema enables structured evaluation and supports KeyInfo-based augmentation for the MME-RAG framework.
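
The five-component record in Figure 4 and the generic/vertical entity split in Figure 3 can be pictured as a small data structure. The sketch below is an assumption about how such a record might be laid out, not the dataset's actual schema: the field names, the `GENERIC_ENTITIES` set, and the `split_entities` helper are hypothetical, with the generic entity names taken from the examples in the Figure 3 caption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Cross-industry generic entities, using the examples named in the Figure 3 caption.
GENERIC_ENTITIES = {"user_name", "contact_information", "time", "budget"}


@dataclass
class DialogueRecord:
    dialogue_id: str                  # (1) basic metadata
    domain: str                       # (1) basic metadata
    history: List[Tuple[str, str]]    # (2) conversation history as (speaker, utterance)
    entities: Dict[str, str]          # (3) extracted entities: type -> value
    cot: str                          # (4) chain-of-thought rationale
    key_info: Dict[str, List[str]]    # (5) key mentions per speaker

    def split_entities(self) -> Tuple[Dict[str, str], Dict[str, str]]:
        """Separate cross-industry generic entities from vertical-domain ones."""
        generic = {k: v for k, v in self.entities.items() if k in GENERIC_ENTITIES}
        vertical = {k: v for k, v in self.entities.items() if k not in GENERIC_ENTITIES}
        return generic, vertical
```

A record for the Real Estate domain would then carry `layout` and `location` as vertical-domain entities and `budget` as a generic one, so `split_entities` returns them in separate dictionaries.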