MDER-DR: Multi-Hop Question Answering with Entity-Centric Summaries

Riccardo Campi, Nicolò Oreste Pinciroli Vago, Mathyas Giudici, Marco Brambilla, Piero Fraternali

Abstract

Retrieval-Augmented Generation (RAG) over Knowledge Graphs (KGs) can lose important contextual nuance at indexing time, when text is reduced to triples. This loss degrades performance in downstream Question-Answering (QA) tasks, particularly multi-hop QA, which requires composing answers from multiple entities, facts, or relations. We propose a domain-agnostic, KG-based QA framework that covers both the indexing and retrieval/inference phases. A new indexing approach called Map-Disambiguate-Enrich-Reduce (MDER) generates context-derived triple descriptions and subsequently integrates them with entity-level summaries, thus avoiding the need for explicit traversal of edges in the graph during the QA retrieval phase. Complementing this, we introduce Decompose-Resolve (DR), a retrieval mechanism that decomposes user queries into resolvable triples and grounds them in the KG via iterative reasoning. Together, MDER and DR form an LLM-driven QA pipeline that is robust to sparse, incomplete, and complex relational data. Experiments on standard and domain-specific benchmarks show that MDER-DR achieves substantial improvements over standard RAG baselines (up to 66%) while maintaining cross-lingual robustness. Our code is available at https://github.com/DataSciencePolimi/MDER-DR_RAG.
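
To make the indexing idea concrete, the following is a minimal toy sketch of how triple descriptions might be folded into entity-level summaries. It is not the authors' implementation: the `Triple` class, the `ALIASES` table, the function names, and the example data are all assumptions made for illustration, and simple lookups stand in for the LLM calls that an LLM-driven pipeline would use. The Map step is assumed to have already produced the input triples.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    description: str = ""  # filled in by the Enrich step


# Hypothetical alias table standing in for model-based disambiguation.
ALIASES = {"M. Curie": "Marie Curie", "Curie": "Marie Curie"}


def disambiguate(triples):
    """Map surface forms of the same entity to one canonical name."""
    def canon(name):
        return ALIASES.get(name, name)
    return [Triple(canon(t.subject), t.predicate, canon(t.obj)) for t in triples]


def enrich(triples, context):
    """Attach a context-derived description to each triple.

    `context` maps (subject, predicate, object) to a source sentence;
    this lookup only imitates what a language model would generate.
    """
    return [
        Triple(t.subject, t.predicate, t.obj,
               context.get((t.subject, t.predicate, t.obj), ""))
        for t in triples
    ]


def reduce_to_summaries(triples):
    """Collect triple descriptions under every entity they mention."""
    per_entity = defaultdict(list)
    for t in triples:
        for entity in (t.subject, t.obj):
            if t.description and t.description not in per_entity[entity]:
                per_entity[entity].append(t.description)
    return {entity: " ".join(descs) for entity, descs in per_entity.items()}


# Invented example data: output of a hypothetical Map step plus its context.
mapped = [
    Triple("M. Curie", "born_in", "Warsaw"),
    Triple("Warsaw", "capital_of", "Poland"),
]
context = {
    ("Marie Curie", "born_in", "Warsaw"): "Marie Curie was born in Warsaw.",
    ("Warsaw", "capital_of", "Poland"): "Warsaw is the capital of Poland.",
}
summaries = reduce_to_summaries(enrich(disambiguate(mapped), context))
print(summaries["Warsaw"])
```

In this toy index, the summary for "Warsaw" already carries both facts that touch it, so a two-hop question can be answered from one entity summary rather than by walking two edges.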

Paper Structure (29 sections, 4 figures, 3 tables)

Figures (4)

  • Figure 1: High-level overview of our KG-based question answering framework, featuring the MDER indexing approach for condensing multi-hop information during KG construction so that downstream retrieval does not need explicit hop traversal, and the DR retrieval mechanism for resolving queries through structured graph reasoning using MDER summaries, allowing multi-entity reasoning without explicit KG path traversal.
  • Figure 2: Indexing pipeline including the Map-Disambiguate-Enrich-Reduce (MDER) workflow. It begins with preprocessing, where input documents are segmented, summarized, and translated. The core pipeline then proceeds through: (1) Map extracts subject-predicate-object triples; (2) Disambiguate unifies fragmented or redundant entities; (3) Enrich adds contextual descriptions to triples; and (4) Reduce generates entity-centric summaries. The final output is a structured graph of interconnected document, chunk, entity, relationship, and triple nodes, optimized for downstream KG tasks, built upon a domain-agnostic RDF/OWL ontology.
  • Figure 3: Retrieval pipeline, including the Decompose-Resolve (DR) approach. It begins with a user query, which is translated into English. DR then produces a set of placeholder-augmented triples from that question and grounds each placeholder in KG entities through a retrieval and reasoning loop. Resolved entities are propagated across triples, and supporting evidence is accumulated throughout the resolution process. Once all placeholders are resolved, a final answer is synthesized using accumulated summaries.
  • Figure 4: Performance evaluation across different datasets and metrics. Results are shown for WikiQA, HotpotQA, and BenchEE datasets using the scores obtained with LLM-as-a-Judge and Soft EM, comparing language match vs. language mismatch conditions.
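
The resolution loop described in the Figure 3 caption can be illustrated with a small toy sketch. It is a sketch under stated assumptions, not the paper's implementation: the `SUMMARIES` data, the `PATTERNS` table, the `?name` placeholder syntax, and the `resolve` function are all invented here, and regular expressions stand in for the retrieval and reasoning that the paper's pipeline would perform. Query translation and final answer synthesis are not shown.

```python
import re

# Toy entity-centric summaries, as an MDER-style index might provide (invented data).
SUMMARIES = {
    "Marie Curie": "Marie Curie was born in Warsaw.",
    "Warsaw": "Marie Curie was born in Warsaw. Warsaw is the capital of Poland.",
    "Poland": "Warsaw is the capital of Poland.",
}

# Hypothetical patterns that stand in for model reasoning over a summary.
PATTERNS = {
    "born_in": r"{s} was born in (\w[\w ]*)\.",
    "capital_of": r"{s} is the capital of (\w[\w ]*)\.",
}


def is_placeholder(term):
    return term.startswith("?")


def resolve(triples):
    """Ground placeholders one triple at a time, propagating resolved values."""
    bindings, evidence = {}, []
    pending = list(triples)
    while pending:
        progressed = False
        for triple in list(pending):
            # Substitute placeholders that earlier triples already resolved.
            s, p, o = (bindings.get(t, t) for t in triple)
            if is_placeholder(s):
                continue  # subject not grounded yet; retry on a later pass
            summary = SUMMARIES.get(s, "")
            match = re.search(PATTERNS[p].format(s=re.escape(s)), summary)
            if match is None:
                continue
            if is_placeholder(o):
                bindings[o] = match.group(1)
            elif match.group(1) != o:
                continue  # summary does not support the stated object
            evidence.append(summary)  # accumulate supporting evidence
            pending.remove(triple)
            progressed = True
        if not progressed:
            break  # nothing more can be resolved from the available summaries
    return bindings, evidence


# Hypothetical decomposition of: "Which country's capital is Marie Curie's birthplace?"
query_triples = [
    ("?city", "capital_of", "?country"),
    ("Marie Curie", "born_in", "?city"),
]
bindings, evidence = resolve(query_triples)
print(bindings["?country"])
```

The first triple cannot be grounded until `?city` is known, so the loop skips it, resolves the second triple, and then returns to the first with the propagated value. Each step reads an entity summary instead of following graph edges, which mirrors the caption's description of reasoning over accumulated summaries.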