
UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph

Jinhao Jiang, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen

TL;DR

This paper tackles multi-hop KGQA by unifying retrieval and reasoning in a single architecture that combines a semantic matching module based on a pre-trained language model (PLM) with a matching information propagation module. It introduces abstract subgraphs to bridge retrieval and reasoning, and designs a two-stage training regime: contrastive pre-training for question-relation matching, followed by retrieval and reasoning fine-tuning with parameter transfer. Empirical results on MetaQA, WebQSP, and CWQ show gains over state-of-the-art baselines, especially on WebQSP and CWQ, and ablations confirm the value of pre-training and cross-stage initialization. The code is publicly available.

Abstract

Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question on a large-scale Knowledge Graph (KG). To cope with the vast search space, existing work usually adopts a two-stage approach: it first retrieves a relatively small subgraph related to the question and then performs reasoning on the subgraph to find the answer entities accurately. Although these two stages are highly related, previous work employs very different technical solutions for developing the retrieval and reasoning models, neglecting their relatedness in task essence. In this paper, we propose UniKGQA, a novel approach for the multi-hop KGQA task that unifies retrieval and reasoning in both model architecture and parameter learning. For model architecture, UniKGQA consists of a semantic matching module based on a pre-trained language model (PLM) for question-relation semantic matching, and a matching information propagation module to propagate the matching information along the directed edges on KGs. For parameter learning, we design a shared pre-training task based on question-relation matching for both retrieval and reasoning models, and then propose retrieval- and reasoning-oriented fine-tuning strategies. Compared with previous studies, our approach is more unified, tightly relating the retrieval and reasoning stages. Extensive experiments on three benchmark datasets have demonstrated the effectiveness of our method on the multi-hop KGQA task. Our code and data are publicly available at https://github.com/RUCAIBox/UniKGQA.
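
The two modules described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the token-overlap `match_score` is a hypothetical stand-in for the PLM-based question-relation matching, and the scalar score passed along edges in `propagate` is an assumed simplification of propagating matching information over entity representations. The triples and the question are made up for illustration.

```python
from collections import defaultdict


def match_score(question, relation):
    """Toy stand-in for PLM matching: Jaccard overlap of lowercase tokens."""
    q_tokens = set(question.lower().split())
    r_tokens = set(relation.lower().replace("_", " ").split())
    union = q_tokens | r_tokens
    return len(q_tokens & r_tokens) / len(union) if union else 0.0


def propagate(triples, topic_entities, question, hops):
    """Push scores along directed edges (head -> tail) for `hops` steps.

    Assumes `topic_entities` is non-empty. Each step weights an edge by the
    question-relation match score and renormalizes the entity scores.
    """
    scores = {e: 1.0 / len(topic_entities) for e in topic_entities}
    for _ in range(hops):
        next_scores = defaultdict(float)
        for head, relation, tail in triples:
            if head in scores:
                next_scores[tail] += scores[head] * match_score(question, relation)
        total = sum(next_scores.values())
        if total == 0:
            break  # no relation matched the question; keep the last scores
        scores = {e: s / total for e, s in next_scores.items()}
    return scores


# Hypothetical two-hop example.
triples = [
    ("Inception", "directed_by", "Nolan"),
    ("Inception", "starred_actors", "DiCaprio"),
    ("Nolan", "born_in", "London"),
    ("DiCaprio", "born_in", "Los Angeles"),
]
question = "where was the director that inception was directed by born in"
scores = propagate(triples, ["Inception"], question, hops=2)
print(max(scores, key=scores.get))  # London
```

In this sketch the same scoring routine could rank candidate nodes for subgraph retrieval or rank answer entities for reasoning, which is the sense in which the abstract calls the two stages unified.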

Paper Structure (19 sections, 7 equations, 4 figures, 9 tables)

Figures (4)

  • Figure 1: Illustrative examples and learning procedure of our work.
  • Figure 2: The illustration of updating entity representation $\bm{e}$ at step $t$ by aggregating the semantic matching information from the set of directed relations pointing to $e$ in the subgraph (i.e., $\{r_1, r_2, r_3\}$) in our UniKGQA.
  • Figure 3: The evaluation of retrieval and fine-tuning efficiency: the answer coverage rate under various subgraph sizes (Left), the Hits@1 scores under various answer coverage rates (Middle), and the Hits@1 scores at different epochs on WebQSP (Right).
  • Figure 4: Results of the ablation study on WebQSP: performance when varying the number of pre-training steps (Left), the hidden dimension (Middle), and the number of retrieved nodes $K$ (Right).
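
The two quantities plotted in Figure 3 can be made concrete with a short sketch. The definitions here are assumptions based on common KGQA practice, not taken from this page: Hits@1 is the fraction of questions whose top-ranked prediction is a gold answer, and the answer coverage rate is the fraction of questions whose retrieved subgraph contains at least one gold answer. The function names and sample data are hypothetical.

```python
def hits_at_1(ranked_predictions, gold_answers):
    """Fraction of questions whose top-ranked entity is a gold answer."""
    hits = sum(
        1
        for ranked, answers in zip(ranked_predictions, gold_answers)
        if ranked and ranked[0] in answers
    )
    return hits / len(gold_answers)


def answer_coverage_rate(subgraph_nodes, gold_answers):
    """Fraction of questions whose retrieved subgraph holds a gold answer."""
    covered = sum(
        1
        for nodes, answers in zip(subgraph_nodes, gold_answers)
        if nodes & answers
    )
    return covered / len(gold_answers)


# Two hypothetical questions with their gold answer sets.
gold = [{"London"}, {"Rome"}]
ranked = [["London", "Paris"], ["Berlin", "Rome"]]
subgraphs = [{"London", "Paris"}, {"Berlin", "Rome"}]

print(hits_at_1(ranked, gold))                # 0.5
print(answer_coverage_rate(subgraphs, gold))  # 1.0
```

The second question shows why the two metrics are reported together: the subgraph covers the answer, yet the reasoning step still ranks a wrong entity first.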