ReasoningLM: Enabling Structural Subgraph Reasoning in Pre-trained Language Models for Question Answering over Knowledge Graph
Jinhao Jiang, Kun Zhou, Wayne Xin Zhao, Yaliang Li, Ji-Rong Wen
TL;DR
ReasoningLM is a single pre-trained language model (PLM) that performs both natural-language understanding and structural subgraph reasoning for question answering over knowledge graphs (KGQA) by imitating GNN-like propagation inside a Transformer. A BFS-based subgraph serialization combined with a constrained, subgraph-aware self-attention mechanism enables tight question–subgraph interaction inside one model. The approach is complemented by adaptation tuning on 20,000 subgraphs with synthesized questions and by parameter-efficient fine-tuning with adapters, yielding state-of-the-art results with far fewer updated parameters and less training data. On WebQSP, CWQ, and MetaQA, ReasoningLM outperforms the baselines by a large margin, demonstrating the practicality of integrating graph-structure reasoning directly into PLMs for KGQA.
Abstract
Question Answering over Knowledge Graph (KGQA) aims to find the answer entities for a natural language question in a large-scale Knowledge Graph (KG). To reason over the KG more effectively, recent work typically adopts a pre-trained language model (PLM) to model the question and a graph neural network (GNN) based module to perform multi-hop reasoning on the KG. Although this design is effective, the PLM and the GNN differ in architecture and are therefore not closely integrated, which limits knowledge sharing and fine-grained feature interaction between them. To address this, we simplify the two-module approach and develop a more capable PLM, namely ReasoningLM, that directly supports subgraph reasoning for KGQA. We propose a subgraph-aware self-attention mechanism that imitates a GNN to perform structured reasoning, and we adopt an adaptation tuning strategy that adapts the model parameters using 20,000 subgraphs with synthesized questions. After adaptation, the PLM can be fine-tuned on downstream tasks in a parameter-efficient manner. Experiments show that ReasoningLM surpasses state-of-the-art models by a large margin, even with fewer updated parameters and less training data. Our code and data are publicly available at https://github.com/RUCAIBox/ReasoningLM.
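To make the idea of structure-constrained attention concrete, the following is a minimal illustrative sketch, not the released implementation. It serializes a toy subgraph by breadth-first search from a topic entity and builds a boolean attention mask over the concatenated question and subgraph tokens. The function names (`bfs_serialize`, `build_attention_mask`), the toy triples, and the masking rule itself (question tokens attend everywhere; graph tokens attend within their own triple and across mentions of the same entity) are simplifying assumptions made for illustration; the exact serialization and mask used by ReasoningLM may differ.

```python
from collections import deque


def bfs_serialize(triples, topic_entity):
    """Order (head, relation, tail) triples by BFS from the topic entity.

    Assumption: edges are followed from head to tail only, and triples
    unreachable from the topic entity are dropped.
    """
    adjacency = {}
    for head, rel, tail in triples:
        adjacency.setdefault(head, []).append((rel, tail))

    ordered, seen, queue = [], {topic_entity}, deque([topic_entity])
    while queue:
        head = queue.popleft()
        for rel, tail in adjacency.get(head, []):
            ordered.append((head, rel, tail))
            if tail not in seen:
                seen.add(tail)
                queue.append(tail)
    return ordered


def build_attention_mask(question_tokens, ordered_triples):
    """Return (tokens, mask); mask[i][j] is True if token i may attend to j.

    Hypothetical rule, chosen only to illustrate a structure-aware mask:
      * question tokens attend to, and are attended by, every token;
      * graph tokens attend to tokens of the same triple;
      * entity tokens also attend to other mentions of the same entity,
        so information can flow between adjacent triples.
    """
    tokens = list(question_tokens)
    kind = ["q"] * len(tokens)            # "q", "ent", or "rel"
    triple_id = [None] * len(tokens)      # None for question tokens
    for idx, (head, rel, tail) in enumerate(ordered_triples):
        for item, item_kind in ((head, "ent"), (rel, "rel"), (tail, "ent")):
            tokens.append(item)
            kind.append(item_kind)
            triple_id.append(idx)

    n = len(tokens)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if kind[i] == "q" or kind[j] == "q":
                mask[i][j] = True
            elif triple_id[i] == triple_id[j]:
                mask[i][j] = True
            elif kind[i] == kind[j] == "ent" and tokens[i] == tokens[j]:
                mask[i][j] = True
    return tokens, mask


# Toy data, invented for this example.
toy_triples = [
    ("Obama", "born_in", "Honolulu"),
    ("Honolulu", "located_in", "Hawaii"),
    ("Obama", "spouse", "Michelle"),
]
toy_question = ["where", "was", "obama", "born"]

toy_order = bfs_serialize(toy_triples, "Obama")
toy_tokens, toy_mask = build_attention_mask(toy_question, toy_order)
```

In a Transformer, a mask of this kind would typically be converted to additive form (zero where attention is allowed, a large negative value elsewhere) and added to the attention logits before the softmax. Stacking several such layers then lets information travel one hop further per layer, which is the sense in which constrained self-attention can imitate GNN message passing.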
