KG-Hopper: Empowering Compact Open LLMs with Knowledge Graph Reasoning via Reinforcement Learning

Shuai Wang; Yinan Yu

Abstract

Large Language Models (LLMs) demonstrate impressive natural language capabilities but often struggle with knowledge-intensive reasoning tasks. Knowledge Base Question Answering (KBQA), which leverages structured Knowledge Graphs (KGs), exemplifies this challenge because it requires accurate multi-hop reasoning. Existing approaches typically perform sequential reasoning steps guided by predefined pipelines, which restricts flexibility and causes error cascades because each step is reasoned about in isolation. To address these limitations, we propose KG-Hopper, a novel Reinforcement Learning (RL) framework that empowers compact open LLMs to perform integrated multi-hop KG reasoning within a single inference round. Rather than reasoning step by step, we train a Reasoning LLM that embeds the entire KG traversal and decision process into a unified "thinking" stage, enabling global reasoning over cross-step dependencies and dynamic path exploration with backtracking. Experimental results on eight KG reasoning benchmarks show that KG-Hopper, based on a 7B-parameter LLM, consistently outperforms larger multi-step systems (up to 70B) and achieves competitive performance with proprietary models such as GPT-3.5-Turbo and GPT-4o-mini, while remaining compact, open, and data-efficient. The code is publicly available at: https://github.com/Wangshuaiia/KG-Hopper.

Paper Structure (17 sections, 2 equations, 2 figures, 3 tables)

Introduction
Task Definition
Method
Knowledge Graph Retrieval Tool
Cold Start
Reasoning-oriented Reinforcement Learning
Optimization
Experiments
Datasets
Implementation Details
Main Results
Ablation
RL vs SFT.
RL Reward Design.
RL Sampling Efficiency.
...and 2 more sections

Figures (2)

Figure 1: Multi-step vs one-round multi-hop reasoning over a knowledge graph: (a) multi-step reasoning and (b) our one-round reasoning. The multi-step pipeline invokes multiple sequential LLM calls and fails due to the missing entity Yellow Hibiscus, leading to an incorrect path. In contrast, our one-round approach performs the entire reasoning process within a single Reasoning LLM call, maintaining coherence and demonstrating robustness to incomplete knowledge.
Figure 2: The RL training process under two settings: with and without history resampling (RQ4). The figure shows how reward, response length, and retrieval count change over training steps.
