Explore-on-Graph: Incentivizing Autonomous Exploration of Large Language Models on Knowledge Graphs with Path-refined Reward Modeling

Shiqi Yan; Yubo Chen; Ruiqi Zhou; Zhengxi Yao; Shuai Chen; Tianyi Zhang; Shijie Zhang; Wei Qiang Zhang; Yongfeng Huang; Haixin Duan; Yunqi Zhang

TL;DR

This paper proposes Explore-on-Graph (EoG), a novel framework that encourages LLMs to autonomously explore a more diverse reasoning space on Knowledge Graphs, and introduces reinforcement learning during training to incentivize exploration and discovery of novel reasoning paths.

Abstract

The reasoning process of Large Language Models (LLMs) is often plagued by hallucinations and missing facts in question-answering tasks. A promising solution is to ground LLMs' answers in verifiable knowledge sources, such as Knowledge Graphs (KGs). Prevailing KG-enhanced methods typically constrain LLM reasoning either by enforcing rules during generation or by imitating paths from a fixed set of demonstrations. However, these methods confine the reasoning patterns of LLMs to the scope of prior experience or fine-tuning data, limiting their generalizability to out-of-distribution graph reasoning problems. To tackle this problem, we propose Explore-on-Graph (EoG), a novel framework that encourages LLMs to autonomously explore a more diverse reasoning space on KGs. To incentivize exploration and discovery of novel reasoning paths, we introduce reinforcement learning during training, with a reward based on the correctness of the reasoning paths' final answers. To make the exploration more efficient and meaningful, we incorporate path information as an additional reward signal that refines the exploration process and reduces futile effort. Extensive experiments on five KGQA benchmark datasets demonstrate that, to the best of our knowledge, our method achieves state-of-the-art performance, outperforming both open-source and closed-source LLMs.
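The two reward signals described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact reward definitions, so everything below is an assumption made for illustration: the binary outcome reward, the triple-overlap path reward, the additive combination, and the names `outcome_reward`, `path_reward`, `combined_reward`, and `path_weight` are all hypothetical and are not taken from the paper.

```python
from typing import Sequence, Set, Tuple

# A KG edge written as (head entity, relation, tail entity).
Triple = Tuple[str, str, str]


def outcome_reward(predicted: Set[str], gold: Set[str]) -> float:
    """Assumed binary outcome reward: 1.0 if any predicted answer is correct."""
    return 1.0 if predicted & gold else 0.0


def path_reward(explored: Sequence[Triple], gold_path: Sequence[Triple]) -> float:
    """Assumed path reward: fraction of reference-path triples the model visited."""
    if not gold_path:
        return 0.0
    explored_set = set(explored)
    hits = sum(1 for triple in gold_path if triple in explored_set)
    return hits / len(gold_path)


def combined_reward(
    predicted: Set[str],
    gold: Set[str],
    explored: Sequence[Triple],
    gold_path: Sequence[Triple],
    path_weight: float = 0.5,  # hypothetical ratio of path reward to outcome reward
) -> float:
    """Assumed additive combination of the outcome and path signals."""
    return outcome_reward(predicted, gold) + path_weight * path_reward(explored, gold_path)


# Toy example: the model reaches the right answer but covers only half of the reference path.
gold_path = [("Q", "r1", "A"), ("A", "r2", "B")]
explored = [("Q", "r1", "A"), ("A", "r3", "C")]
reward = combined_reward({"B"}, {"B"}, explored, gold_path)
```

In this sketch a rollout that answers correctly still earns less than one that also follows the reference path, which is one plausible way a path signal could steer exploration away from futile branches.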

Paper Structure (33 sections, 7 equations, 15 figures, 13 tables)

Introduction
Related Work
Methodology
Supervised Fine-Tuning
Reinforcement Learning
Reinforcement Learning with Outcome Reward
Reinforcement Learning with Path-refined Reward
Experiments
Experiments Setup
Main Results
Ablation Study
Analysis of Reasoning Quality
Performance on Complex Reasoning Scenarios
Analysis of Exploration Behavior
Performance on Out of Distribution Datasets
...and 18 more sections

Figures (15)

Figure 1: Examples of (a) rule/imitation-based method and (b) our exploration method. O.O.D. stands for Out-Of-Distribution.
Figure 2: The overall framework of our approach. Red arrows in the knowledge graph stand for the golden reasoning paths of the given question.
Figure 3: Performance of EoG with different ratios of the path reward to the outcome reward. In the figure, the horizontal axis represents the different ratios, while the vertical axis indicates the performance difference compared to using only the outcome reward.
Figure 4: A multi-dimensional performance comparison on different reasoning paths.
Figure 5: Hit@1 comparison and performance ratios across four datasets under out-of-distribution (O.O.D.) settings. The O.O.D.-to-I.I.D. ratio is defined as a model's performance score on Out-of-Distribution (O.O.D.) data divided by its performance score on Independent and Identically Distributed (I.I.D.) data. Subfigures (c) and (d) show the O.O.D.-to-I.I.D. ratios of the models on the four datasets. 2Wiki refers to the 2WikiMultihop dataset. The Y-axis shows the dataset the model was trained on and the X-axis shows the dataset the model was evaluated on.
...and 10 more figures
