Walk Wisely on Graph: Knowledge Graph Reasoning with Dual Agents via Efficient Guidance-Exploration

Zijian Wang, Bin Wang, Haifeng Jing, Huayu Li, Hongbo Dou

TL;DR

Knowledge graph reasoning often suffers from sparse rewards and difficulty locating long-distance inference paths in sparse graphs. FULORA introduces a dual-agent hierarchical RL framework where GIANT operates on a cluster-level KG $\mathcal{G}^c$ to guide DWARF on the entity-level KG $\mathcal{G}^e$, balancing exploration and guidance through a constraint reward based on state similarity $\text{Sim}(s_t^c,s_t^e)$. It employs attention for DWARF, dynamic path feedback to accelerate GIANT's learning, and reward shaping to ensure policy consistency, implemented with REINFORCE and a Lagrange multiplier. Across three real-world datasets (NELL-995, WN18RR, FB15K-237), FULORA achieves state-of-the-art long-distance reasoning while maintaining strong short-distance performance, demonstrating robust applicability to both sparse and dense KG scenarios.
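The TL;DR says DWARF uses an attention mechanism, and Figure 2 ❸ describes it as graph attention over the KG structure whose output is fed into DWARF's policy network. The sketch below is a GAT-style scoring of the current entity's outgoing edges. It is an illustration under assumptions, not the paper's parameterization: the helper name `edge_attention`, the projection `W`, the attention vector `a`, and the idea that each candidate action is a single vector (for example, a relation embedding combined with a tail-entity embedding) are all hypothetical.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    # LeakyReLU as used in graph attention networks (0.2 is the usual GAT slope)
    return np.where(x > 0, x, slope * x)


def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()


def edge_attention(h, neighbors, W, a):
    """GAT-style attention of the current entity over its outgoing edges.

    h:         (d,)   embedding of DWARF's current entity
    neighbors: (n, d) embeddings of the n candidate actions (hypothetical
               encoding, e.g. relation embedding + tail-entity embedding)
    W:         (k, d) shared linear projection
    a:         (2k,)  attention vector
    Returns the attention weights (n,) and the attended context vector (k,).
    """
    Wh = W @ h                                            # (k,)
    Wn = neighbors @ W.T                                  # (n, k)
    pairs = np.hstack([np.tile(Wh, (len(Wn), 1)), Wn])    # (n, 2k): [W h || W n_j]
    alpha = softmax(leaky_relu(pairs @ a))                # (n,) weights summing to 1
    context = alpha @ Wn                                  # (k,) weighted sum of neighbors
    return alpha, context


# Usage with random placeholder embeddings (not trained TransE vectors)
rng = np.random.default_rng(0)
d, k, n = 8, 4, 5
h = rng.normal(size=d)
neighbors = rng.normal(size=(n, d))
W = rng.normal(size=(k, d))
a = rng.normal(size=2 * k)
alpha, context = edge_attention(h, neighbors, W, a)
```

In this sketch the context vector would be concatenated with DWARF's hidden state $\mathbf{h}_t^e$ before the policy head; with a zero attention vector the weights fall back to a uniform average over the candidate edges.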

Abstract

In recent years, multi-hop reasoning has been widely studied for knowledge graph (KG) reasoning due to its efficacy and interpretability. However, previous multi-hop reasoning approaches suffer from two primary shortcomings. First, agents struggle to learn effective and robust policies in the early phase of training because rewards are sparse. Second, these approaches often falter on certain datasets, such as sparse knowledge graphs, where agents must traverse lengthy reasoning paths. To address these problems, we propose FULORA, a multi-hop reasoning model with dual agents based on hierarchical reinforcement learning (HRL). FULORA tackles the above reasoning challenges through eFficient GUidance-ExpLORAtion between the dual agents. The high-level agent walks on a simplified knowledge graph to provide stage-wise hints for the low-level agent walking on the original knowledge graph. In this framework, the low-level agent optimizes a value function that balances two objectives: (1) maximizing return, and (2) integrating efficient guidance from the high-level agent. Experiments on three real-world knowledge graph datasets demonstrate that FULORA outperforms RL-based baselines, especially on long-distance reasoning.
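The abstract says the low-level agent balances return against guidance, and the TL;DR says this is implemented with REINFORCE and a Lagrange multiplier on a constraint built from the state similarity $\text{Sim}(s_t^c, s_t^e)$. One way to formalize that, offered as an assumption rather than the paper's exact equations, is $\max_\theta J_R(\theta)$ subject to $J_S(\theta) = \mathbb{E}_{\pi_\theta}\big[\sum_t \gamma^t (\text{Sim}(s_t^c, s_t^e) - \xi)\big] \ge 0$ for a threshold $\xi$, relaxed to $\min_{\lambda \ge 0} \max_\theta \mathcal{L}(\theta, \lambda) = J_R(\theta) + \lambda J_S(\theta)$. The sketch below implements that relaxation for a softmax policy over per-step action logits. The cosine form of Sim, the threshold `xi`, and the helpers `cosine_sim`, `lagrangian_policy_grad`, and `update_lambda` are hypothetical; the dual step uses the mean per-step similarity as a simple estimate of the constraint slack.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def cosine_sim(u, v, eps=1e-8):
    # assumed form of Sim(s_t^c, s_t^e): cosine similarity of state embeddings
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))


def discounted_returns(x, gamma):
    # G_t = x_t + gamma * G_{t+1}, computed backwards over one episode
    G = np.zeros(len(x))
    running = 0.0
    for t in reversed(range(len(x))):
        running = x[t] + gamma * running
        G[t] = running
    return G


def lagrangian_policy_grad(logits, actions, rewards, sims, lam, xi, gamma=0.99):
    """REINFORCE gradient w.r.t. per-step action logits for
    L(theta, lam) = J_R + lam * J_S, with the constraint folded into each
    step's reward as r_t + lam * (sim_t - xi)."""
    shaped = np.asarray(rewards) + lam * (np.asarray(sims) - xi)
    G = discounted_returns(shaped, gamma)
    grads = np.zeros_like(logits)
    for t, a in enumerate(actions):
        grad_logp = -softmax(logits[t])   # d log softmax(z)[a] / dz ...
        grad_logp[a] += 1.0               # ... = onehot(a) - softmax(z)
        grads[t] = G[t] * grad_logp
    return grads                          # ascend: logits += lr * grads


def update_lambda(lam, sims, xi, lr=0.1):
    # projected dual step: lam grows while the mean similarity is below xi
    return max(0.0, lam - lr * (float(np.mean(sims)) - xi))


# Usage: one two-step episode with three candidate actions per step
logits = np.zeros((2, 3))
grads = lagrangian_policy_grad(logits, actions=[0, 2], rewards=[0.0, 1.0],
                               sims=[0.9, 0.3], lam=0.5, xi=0.5, gamma=1.0)
lam = update_lambda(0.5, sims=[0.9, 0.3], xi=0.5)
```

Read against Figure 3, this gives the intended behavior: while DWARF strays from GIANT (similarity below $\xi$), $\lambda$ grows and the guidance term dominates DWARF's returns; once DWARF stays within bounds, $\lambda$ decays toward zero and the task return, i.e. DWARF's own exploration, takes over.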

Paper Structure (34 sections, 1 theorem, 15 equations, 11 figures, 9 tables, 2 algorithms)

Key Result

Theorem 1

Given two MDPs that differ only in reward function, denoted as $M = (\mathcal{S}, \mathcal{A}, \delta, \mathcal{R})$ and $M' = (\mathcal{S}, \mathcal{A}, \delta, \mathcal{R}_D)$ respectively, where $\mathcal{R} = r_c(s_t^c)$ is the default reward while $\mathcal{R}_D = r_c(s_t^c) - \alpha \Delta(s_t
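Theorem 1 concerns the consistency of the optimal policy between the default reward $\mathcal{R}$ and the shaped reward $\mathcal{R}_D$; its statement is truncated above, so the exact form of $\Delta$ is not recoverable here. For reference, the classical sufficient condition for this kind of invariance is potential-based shaping, $F(s, s') = \gamma\Phi(s') - \Phi(s)$, under which $Q^*_{M'}(s,a) = Q^*_M(s,a) - \Phi(s)$, so the greedy policies coincide. The numerical check below illustrates that fact on a random finite MDP. It assumes, without confirmation from this summary, that $-\alpha\Delta$ plays the role of $F$; the potential `phi` and the MDP itself are hypothetical placeholders.

```python
import numpy as np


def value_iteration(P, R, gamma, iters=1000):
    """Optimal Q for a finite MDP. P and R have shape (S, A, S')."""
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)                                   # V(s') = max_a Q(s', a)
        Q = np.einsum("sap,sap->sa", P, R + gamma * V[None, None, :])
    return Q


rng = np.random.default_rng(0)
S, A, gamma = 6, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)        # each (s, a) row is a distribution over s'
R = rng.normal(size=(S, A, S))           # default reward, standing in for R
phi = rng.normal(size=S)                 # hypothetical potential Phi(s)

# potential-based shaping term F(s, a, s') = gamma * Phi(s') - Phi(s)
F = gamma * phi[None, None, :] - phi[:, None, None]
Q = value_iteration(P, R, gamma)
Q_shaped = value_iteration(P, R + F, gamma)

same_policy = np.array_equal(Q.argmax(axis=1), Q_shaped.argmax(axis=1))
```

The shaped values differ from the unshaped ones only by the per-state offset $\Phi(s)$, which is why the argmax over actions, and hence the optimal policy, is unchanged.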

Figures (11)

  • Figure 1: An illustrative example of a short direct path and a long indirect path. When no short direct path exists, the agent searches for a long indirect path.
  • Figure 2: An overview of the FULORA framework. ❶ Given a KG $\mathcal{G}$, we first pre-embed the KG using TransE and then apply K-means clustering to generate the cluster-level KG $\mathcal{G}^c$. ❷ We design separate policy networks for GIANT and DWARF, which take the cluster-level KG $\mathcal{G}^c$ and the entity-level KG $\mathcal{G}^e$ as inputs, respectively. The hidden states $\mathbf{h}_t^c$ (GIANT) and $\mathbf{h}_t^e$ (DWARF) share information, enabling communication between the agents. ❸ To enable DWARF to better leverage the KG structure, we apply a graph attention mechanism and feed the resulting attention vector into the policy network. Dynamic Path Feedback alleviates the near-random policy caused by sparse rewards in the early training phase, allowing GIANT to provide high-quality guidance to DWARF sooner.
  • Figure 3: An illustration of Effective Guidance-Exploration. When DWARF is out of bounds, GIANT guides it quickly back inside; otherwise, DWARF explores on its own to find the correct target.
  • Figure 4: Learning curves comparing our model against CURL on NELL-995 relation tasks and on all tasks, averaged over 5 seeds; the shaded area shows the standard deviation. Our model is significantly better than CURL in both score and stability.
  • Figure 5: Long-distance performance: FULORA significantly outperforms CURL, SQUIRE, HMLS, LMKE, RED-GNN, and NBFNet on NELL-995 (standard KG) and WN18RR (sparse KG).
  • ...and 6 more figures
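Step ❶ of Figure 2 pre-embeds the KG with TransE, clusters the entity embeddings with K-means, and builds the cluster-level KG $\mathcal{G}^c$. A minimal sketch of the clustering and lifting step follows. It assumes TransE embeddings are already available (toy 2-D vectors stand in for them) and that a cluster-level triple $(c_h, r, c_t)$ is kept whenever some entity-level triple $(h, r, t)$ links the two clusters; the summary does not say how FULORA treats relations or intra-cluster edges, so `kmeans` and `cluster_level_kg` are hypothetical helpers, not the authors' code.

```python
import numpy as np


def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster id per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # squared distance from every point to every center: shape (n, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):              # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def cluster_level_kg(triples, labels):
    """Lift entity-level triples (h, r, t) to distinct cluster-level triples (c_h, r, c_t)."""
    return sorted({(int(labels[h]), r, int(labels[t])) for h, r, t in triples})


# Toy stand-ins for TransE entity embeddings: entities {0, 1} and {2, 3} form two groups
emb = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
triples = [(0, "r1", 1), (1, "r2", 2), (3, "r1", 2)]
labels = kmeans(emb, k=2)
g_c = cluster_level_kg(triples, labels)
```

Because all entities in a cluster collapse into one node, $\mathcal{G}^c$ has far fewer nodes than $\mathcal{G}^e$, which is what makes it the simplified graph on which GIANT walks.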

Theorems & Definitions (2)

  • Theorem 1: Consistency of optimal policy
  • Proof 1