
Self-Exploring Language Models for Explainable Link Forecasting on Temporal Graphs via Reinforcement Learning

Zifeng Ding, Shenyang Huang, Zeyu Cao, Emma Kondrup, Zachary Yang, Xingyue Huang, Yuan Sui, Zhangdie Yuan, Yuqicheng Zhu, Xianglong Hu, Yuan He, Farimah Poursafaei, Michael Bronstein, Andreas Vlachos

TL;DR

This work tackles explainable future-link forecasting on temporal graphs by fine-tuning LLMs with reinforcement learning. It introduces ReaL-TG, which combines Temporal Context Graph Selection, a GRPO-based RL objective with an outcome-based reward, and a QA-style prompt so that the model generates both predictions and reasoning traces. It also proposes a new evaluation protocol that pairs MRR/pMRR for prediction quality with an LLM-as-a-Judge that assesses the faithfulness, consistency, and alignment of the reasoning. Results show that ReaL-TG-4B often outperforms much larger frontier LLMs on both seen and unseen graphs while producing high-quality explanations. The framework shows practical potential for explainable TG reasoning: it generalizes to new graphs without retraining and offers a scalable, interpretable forecasting approach together with a dedicated mechanism for evaluating reasoning.
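The TL;DR names a GRPO-based objective with an outcome-based reward but gives no formula. The following is a minimal sketch of the general GRPO idea under a loudly stated assumption: the reward here is a hypothetical one equal to the reciprocal rank of the true destination in the model's answer list, and the actual ReaL-TG reward may differ. Each sampled completion's group-relative advantage is its reward normalized by the mean and standard deviation of the rewards in its group. The function names are illustrative only.

```python
from statistics import mean, pstdev


def outcome_reward(predicted_ranking, true_dst):
    """HYPOTHETICAL outcome-based reward (not confirmed as ReaL-TG's):
    reciprocal rank of the true destination in the model's ranked answer
    list, or 0.0 if the true destination is missing from the list."""
    for rank, node in enumerate(predicted_ranking, start=1):
        if node == true_dst:
            return 1.0 / rank
    return 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: subtract the group mean from each reward and
    divide by the group's (population) standard deviation plus eps."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: four completions sampled for one query whose true destination is 7.
samples = [[7, 3], [3, 7], [5, 2], []]
rewards = [outcome_reward(s, 7) for s in samples]
advantages = group_relative_advantages(rewards)
print(rewards, advantages)
```

Completions that rank the true node higher than their siblings receive positive advantages. If every completion in a group earns the same reward, all advantages are zero and the group contributes no learning signal. This sketch uses the population standard deviation; implementations differ on that detail, and the clipped policy-ratio and KL terms of the full GRPO objective are omitted.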

Abstract

Forecasting future links is a central task in temporal graph (TG) reasoning, requiring models to leverage historical interactions to predict upcoming ones. Traditional neural approaches, such as temporal graph neural networks, achieve strong performance but lack explainability and cannot be applied to unseen graphs without retraining. Recent studies have begun to explore using large language models (LLMs) for graph reasoning, but most of them are constrained to static graphs or small synthetic TGs and do not evaluate the quality of the reasoning traces that LLMs generate. In this work, we present Reasoning-Enhanced Learning for Temporal Graphs (ReaL-TG), a reinforcement learning framework that fine-tunes LLMs to perform explainable link forecasting on real-world TGs. ReaL-TG uses an outcome-based reward to encourage models to self-explore reasoning strategies from graph structure and to produce explanations that directly justify their predictions. To enable evaluation of LLM-generated reasoning traces, we propose a new evaluation protocol that combines ranking metrics with an LLM-as-a-Judge system assessing both the quality of reasoning and the impact of hallucinations. Experiments with ReaL-TG-4B, obtained by fine-tuning Qwen3-4B under our framework, show that it outperforms much larger frontier LLMs, including GPT-5 mini, on ranking metrics, while producing high-quality explanations confirmed by both the LLM judge and human evaluation.
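The abstract evaluates predictions with ranking metrics. The best known of these, mean reciprocal rank (MRR), averages 1/rank of the true destination over all queries. The sketch below assumes each query supplies a score for every candidate node, and it breaks ties optimistically, which is an assumption since evaluation code often handles ties differently. It does not cover pMRR, and the function names are illustrative rather than taken from the paper.

```python
from typing import Hashable, Mapping, Sequence, Tuple


def reciprocal_rank(scores: Mapping[Hashable, float], true_dst: Hashable) -> float:
    """Return 1 / rank of the true destination among the scored candidates.

    Rank = 1 + number of other candidates with a strictly higher score,
    i.e. ties are broken in favor of the true destination (an assumption).
    """
    true_score = scores[true_dst]
    rank = 1 + sum(
        1 for node, s in scores.items() if node != true_dst and s > true_score
    )
    return 1.0 / rank


def mean_reciprocal_rank(
    queries: Sequence[Tuple[Mapping[Hashable, float], Hashable]]
) -> float:
    """Average the reciprocal rank over (candidate_scores, true_dst) pairs."""
    rrs = [reciprocal_rank(scores, dst) for scores, dst in queries]
    return sum(rrs) / len(rrs)


# Usage: the same candidate scores, once with a top-ranked and once with a
# bottom-ranked true destination.
scores = {"a": 0.9, "b": 0.5, "c": 0.1}
print(mean_reciprocal_rank([(scores, "a"), (scores, "c")]))
```

Here the first query contributes 1 and the second 1/3, so MRR is their average; a perfect model scores 1.0.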

Paper Structure

This paper contains 41 sections, 3 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Left: The ReaL-TG framework, which enables RL fine-tuning of LLMs to improve TG forecasting (see the method section). Right: The proposed LLM-as-a-Judge system, which provides a comprehensive evaluation of LLM reasoning quality in TG link forecasting (see the evaluation protocol section, paragraph "Reasoning Trace Evaluation").
  • Figure 2: Example of context graph selection.
  • Figure 3: Prompt template used by the LLM to perform TG link forecasting in ReaL-TG.
  • Figure 4: Prompt template for the LLM-as-a-Judge system.
  • Figure 5: Human annotation guideline. The detailed evaluation procedure is taken from the prompt template for the LLM-based judging system in Fig. 4.
  • ...and 3 more figures

Theorems & Definitions (3)

  • Definition 1: Temporal Graph
  • Definition 2: TG Link Forecasting with LLMs
  • Definition 3
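Definitions 1 and 2 are listed here only by title. A common way to formalize a TG is as a set of timestamped edges (u, v, t), with a forecasting query asking which destination a source u links to at a future time t_q, given only edges observed before t_q. The sketch below assumes that formalization and pairs it with a hypothetical context-selection heuristic in the spirit of Figure 2 (keep past edges within a few hops of the query source, then the k most recent). It is an illustration only, not the paper's Temporal Context Graph Selection; `Edge`, `select_context`, `k`, and `hops` are all illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen -> hashable, so edges can be stored in a set
class Edge:
    src: int
    dst: int
    t: float  # timestamp of the interaction


def select_context(edges: list[Edge], src: int, t_query: float,
                   k: int = 5, hops: int = 2) -> list[Edge]:
    """HYPOTHETICAL context selection for a query (src, ?, t_query).

    1. Drop every edge at or after t_query (no peeking at the future).
    2. Expand breadth-first from `src` for `hops` hops over past edges.
    3. Return the k most recent collected edges, newest first.
    """
    past = [e for e in edges if e.t < t_query]
    frontier, seen_nodes = {src}, {src}
    selected: set[Edge] = set()
    for _ in range(hops):
        touching = [e for e in past if e.src in frontier or e.dst in frontier]
        selected.update(touching)
        # Only nodes not reached before become the next frontier.
        new_nodes = {n for e in touching for n in (e.src, e.dst)} - seen_nodes
        seen_nodes |= new_nodes
        frontier = new_nodes
    return sorted(selected, key=lambda e: e.t, reverse=True)[:k]


# Usage: forecast node 1's destination at t = 5.0; the edge at t = 6.0 is excluded.
edges = [Edge(1, 2, 1.0), Edge(2, 3, 2.0), Edge(1, 4, 3.0),
         Edge(3, 5, 4.0), Edge(1, 2, 6.0)]
for e in select_context(edges, src=1, t_query=5.0):
    print(e)
```

Restricting to edges before the query time is what keeps the forecasting task honest; the hop limit and k simply bound how much history would be serialized into an LLM prompt.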