TKG-Thinker: Towards Dynamic Reasoning over Temporal Knowledge Graphs via Agentic Reinforcement Learning

Zihao Jiang, Miao Peng, Zhenyan Shan, Wenjie Xu, Ben Liu, Gong Chen, Ziqi Gao, Min Peng

TL;DR

TKG-Thinker reframes temporal knowledge graph QA as an autonomous, interactive agent problem, addressing hallucinations from static prompting by integrating planning, time-aware retrieval, and multi-objective RL. The two-stage training (SFT for cold start followed by online RL with temporal tool calls) yields robust, temporally grounded reasoning across multiple LLM backbones and datasets. Key innovations include a temporal action space with dedicated Search tools, a ReAct-style interaction protocol, and a multi-reward design that fosters format fidelity, evidence retrieval, and factual accuracy. Empirical results demonstrate state-of-the-art performance on MULTITQ and CronQuestions, with strong generalization to complex temporal reasoning, underscoring the practical impact for robust, autonomous TKGQA systems.
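
The multi-reward idea in the summary above can be sketched as a weighted sum of three signals: format fidelity, evidence retrieval, and factual accuracy. This is a minimal illustration, not the paper's reward definition. The weights, the `<think>`/`<answer>` tag convention, and all function names are assumptions introduced here, since the summary does not say how the terms are computed or combined.

```python
import re
from typing import List, Set

# Hypothetical weights: the source does not specify how reward terms are combined.
W_FORMAT, W_RETRIEVAL, W_ANSWER = 0.2, 0.3, 0.5

# Assumed output convention: a think block followed later by an answer block.
TAG_PATTERN = re.compile(r"<think>.*?</think>.*<answer>.*?</answer>", re.DOTALL)


def format_reward(trajectory: str) -> float:
    """1.0 if the rollout follows the assumed tag structure, else 0.0."""
    return 1.0 if TAG_PATTERN.search(trajectory) else 0.0


def retrieval_reward(retrieved: List[str], gold: Set[str]) -> float:
    """1.0 if any retrieved fact string mentions a gold answer, else 0.0."""
    return 1.0 if any(g in fact for fact in retrieved for g in gold) else 0.0


def answer_reward(predicted: str, gold: Set[str]) -> float:
    """1.0 on a case-insensitive exact match with any gold answer, else 0.0."""
    return 1.0 if predicted.strip().lower() in {g.lower() for g in gold} else 0.0


def total_reward(trajectory: str, retrieved: List[str],
                 predicted: str, gold: Set[str]) -> float:
    """Weighted sum of the three illustrative reward terms."""
    return (W_FORMAT * format_reward(trajectory)
            + W_RETRIEVAL * retrieval_reward(retrieved, gold)
            + W_ANSWER * answer_reward(predicted, gold))


# A rollout that is well formed and retrieves the right evidence,
# but still answers incorrectly, earns partial credit.
partial = total_reward(
    trajectory="<think>plan</think><answer>2015</answer>",
    retrieved=["(A, visit, B, 2016)"],
    predicted="2015",
    gold={"2016"},
)
```

Separating the terms lets a rollout that retrieves the right evidence but answers incorrectly be scored differently from one that fails at every step.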

Abstract

Temporal knowledge graph question answering (TKGQA) aims to answer time-sensitive questions by leveraging temporal knowledge bases. While Large Language Models (LLMs) demonstrate significant potential in TKGQA, current prompting strategies constrain their efficacy in two primary ways. First, they are prone to reasoning hallucinations under complex temporal constraints. Second, static prompting limits model autonomy and generalization, as it lacks optimization through dynamic interaction with temporal knowledge graph (TKG) environments. To address these limitations, we propose TKG-Thinker, a novel agent equipped with autonomous planning and adaptive retrieval capabilities for reasoning over TKGs. Specifically, TKG-Thinker performs in-depth temporal reasoning through dynamic multi-turn interactions with TKGs via a dual-training strategy. We first apply Supervised Fine-Tuning (SFT) with chain-of-thought data to instill core planning capabilities, followed by a Reinforcement Learning (RL) stage that leverages multi-dimensional rewards to refine reasoning policies under intricate temporal constraints. Experimental results on benchmark datasets with three open-source LLMs show that TKG-Thinker achieves state-of-the-art performance and exhibits strong generalization across complex TKGQA settings.
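
The multi-turn interaction described in the abstract can be pictured as a think, act, observe loop over a store of time-stamped facts. The sketch below is illustrative only. The toy quadruples, the `search_before` tool, and the scripted policy standing in for the LLM are all hypothetical, and the paper's actual action space and prompts are not reproduced here.

```python
from typing import Callable, Dict, List, Optional, Tuple

Quad = Tuple[str, str, str, int]  # (subject, relation, object, year)

# Toy temporal knowledge graph; illustrative data only.
TKG: List[Quad] = [
    ("A", "visit", "B", 2014),
    ("A", "visit", "C", 2016),
    ("A", "sign_agreement", "B", 2017),
]


def search_before(subject: str, relation: str, year: int) -> List[Quad]:
    """Hypothetical time-aware tool: matching facts strictly before `year`."""
    return [q for q in TKG
            if q[0] == subject and q[1] == relation and q[3] < year]


TOOLS: Dict[str, Callable[..., List[Quad]]] = {"search_before": search_before}


def run_agent(policy: Callable[[str, List[dict]], dict],
              question: str, max_turns: int = 4):
    """Run a think-action-observation loop until the policy answers."""
    history: List[dict] = []
    for _ in range(max_turns):
        step = policy(question, history)            # think + choose action
        if step["action"] == "answer":
            return step["args"]["text"], history
        observation = TOOLS[step["action"]](**step["args"])  # act on the TKG
        history.append({"thought": step["thought"],
                        "action": step["action"],
                        "observation": observation})         # observe
    return None, history  # turn budget exhausted without an answer


def scripted_policy(question: str, history: List[dict]) -> dict:
    """Stand-in for an LLM policy: one search, then answer from the evidence."""
    if not history:
        return {"thought": "Find visits by A before 2017.",
                "action": "search_before",
                "args": {"subject": "A", "relation": "visit", "year": 2017}}
    facts = history[-1]["observation"]
    text = max(facts, key=lambda q: q[3])[2] if facts else "unknown"
    return {"thought": "The latest retrieved visit answers the question.",
            "action": "answer", "args": {"text": text}}


answer, trace = run_agent(scripted_policy, "Whom did A last visit before 2017?")
```

In a trained agent the scripted policy would be replaced by the language model, and the recorded trace is the kind of trajectory that the SFT and RL stages would operate on.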

Paper Structure

This paper contains 35 sections, 10 equations, 13 figures, 10 tables.

Figures (13)

  • Figure 1: Comparison between TKG-Thinker and existing LLM-based methods. TKG-Thinker employs a think–action–observation loop for autonomous interaction with TKGs, enabling verified temporal reasoning.
  • Figure 2: The overview of our proposed TKG-Thinker. We first apply supervised fine-tuning on high-quality trajectories to mitigate the cold-start problem, and further refine the model via online reinforcement learning with temporal tool calls. The bottom panel illustrates three rollouts: a complete success, a partial success, and a failure.
  • Figure 3: Retriever analysis on the MULTITQ dataset. Left: Performance comparison of different retriever models. Right: Effect of retrieval depth, measured by the number of top-$k$ retrieved quadruples.
  • Figure 4: Training dynamics of TKG-Thinker implemented with GRPO and PPO on MULTITQ. Left: Training Reward; Middle: Retrieval Call Steps; Right: Action Steps.
  • Figure 5: Few-shot Prompt for Generating Trajectories.
  • ...and 8 more figures