TKG-Thinker: Towards Dynamic Reasoning over Temporal Knowledge Graphs via Agentic Reinforcement Learning
Zihao Jiang, Miao Peng, Zhenyan Shan, Wenjie Xu, Ben Liu, Gong Chen, Ziqi Gao, Min Peng
TL;DR
TKG-Thinker reframes temporal knowledge graph QA as an autonomous, interactive agent problem, addressing hallucinations from static prompting by integrating planning, time-aware retrieval, and multi-objective RL. The two-stage training (SFT for cold start followed by online RL with temporal tool calls) yields robust, temporally grounded reasoning across multiple LLM backbones and datasets. Key innovations include a temporal action space with dedicated Search tools, a ReAct-style interaction protocol, and a multi-reward design that fosters format fidelity, evidence retrieval, and factual accuracy. Empirical results demonstrate state-of-the-art performance on MULTITQ and CronQuestions, with strong generalization to complex temporal reasoning, underscoring the practical impact for robust, autonomous TKGQA systems.
Abstract
Temporal knowledge graph question answering (TKGQA) aims to answer time-sensitive questions by leveraging temporal knowledge bases. While Large Language Models (LLMs) demonstrate significant potential in TKGQA, current prompting strategies constrain their efficacy in two primary ways. First, they are prone to reasoning hallucinations under complex temporal constraints. Second, static prompting limits model autonomy and generalization, as it lacks optimization through dynamic interaction with temporal knowledge graph (TKG) environments. To address these limitations, we propose TKG-Thinker, a novel agent equipped with autonomous planning and adaptive retrieval capabilities for reasoning over TKGs. Specifically, TKG-Thinker performs in-depth temporal reasoning through dynamic multi-turn interactions with TKGs via a dual-training strategy. We first apply Supervised Fine-Tuning (SFT) with chain-of-thought data to instill core planning capabilities, followed by a Reinforcement Learning (RL) stage that leverages multi-dimensional rewards to refine reasoning policies under intricate temporal constraints. Experimental results on benchmark datasets with three open-source LLMs show that TKG-Thinker achieves state-of-the-art performance and exhibits strong generalization across complex TKGQA settings.
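To make the idea of multi-dimensional rewards concrete, the following is a minimal Python sketch of how format, retrieval, and accuracy signals might be combined into one scalar for the RL stage. It is an illustration under stated assumptions, not the paper's reward definition: the `Rollout` structure, the `<think>`/`<answer>` tags, the binary scoring rules, and the weights are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Rollout:
    """One agent trajectory (hypothetical structure, not from the paper)."""
    text: str             # full generated trajectory, including tags
    retrieved: List[str]  # facts returned by the agent's search calls
    answer: str           # final answer extracted from the trajectory


def format_reward(text: str) -> float:
    # Assumed rule: at least one <think> block and exactly one <answer> block.
    has_think = re.search(r"<think>.*?</think>", text, re.S) is not None
    answers = re.findall(r"<answer>.*?</answer>", text, re.S)
    return 1.0 if has_think and len(answers) == 1 else 0.0


def retrieval_reward(retrieved: Sequence[str], gold: Sequence[str]) -> float:
    # Assumed rule: 1.0 if any retrieved fact mentions any gold answer.
    hit = any(g.lower() in fact.lower() for fact in retrieved for g in gold)
    return 1.0 if hit else 0.0


def accuracy_reward(answer: str, gold: Sequence[str]) -> float:
    # Assumed rule: case-insensitive exact match against the gold answers.
    normalized = {g.strip().lower() for g in gold}
    return 1.0 if answer.strip().lower() in normalized else 0.0


def total_reward(
    rollout: Rollout,
    gold: Sequence[str],
    weights: Tuple[float, float, float] = (0.2, 0.3, 0.5),  # illustrative only
) -> float:
    # Weighted sum of the three assumed reward components.
    w_format, w_retrieval, w_accuracy = weights
    return (
        w_format * format_reward(rollout.text)
        + w_retrieval * retrieval_reward(rollout.retrieved, gold)
        + w_accuracy * accuracy_reward(rollout.answer, gold)
    )
```

In this sketch a well-formed trajectory that retrieves supporting evidence and answers correctly receives the full weighted score, while a trajectory that only satisfies the format earns partial credit. Whether the actual method uses a weighted sum, binary components, or these particular signals is not stated in the abstract.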
