Towards Explainable Temporal Reasoning in Large Language Models: A Structure-Aware Generative Framework
Zihao Jiang, Ben Liu, Miao Peng, Wenjie Xu, Yao Xiao, Zhenyan Shan, Min Peng
TL;DR
This paper tackles the lack of explainability in temporal reasoning by introducing the Explainable Temporal Reasoning (ETR) benchmark and a structure-aware generative framework, GETER. ETR systematically evaluates LLMs on explainable temporal reasoning across multiple temporal granularities. It uses reasoning chains derived from Temporal Knowledge Graphs, augmented with explanations generated by GPT-4o and with carefully constructed negative and neutral samples. GETER bridges graph structure and text through a lightweight structure-text adapter that maps soft graph tokens into the LLM's embedding space, and it is instruction-tuned with LoRA to produce coherent explanations. Experiments demonstrate state-of-the-art performance on five datasets, with improvements in both predictive accuracy and explanation quality, and ablations confirm that each component contributes. Overall, the work advances transparent temporal reasoning in LLMs and offers a benchmark and a method for integrating structured knowledge with natural language explanations.
Abstract
While large language models (LLMs) show great potential in temporal reasoning, most existing work focuses heavily on enhancing performance, often neglecting the explainable reasoning processes underlying the results. To address this gap, we introduce a comprehensive benchmark covering a wide range of temporal granularities, designed to systematically evaluate LLMs' capabilities in explainable temporal reasoning. Furthermore, our findings reveal that LLMs struggle to deliver convincing explanations when relying solely on textual information. To address this challenge, we propose GETER, a novel structure-aware generative framework that integrates Graph structures with text for Explainable TEmporal Reasoning. Specifically, we first leverage temporal knowledge graphs to develop a temporal encoder that captures structural information for the query. Subsequently, we introduce a structure-text prefix adapter to map graph structure features into the text embedding space. Finally, LLMs generate explanation text by integrating the soft graph token with instruction-tuning prompt tokens. Experimental results indicate that GETER achieves state-of-the-art performance and generalizes well. Our dataset and code are available at https://github.com/carryTatum/GETER.
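
The adapter step described above can be illustrated with a minimal sketch. It is not the GETER implementation: it assumes the adapter is a single linear projection, uses plain Python lists in place of tensors, and every name and dimension (make_adapter, project, build_inputs, graph_dim, text_dim) is hypothetical. It only shows the general idea of mapping one graph feature vector into the text embedding space and placing the resulting soft graph token in front of the prompt token embeddings.

```python
import random


def make_adapter(graph_dim, text_dim, seed=0):
    """Build a hypothetical linear adapter: a (text_dim x graph_dim) weight and a bias."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(graph_dim)]
              for _ in range(text_dim)]
    bias = [0.0] * text_dim
    return weight, bias


def project(adapter, graph_feature):
    """Map a graph feature vector to one soft token in the text embedding space."""
    weight, bias = adapter
    return [sum(w * x for w, x in zip(row, graph_feature)) + b
            for row, b in zip(weight, bias)]


def build_inputs(soft_token, prompt_embeddings):
    """Prepend the soft graph token to the prompt token embeddings."""
    return [soft_token] + list(prompt_embeddings)


# Assumed toy sizes; a real encoder and LLM would use far larger dimensions.
graph_dim, text_dim = 4, 8
adapter = make_adapter(graph_dim, text_dim)

# Stand-in for the temporal encoder's output for one query.
graph_feature = [0.5, -1.0, 0.25, 2.0]

# Stand-in for the embeddings of three instruction-prompt tokens.
prompt_embeddings = [[0.0] * text_dim for _ in range(3)]

soft_token = project(adapter, graph_feature)
inputs = build_inputs(soft_token, prompt_embeddings)

print(len(inputs), len(inputs[0]))  # 4 8
```

In this sketch the LLM would receive four embedding vectors: the soft graph token followed by the three prompt tokens. Whether the actual adapter is linear, and how many soft tokens it emits, are assumptions here; the released code is the reference for those details.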
