Faithful Temporal Question Answering over Heterogeneous Sources
Zhen Jia, Philipp Christmann, Gerhard Weikum
TL;DR
This work tackles temporal question answering with explicit and implicit time constraints across heterogeneous sources (KB, text, and tables). It introduces FAITH, a three-stage pipeline comprising Temporal Question Understanding, Faithful Evidence Retrieval, and Explainable Heterogeneous Answering, in which implicit constraints are resolved through recursive intermediate questions. A key contribution is the Tiq benchmark, built automatically to probe implicit temporal reasoning across diverse sources. Empirical results show that FAITH achieves superior faithful answering on Tiq and strong performance on TimeQuestions, outperforming LLMs and unfaithful baselines while exposing the evidence behind its answers for trust and explainability.
Abstract
Temporal question answering (QA) involves time constraints, with phrases such as "... in 2019" or "... before COVID". In the former, time is an explicit condition; in the latter, it is implicit. State-of-the-art methods have limitations along three dimensions. First, with neural inference, time constraints are merely soft-matched, leaving room for invalid or inexplicable answers. Second, questions with implicit time are poorly supported. Third, answers come from a single source: either a knowledge base (KB) or a text corpus. We propose a temporal QA system that addresses these shortcomings. First, it enforces temporal constraints for faithful answering with tangible evidence. Second, it properly handles implicit questions. Third, it operates over heterogeneous sources, covering KB, text and web tables in a unified manner. The method has three stages: (i) understanding the question and its temporal conditions, (ii) retrieving evidence from all sources, and (iii) faithfully answering the question. As implicit questions are sparse in prior benchmarks, we introduce a principled method for generating diverse questions. Experiments show superior performance over a suite of baselines.
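To make the contrast between soft matching and enforced temporal constraints concrete, the following is a minimal sketch of a hard temporal filter over evidence from mixed sources. It is not the authors' implementation: the `Evidence` record, the signal names (`overlap`, `before`, `after`), the `prune` helper, and all dates are hypothetical and chosen only for illustration. For an implicit question, the constraint interval is assumed to have been obtained beforehand by answering an intermediate question.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence record; field names are illustrative assumptions."""
    text: str
    source: str  # "kb", "text", or "table"
    start: date  # first day the stated fact holds
    end: date    # last day the stated fact holds


def satisfies(ev: Evidence, signal: str, c_start: date, c_end: date) -> bool:
    """Return True only if the evidence interval strictly meets the constraint."""
    if signal == "overlap":
        return ev.start <= c_end and ev.end >= c_start
    if signal == "before":
        return ev.end < c_start
    if signal == "after":
        return ev.start > c_end
    raise ValueError(f"unknown temporal signal: {signal}")


def prune(evidence: list[Evidence], signal: str,
          c_start: date, c_end: date) -> list[Evidence]:
    """Drop every piece of evidence that violates the temporal constraint."""
    return [ev for ev in evidence if satisfies(ev, signal, c_start, c_end)]


# Made-up evidence from three source types.
pool = [
    Evidence("event A", "kb", date(2018, 3, 1), date(2018, 9, 30)),
    Evidence("event B", "text", date(2019, 5, 1), date(2019, 5, 31)),
    Evidence("event C", "table", date(2021, 1, 1), date(2021, 12, 31)),
]

# Explicit constraint ("... in 2019"): the interval is read off the question.
in_2019 = prune(pool, "overlap", date(2019, 1, 1), date(2019, 12, 31))

# Implicit constraint ("... before X"): the interval for X is assumed to come
# from an intermediate question; the dates below are placeholders.
before_x = prune(pool, "before", date(2020, 1, 1), date(2020, 12, 31))

print([ev.text for ev in in_2019])
print([ev.text for ev in before_x])
```

Because the filter is a hard predicate rather than a similarity score, any answer derived from the surviving evidence can be traced back to items that satisfy the stated time condition, which is the sense of faithfulness the abstract describes.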
