Temporally Consistent Factuality Probing for Large Language Models
Ashutosh Bajpai, Aaryan Goyal, Atif Anwer, Tanmoy Chakraborty
TL;DR
This work introduces TeCFaP, a temporally consistent factuality probe for LLMs, together with the TEMP-COFAC dataset, which encodes temporal subject-relation-object sequences spanning the years 1526 to 2022. It extends factuality and consistency metrics to the temporal dimension and presents CoTSeLF, a framework that combines multi-task instruction tuning (MT-IT) with consistent-time-sensitive reinforcement learning (CTSRL) to boost temporally consistent factuality. Experimental results show that off-the-shelf LLMs perform poorly on TeCFaP, while CoTSeLF yields substantial improvements over strong baselines, including discrete and smooth CTSRL variants. The work advances time-aware knowledge extraction for LLMs and has practical implications for domains requiring reliable temporal reasoning, such as healthcare and law.
Abstract
The prolific use of Large Language Models (LLMs) as an alternative knowledge base requires them to be factually consistent, that is, both correct and consistent across paraphrased queries. Recently, significant attempts have been made to develop benchmark datasets and metrics to evaluate LLMs for these traits. However, structural simplicity (subject-relation-object) and contemporary association in their query formulation limit the broader definition of factuality and consistency. In this study, we introduce TeCFaP, a novel Temporally Consistent Factuality Probe task that expands the consistent factuality probe into the temporal dimension. To this end, we propose TEMP-COFAC, a high-quality dataset of prefix-style English query paraphrases. Subsequently, we extend the definitions of existing metrics to represent consistent factuality across the temporal dimension. We experiment with a diverse set of LLMs and find that most of them perform poorly on TeCFaP. Next, we propose a novel solution, CoTSeLF (Consistent-Time-Sensitive Learning Framework), which combines multi-task instruction tuning (MT-IT) with consistent-time-sensitive reinforcement learning (CTSRL) to improve temporally consistent factuality in LLMs. Our experiments demonstrate the efficacy of CoTSeLF over several baselines.
