Temporal Reasoning on Implicit Events from Distant Supervision
Ben Zhou, Kyle Richardson, Qiang Ning, Tushar Khot, Ashish Sabharwal, Dan Roth
TL;DR
TRACIE is a dataset for temporal reasoning over implicit events, and it shows that state-of-the-art models struggle when events are latent rather than explicitly stated. The paper pairs distant-supervision pretraining (PtnTime) with a neuro-symbolic reasoning model (SYMTIME) that combines the distance between start times with event durations to infer end times, in line with Allen's interval relations. Results show consistent gains on TRACIE and competitive performance on MATRES, including in zero-shot and low-resource settings, which points to the value of pairing temporally aware pretraining with explicit reasoning. The approach extends temporal understanding beyond explicit-event benchmarks and leaves room for integrating further commonsense signals into temporal reasoning.
Abstract
We propose TRACIE, a novel temporal reasoning dataset that evaluates the degree to which systems understand implicit events -- events that are not mentioned explicitly in natural language text but can be inferred from it. This introduces a new challenge in temporal reasoning research, where prior work has focused on explicitly mentioned events. Human readers can infer implicit events via commonsense reasoning, resulting in a more comprehensive understanding of the situation and, consequently, better reasoning about time. We find, however, that state-of-the-art models struggle when predicting temporal relationships between implicit and explicit events. To address this, we propose a neuro-symbolic temporal reasoning model, SYMTIME, which exploits distant supervision signals from large-scale text and uses temporal rules to combine start times and durations to infer end times. SYMTIME outperforms strong baseline systems on TRACIE by 5%, and by 11% in a zero prior knowledge training setting. Our approach also generalizes to other temporal reasoning tasks, as evidenced by a gain of 1%-9% on MATRES, an explicit event benchmark.
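The rule of combining start times and durations to infer end times can be illustrated with a minimal sketch. This is not the SYMTIME implementation: in the paper these quantities come from learned models, whereas here they are plain numbers in a shared time unit, and the function name `end_vs_start` and its parameters are hypothetical names chosen for illustration. The idea is that event A ends before event B starts exactly when A's duration is shorter than the gap between the two start times.

```python
def end_vs_start(start_distance: float, duration_a: float) -> str:
    """Relate the END of event A to the START of event B.

    start_distance: start(B) - start(A), positive when A starts first
                    (hypothetical input; assumed to be in the same unit
                    as duration_a, e.g. hours).
    duration_a:     how long event A lasts.

    Returns "before" if A ends strictly before B starts, else "after".
    Ties (A ends exactly when B starts) fall into "after" in this sketch.
    """
    # end(A) = start(A) + duration_a, so
    # end(A) < start(B)  <=>  duration_a < start(B) - start(A)
    if duration_a < start_distance:
        return "before"
    return "after"


if __name__ == "__main__":
    # A starts 5 hours before B and lasts 2 hours: A is over before B begins.
    print(end_vs_start(start_distance=5.0, duration_a=2.0))
    # A starts 1 hour before B but lasts 3 hours: A is still going when B begins.
    print(end_vs_start(start_distance=1.0, duration_a=3.0))
```

The sketch makes one point concrete: an end-time relation does not have to be predicted directly, because it follows from two simpler quantities (a start-time distance and a duration), each of which could be estimated separately.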
