Temporal Validity Change Prediction

Georg Wenzel; Adam Jatowt

TL;DR

This paper introduces Temporal Validity Change Prediction (TVCP), a new task that requires models to determine how a contextual statement alters the temporal validity duration of a target statement, formalized via $TV_d(s_t)$ and $TV_d^{s_f}(s_t)$. A Twitter-based dataset with 5,055 samples and crowd-sourced context is published to benchmark transformer architectures, including a multitask variant that jointly predicts temporal durations to improve TVCP performance. The study shows that explicit temporal-duration supervision yields gains (notably with SelfExplain + multitask), while foundation models like ChatGPT underperform relative to fine-tuned transformers in this domain, highlighting limitations in temporal commonsense understanding. The results emphasize the value of context-aware temporal reasoning for applications in recommender systems, conversational AI, and story understanding, and point to future work on larger datasets, better segmentation between target and context, and generative approaches.

Abstract

Temporal validity is an important property of text that is useful for many downstream applications, such as recommender systems, conversational AI, or story understanding. Existing benchmarking tasks often require models to identify the temporal validity duration of a single statement. However, in many cases, additional contextual information, such as sentences in a story or posts on a social media profile, can be collected from the available text stream. This contextual information may greatly alter the duration for which a statement is expected to be valid. We propose Temporal Validity Change Prediction, a natural language processing task benchmarking the capability of machine learning models to detect contextual statements that induce such change. We create a dataset consisting of temporal target statements sourced from Twitter and crowdsource sample context statements. We then benchmark a set of transformer-based language models on our dataset. Finally, we experiment with temporal validity duration prediction as an auxiliary task to improve the performance of the state-of-the-art model.
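As a rough illustration of the task, the sketch below compares the temporal validity duration of a target statement on its own, $TV_d(s_t)$, with its duration once a context statement is taken into account, $TV_d^{s_f}(s_t)$, and maps the difference to a change class. The names `Change` and `tvcp_label`, the `UNC` label for the no-change case, and the use of minutes as a shared unit are assumptions made for this example. In the task itself a model has to infer the change from text alone, so this only shows what the label means, not how the paper's models work.

```python
from enum import Enum


class Change(Enum):
    """Change classes; DEC and INC appear in the paper outline, UNC is an assumed name."""
    DEC = "decreased"
    UNC = "unchanged"
    INC = "increased"


def tvcp_label(tv_target: float, tv_with_context: float) -> Change:
    """Map a pair of durations to a change class.

    tv_target       -- TV_d(s_t): duration of the target statement alone
    tv_with_context -- TV_d^{s_f}(s_t): duration given the context statement
    Both values must use the same unit (minutes here, an assumption).
    """
    if tv_with_context < tv_target:
        return Change.DEC
    if tv_with_context > tv_target:
        return Change.INC
    return Change.UNC


# Made-up durations, not taken from the dataset: a statement expected to
# hold for 30 minutes that, given its context, holds for 120 minutes.
print(tvcp_label(30, 120).name)
```

The durations are only a stand-in for the human judgment that annotators provide; a classifier for this task would take the two statements as input and predict the class directly.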

Paper Structure (23 sections, 5 equations, 17 figures, 5 tables)

This paper contains 23 sections, 5 equations, 17 figures, 5 tables.

Introduction
Related Work
Temporal Commonsense Reasoning
Temporal Validity
Comparison with Related Work
Task
Defining Temporal Validity
Formalizing Existing Tasks
Temporal Validity Duration Estimation
Temporal Natural Language Inference
Temporal Validity Change Prediction
Dataset
Experiments
Language Models
Multitask Implementation
...and 8 more sections

Figures (17)

Figure 1: A visualization of the TVCP task
Figure 2: Distribution of different types of temporal information in a sample of our dataset
Figure 3: An example of $TV_d$, TNLI and TVCP
Figure 4: Dimensions of temporal validity change. The frequency of each category for DEC and INC classes in our sample is appended.
Figure 5: A summary of our tweet collection pipeline
...and 12 more figures
