Towards Effective Time-Aware Language Representation: Exploring Enhanced Temporal Understanding in Language Models
Jiexin Wang, Adam Jatowt, Yi Cai
TL;DR
BiTimeBERT 2.0 introduces three pre-training objectives, Extended Time-Aware Masked Language Modeling (ETAMLM), Document Dating (DD), and Time-Sensitive Entity Replacement (TSER), to jointly capture content time, document timestamps, and time-sensitive entities. Pre-trained on a refined NYT-based temporal news corpus, it improves results on time-focused tasks such as event occurrence time estimation, document dating, and semantic change detection, while cutting training time by about 53% relative to prior setups. The model generalizes to datasets outside the temporal scope of its pre-training corpus and supports applications in temporal information retrieval and question answering, including on data spanning long time ranges. Ablations and case studies underscore the value of integrating multiple temporal dimensions and point to continual learning and finer-grained temporal modeling as future directions.
Abstract
In the evolving field of Natural Language Processing (NLP), understanding the temporal context of text is increasingly critical for applications requiring advanced temporal reasoning. Traditional pre-trained language models like BERT, which rely on synchronic document collections such as BookCorpus and Wikipedia, often fall short in effectively capturing and leveraging temporal information. To address this limitation, we introduce BiTimeBERT 2.0, a novel time-aware language model pre-trained on a temporal news article collection. BiTimeBERT 2.0 incorporates temporal information through three innovative pre-training objectives: Extended Time-Aware Masked Language Modeling (ETAMLM), Document Dating (DD), and Time-Sensitive Entity Replacement (TSER). Each objective is specifically designed to target a distinct dimension of temporal information: ETAMLM enhances the model's understanding of temporal contexts and relations, DD integrates document timestamps as explicit chronological markers, and TSER focuses on the temporal dynamics of "Person" entities. Moreover, our refined corpus preprocessing strategy reduces training time by nearly 53%, making BiTimeBERT 2.0 significantly more efficient while maintaining high performance. Experimental results show that BiTimeBERT 2.0 achieves substantial improvements across a broad range of time-related tasks and excels on datasets spanning extensive temporal ranges. These findings underscore BiTimeBERT 2.0's potential as a powerful tool for advancing temporal reasoning in NLP.
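
To make two of the objectives named above more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that temporal expressions can be approximated by four-digit years and that Document Dating is framed as classification over month-level bins; neither detail is stated in this section. The names `mask_temporal_tokens`, `dating_label`, `YEAR_PATTERN`, and the default `start_year` are hypothetical.

```python
import random
import re

# Assumption: temporal expressions are approximated by four-digit years (1800-2099).
# A real pipeline would more likely use a dedicated temporal tagger.
YEAR_PATTERN = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")
MASK = "[MASK]"


def mask_temporal_tokens(text, temporal_prob=1.0, rng=None):
    """Mask year-like tokens; return the masked text and the masked originals."""
    if rng is None:
        rng = random.Random(0)
    targets = []

    def _replace(match):
        # Each temporal token is masked with probability `temporal_prob`.
        if rng.random() < temporal_prob:
            targets.append(match.group(0))
            return MASK
        return match.group(0)

    return YEAR_PATTERN.sub(_replace, text), targets


def dating_label(year, month, start_year=1987):
    """Map a document timestamp to a month-level class index.

    Assumption: month granularity and a corpus starting in `start_year`;
    both are illustrative choices, not taken from this section.
    """
    return (year - start_year) * 12 + (month - 1)


masked, targets = mask_temporal_tokens(
    "The treaty was signed in 1991 and revised in 2003."
)
print(masked)
print(targets)
print(dating_label(1988, 3))
```

Under these assumptions, the masked tokens become prediction targets for a masked language modeling loss, and the class index becomes the target of a document-level classification head; how the actual model combines the objectives is not specified here.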
