Detecting Future-related Contexts of Entity Mentions
Puneet Prashar, Krishna Mohan Shukla, Adam Jatowt
TL;DR
Problem: detecting future-oriented contexts of entity mentions when explicit temporal markers are absent. Approach: assemble a balanced dataset of 19,540 sentences around Wikipedia entities, mask explicit dates, and evaluate traditional ML, transformer-based, and LLM-based methods under various supervision regimes. Key findings: transformer models (RoBERTa-base) reach F1 ≈ 0.913; fine-tuned Llama 3 reaches F1 ≈ 0.934, outperforming zero-shot and few-shot LLMs; traditional methods lag behind. Significance: provides a benchmark and methodology for robust temporal information extraction, with practical implications for forecasting, decision support, and search, and sets the stage for cross-domain and streaming deployment.
Abstract
The ability to automatically identify whether an entity is referenced in a future context has applications in decision making, planning, and trend forecasting. This paper focuses on detecting implicit future references in entity-centric texts, addressing the growing need for automated temporal analysis in information processing. We first present a novel dataset of 19,540 sentences built around popular entities sourced from Wikipedia, covering both future-related and non-future-related contexts in which those entities appear. As a second contribution, we evaluate several language models, including Large Language Models (LLMs), on the task of identifying future-oriented content in the absence of explicit temporal references.
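To make the "absence of explicit temporal references" setting concrete, the following is a minimal sketch of how explicit dates might be masked in a sentence before classification. It is an illustration only, not the authors' preprocessing: the placeholder token `[DATE]`, the function name `mask_dates`, and the regular-expression patterns are all assumptions introduced here, and a real pipeline would likely rely on a dedicated temporal tagger.

```python
import re

# Hypothetical placeholder token; the token actually used in the paper is not stated here.
MASK = "[DATE]"

# Month names, full or abbreviated (an assumed, English-only list).
_MONTH = (
    r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|"
    r"Jul(?:y)?|Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|"
    r"Nov(?:ember)?|Dec(?:ember)?)"
)
# Four-digit years from 1500 to 2099 (an assumed range).
_YEAR = r"(?:1[5-9]\d{2}|20\d{2})"

# Ordered from most to least specific so a full date is masked as one unit.
_PATTERNS = [
    rf"\b\d{{1,2}}\s+{_MONTH}\.?,?\s+{_YEAR}\b",  # 12 March 2030
    rf"\b{_MONTH}\.?\s+\d{{1,2}},?\s+{_YEAR}\b",  # March 12, 2030
    rf"\b{_MONTH}\.?\s+{_YEAR}\b",                # March 2030
    r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",               # 12/03/2030
    r"\b\d{4}-\d{2}-\d{2}\b",                     # 2030-03-12
    rf"\b{_YEAR}\b",                              # 2030
]
_DATE_RE = re.compile("|".join(_PATTERNS))


def mask_dates(sentence: str) -> str:
    """Replace explicit date expressions in `sentence` with MASK."""
    return _DATE_RE.sub(MASK, sentence)


if __name__ == "__main__":
    print(mask_dates("Apple will release the device in 2026."))
    print(mask_dates("The summit is scheduled for March 12, 2030 in Paris."))
    print(mask_dates("Tesla plans to expand next year."))
```

After masking, a classifier can no longer rely on the year itself and has to use implicit cues such as "will release" or "is scheduled for". Relative expressions such as "next year" are deliberately left untouched in this sketch, since they are not explicit dates; whether the dataset treats them the same way is not stated in this section.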
