CHEW: A Dataset of CHanging Events in Wikipedia

Hsuvas Borkakoty; Luis Espinosa-Anke

TL;DR

CHEW is a temporally grounded dataset of changing Wikipedia content for probing LLMs' timeline understanding and temporal alignment. It derives a positive–negative set from TAQA and Wikipedia revisions, using SBERT-based filtering to ensure genuine evidence of change. Across prompting, binary change detection, and fine-tuning experiments, CHEW shows that current LLMs vary in their ability to construct accurate timelines: some models reproduce content verbatim, while others show limited temporal adaptation. Fine-tuning improves both task performance and temporal embeddings. The work demonstrates that CHEW-derived signals can enhance temporal representations and suggests directions for continual learning and robust temporal alignment in large language models. These findings have practical implications for keeping generative models up to date while mitigating temporal misalignment and misinformation risks.

Abstract

We introduce CHEW, a novel dataset of changing events in Wikipedia expressed in naturally occurring text. We use CHEW for probing LLMs for their timeline understanding of Wikipedia entities and events in generative and classification experiments. Our results suggest that LLMs, despite having temporal information available, struggle to construct accurate timelines. We further show the usefulness of CHEW-derived embeddings for identifying meaning shift.

Paper Structure (13 sections, 1 equation, 3 figures, 5 tables)

Introduction
Building CHEW
Experiments
Prompting for timeline knowledge
Prompt-based change detection
Fine-tuning experiments
Better temporal embeddings with CHEW
Conclusions and Future Work
Limitations
Ethics statement
Prompt for Generation of changes
Prompt-based change detection experiment
Models and training details

Figures (3)

Figure 1: Barplots with the time forward and time forward data splits.
Figure 2: Similarities comparing ground truth and generations after probing for temporal knowledge.
Figure 3: Prompt-based classification change prediction results.
