CHEW: A Dataset of CHanging Events in Wikipedia

Hsuvas Borkakoty, Luis Espinosa-Anke

TL;DR

CHEW introduces a temporally grounded dataset of changing Wikipedia content to probe LLMs' timeline understanding and temporal alignment. It derives a positive–negative set from TAQA and Wikipedia revisions, with SBERT-based filtering to ensure genuine evidence of change. Across prompting, binary change detection, and fine-tuning experiments, CHEW reveals that current LLMs vary in their ability to construct accurate timelines: some models can reproduce text verbatim, while others show limited temporal adaptation. Fine-tuning improves both task performance and temporal embeddings. The work demonstrates that CHEW-derived signals can enhance temporal representations and suggests directions for continual learning and robust temporal alignment in large language models. This has practical implications for keeping generative models up to date while mitigating temporal misalignment and misinformation risks.

Abstract

We introduce CHEW, a novel dataset of changing events in Wikipedia expressed in naturally occurring text. We use CHEW for probing LLMs for their timeline understanding of Wikipedia entities and events in generative and classification experiments. Our results suggest that LLMs, despite having temporal information available, struggle to construct accurate timelines. We further show the usefulness of CHEW-derived embeddings for identifying meaning shift.

Paper Structure (13 sections, 1 equation, 3 figures, 5 tables)

Figures (3)

  • Figure 1: Barplots with the time forward and time forward data splits.
  • Figure 2: Similarities comparing ground truth and generations after probing for temporal knowledge.
  • Figure 3: Prompt-based classification change prediction results.