Formulation Comparison for Timeline Construction using LLMs

Kimihiro Hasegawa, Nikhil Kandukuri, Susan Holm, Yukari Yamakawa, Teruko Mitamura

TL;DR

A novel evaluation framework that compares multiple task formulations for timeline construction on TimeSET by prompting open LLMs, i.e., Llama 2 and Flan-T5, together with a benchmark of open LLMs on existing event temporal ordering datasets to gain a robust understanding of their capabilities.

Abstract

Constructing a timeline requires identifying the chronological order of events in an article. In prior timeline construction datasets, temporal orders are typically annotated by either event-to-time anchoring or event-to-event pairwise ordering, both of which suffer from missing temporal information. To mitigate this issue, we develop a new evaluation dataset, TimeSET, consisting of single-document timelines with document-level order annotation. TimeSET features saliency-based event selection and partial ordering, which enable a practical annotation workload. Aiming to build better automatic timeline construction systems, we propose a novel evaluation framework to compare multiple task formulations with TimeSET by prompting open LLMs, i.e., Llama 2 and Flan-T5. Considering that identifying temporal orders of events is a core subtask in timeline construction, we further benchmark open LLMs on existing event temporal ordering datasets to gain a robust understanding of their capabilities. Our experiments show that (1) the NLI formulation with Flan-T5 performs strongly relative to the other formulations, while (2) timeline construction and event temporal ordering are still challenging tasks for few-shot LLMs. Our code and data are available at https://github.com/kimihiroh/timeset.

Paper Structure

This paper contains 58 sections, 1 equation, 8 figures, and 14 tables.

Figures (8)

  • Figure 1: A comparison overview of the four formulations we study in this paper: NLI, Pairwise, MRC, and Timeline. A document with event annotation is converted into a different number of prompts following each formulation, and the predictions are interpreted as pairwise temporal orders in each formulation's manner to form a timeline.
  • Figure 2: Formulation comparison result with TimeSET. In each boxplot, one data point represents a combination of a prompt template and the number of demonstrations.
  • Figure 3: Analysis Overview
  • Figure 4: Annotation Interface
  • Figure 5: #demonstration
  • ...and 3 more figures
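
The Figure 1 caption notes that predictions are interpreted as pairwise temporal orders to form a timeline, and the abstract describes TimeSET timelines as partially ordered. As a rough illustration of that last step only, the sketch below groups events into chronological layers from a set of "before" pairs using a layered topological sort. This is an assumption-laden sketch, not the authors' implementation: the function name `build_timeline`, the event identifiers, and the choice to reject cyclic predictions are all hypothetical.

```python
from collections import defaultdict


def build_timeline(events, before_pairs):
    """Group events into chronological layers from pairwise 'before' relations.

    events:       list of event identifiers (hypothetical, e.g. "e1").
    before_pairs: iterable of (earlier, later) tuples; both members are
                  assumed to appear in `events`.
    Returns a list of layers. Events in the same layer are not ordered
    relative to each other, which gives a partial order.
    """
    successors = defaultdict(set)
    indegree = {e: 0 for e in events}
    for earlier, later in before_pairs:
        if later not in successors[earlier]:  # ignore duplicate pairs
            successors[earlier].add(later)
            indegree[later] += 1

    # Start from events that nothing precedes.
    layer = sorted(e for e in events if indegree[e] == 0)
    timeline, placed = [], 0
    while layer:
        timeline.append(layer)
        placed += len(layer)
        next_layer = []
        for e in layer:
            for s in successors[e]:
                indegree[s] -= 1
                if indegree[s] == 0:
                    next_layer.append(s)
        layer = sorted(next_layer)

    # Assumption: contradictory (cyclic) predictions are treated as an error.
    if placed != len(events):
        raise ValueError("pairwise predictions contain a cycle")
    return timeline


if __name__ == "__main__":
    events = ["e1", "e2", "e3", "e4"]
    pairs = [("e1", "e2"), ("e1", "e3"), ("e2", "e4"), ("e3", "e4")]
    print(build_timeline(events, pairs))  # [['e1'], ['e2', 'e3'], ['e4']]
```

In this toy input, `e2` and `e3` land in the same layer because no pair orders one before the other. How the paper resolves inconsistent predictions is not stated in this excerpt, so raising an error here is only a placeholder choice.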