Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization

Weiqi Wu, Shen Huang, Yong Jiang, Pengjun Xie, Fei Huang, Hai Zhao

TL;DR

CHRONOS addresses open-domain news timeline summarization by combining iterative self-questioning with retrieval-augmented generation to construct an event graph around a topic. The method employs rounds of question generation, rewriting, and per-round timeline generation with a merging step to form coherent timelines, and introduces the Open-TLS dataset to enable up-to-date evaluation. Empirical results on Open-TLS and closed-domain TLS benchmarks show CHRONOS achieves competitive performance with improved efficiency, highlighting its scalability for real-world, large-scale timeline construction. The work advances TLS by integrating a human-like information-seeking loop with robust retrieval, offering practical impact for news understanding, search, and AI-assisted summarization.

Abstract

In the fast-changing realm of information, the capacity to construct coherent timelines from extensive event-related content has become increasingly significant and challenging. The complexity arises in aggregating related documents to build a meaningful event graph around a central topic. This paper proposes CHRONOS - Causal Headline Retrieval for Open-domain News Timeline SummarizatiOn via Iterative Self-Questioning, which offers a fresh perspective on the integration of Large Language Models (LLMs) to tackle the task of Timeline Summarization (TLS). By iteratively reflecting on how events are linked and posing new questions regarding a specific news topic to gather information online or from an offline knowledge base, LLMs produce and refresh chronological summaries based on documents retrieved in each round. Furthermore, we curate Open-TLS, a novel dataset of timelines on recent news topics authored by professional journalists to evaluate open-domain TLS where information overload makes it impossible to find comprehensive relevant documents from the web. Our experiments indicate that CHRONOS is not only adept at open-domain timeline summarization, but it also rivals the performance of existing state-of-the-art systems designed for closed-domain applications, where a related news corpus is provided for summarization.
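The loop described in the abstract (retrieve general context, pose new questions, retrieve again, refresh the timeline each round) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy keyword retriever in place of a search engine and a rule-based questioner in place of the LLM, and it omits question rewriting. All names (`Doc`, `search`, `summarize`, `merge`, `build_timeline`, `toy_ask`) and the headlines in `CORPUS` are hypothetical and were made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

Timeline = Dict[str, str]  # ISO date -> one-line event summary


@dataclass(frozen=True)
class Doc:
    date: str
    text: str


def tokens(text: str) -> set:
    """Lowercased alphabetic words of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def search(corpus: List[Doc], query: str, k: int = 2) -> List[Doc]:
    """Toy retriever (assumption): rank documents by word overlap with the query."""
    q = tokens(query)
    scored = [(len(q & tokens(d.text)), d) for d in corpus]
    hits = sorted((p for p in scored if p[0] > 0), key=lambda p: (-p[0], p[1].date))
    return [d for _, d in hits[:k]]


def summarize(docs: List[Doc]) -> Timeline:
    """Per-round timeline; here each dated document serves as its own summary."""
    return {d.date: d.text for d in docs}


def merge(current: Timeline, new: Timeline) -> Timeline:
    """Merge two timelines by date, keeping existing summaries, sorted chronologically."""
    return dict(sorted({**new, **current}.items()))


def build_timeline(topic: str, corpus: List[Doc],
                   ask: Callable[[str, Timeline], List[str]],
                   rounds: int = 3) -> Timeline:
    timeline = summarize(search(corpus, topic))  # general context first
    asked = set()
    for _ in range(rounds):
        questions = [q for q in ask(topic, timeline) if q not in asked]
        if not questions:
            break  # nothing new to ask
        asked.update(questions)
        round_docs = [d for q in questions for d in search(corpus, q)]
        timeline = merge(timeline, summarize(round_docs))
    return timeline


def toy_ask(topic: str, timeline: Timeline) -> List[str]:
    """Stand-in for the LLM questioner: query each capitalized term seen so far."""
    terms = {w for text in timeline.values() for w in re.findall(r"\b[A-Z]\w+", text)}
    return sorted(terms)


# Fictional headlines, invented for this sketch.
CORPUS = [
    Doc("2023-03-10", "Alpha Bank collapses after a run on deposits"),
    Doc("2023-03-12", "Regulators close Beta Bank as worries spread to Gamma Group"),
    Doc("2023-03-19", "Delta agrees to buy Gamma Group"),
    Doc("2023-04-02", "A local football team wins the cup"),
    Doc("2023-05-01", "Epsilon Trust is seized weeks after the Delta deal"),
]

for date, event in build_timeline("Alpha Bank collapse", CORPUS, toy_ask).items():
    print(date, event)
```

In this toy run, the initial search only reaches the first two headlines. The third and fifth are found in later rounds by following entities mentioned in already-retrieved events, which is the multi-hop behavior that iterative questioning is meant to provide. The unrelated headline is never retrieved.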

Paper Structure

This paper contains 39 sections, 1 equation, 4 figures, 11 tables.

Figures (4)

  • Figure 1: TLS for the news topic "Banking Crisis". Edges between event nodes are established through iterative self-questioning, ultimately building an event graph around the target news for timeline generation.
  • Figure 2: Pipeline of CHRONOS. Given a target news topic, it first searches for general context, then iteratively poses questions to retrieve more relevant news, and employs a divide-and-conquer strategy to generate the timeline.
  • Figure 3: Impact of the number of self-questioning rounds on model performance on the Open-TLS dataset.
  • Figure 4: Topic analysis of CHRONOS on Open-TLS.