Evaluating Answer Reranking Strategies in Time-sensitive Question Answering

Mehmet Kardan, Bhawna Piryani, Adam Jatowt

TL;DR

This work tackles the challenge of answering questions about past events by examining how temporal information can influence answer selection in diachronic QA. Using a BM25 retriever and a BERT reader on two NYT-derived datasets, ArchivalQA and TemporalQuestions, the authors compare non-temporal reranking, direct temporal reranking, and temporal grouping methods, with a mixed scoring scheme $(1-\mu) S_{BM25} + \mu S_{BERT}$ where $\mu = 0.5$. They find that non-temporal approaches often outperform temporal methods, yet among the simple temporal strategies, temporal grouping (especially by year) yields notable gains, and explicit and implicit temporal questions benefit from different ranking approaches. The results highlight the potential of hybrid and temporally informed reranking, and point to the value of combining temporal and non-temporal signals to improve robustness in open-domain QA over diachronic corpora. The study provides actionable guidance for designing temporal QA systems and motivates further evaluation on diverse, domain-specific datasets.
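The mixed scoring scheme $(1-\mu) S_{BM25} + \mu S_{BERT}$ can be illustrated with a minimal Python sketch. The `Candidate` class, the `min_max` helper, and the `rerank` function are hypothetical names introduced here for illustration. The min-max normalization is also an assumption, added so that the two scores are on a comparable scale; the summary above does not say how, or whether, the scores are normalized.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical container for one candidate answer."""
    answer: str
    bm25_score: float  # retriever score of the source passage
    bert_score: float  # reader score of the extracted answer span


def min_max(values):
    """Rescale scores to [0, 1] (assumed step); all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rerank(candidates, mu=0.5):
    """Return (mixed_score, candidate) pairs sorted by descending mixed score.

    mixed_score = (1 - mu) * S_BM25 + mu * S_BERT, with mu = 0.5 by default.
    """
    if not candidates:
        return []
    bm25 = min_max([c.bm25_score for c in candidates])
    bert = min_max([c.bert_score for c in candidates])
    scored = [
        ((1 - mu) * s_bm25 + mu * s_bert, cand)
        for s_bm25, s_bert, cand in zip(bm25, bert, candidates)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    demo = [
        Candidate("A", bm25_score=12.0, bert_score=0.2),
        Candidate("B", bm25_score=8.0, bert_score=0.9),
        Candidate("C", bm25_score=11.0, bert_score=0.8),
    ]
    for score, cand in rerank(demo):
        print(f"{cand.answer}: {score:.3f}")
```

Setting `mu=0.0` ranks by the retriever score alone and `mu=1.0` by the reader score alone, so $\mu = 0.5$ weights the two signals equally.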

Abstract

Despite advancements in state-of-the-art models and information retrieval techniques, current systems still struggle to handle temporal information and to correctly answer detailed questions about past events. In this paper, we investigate the impact of temporal characteristics of answers in Question Answering (QA) by exploring several simple answer selection techniques. Our findings emphasize the role of temporal features in selecting the most relevant answers from diachronic document collections and highlight differences between explicit and implicit temporal questions.

Paper Structure

This paper contains 9 sections, 1 equation, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Overview of the answer selection process applied in our analysis.
  • Figure 2: An example of the temporal positioning of candidate answers, where the dotted line marks the division between years on the timeline, and the dotted box indicates one particular month.