
Detecting Temporal Ambiguity in Questions

Bhawna Piryani, Abdelrahman Abdallah, Jamshid Mozafari, Adam Jatowt

TL;DR

The paper tackles temporal ambiguity in open-domain QA by introducing TempAmbiQA, a manually annotated dataset of 8,162 open-domain questions labeled for temporal ambiguity. It proposes a three-component framework (Disambiguation, Answer Equivalence Testing, and Search) that decides whether a question is temporally ambiguous by generating disambiguated variants of it for years within a time frame $T = \{t_1, t_2, \dots, t_k\}$ and comparing their answers. The authors benchmark diverse zero-shot and few-shot LLMs, plus a fine-tuned BERT, across multiple search strategies (Linear, Skip-List, Random, DAC). They report that Skip-List (2) achieves strong efficiency with competitive accuracy, and that Qwen-110B often performs best overall. The dataset and findings aim to advance temporal IR/QA systems by enabling explicit detection of temporally ambiguous questions and by guiding the development of time-aware QA methods.
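The detection loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `disambiguate` and `answers_equivalent` are hypothetical callables standing in for the LLM-based Disambiguation and Answer Equivalence Testing components, and the toy helpers replace real model calls so the sketch runs on its own. It uses the simplest (linear) search: the variant for the first year, DQ_1, is compared with the variant for each later year, and the question is labeled ambiguous as soon as one pair of answers differs.

```python
from typing import Callable, Sequence

# Hypothetical signatures for the LLM-backed components (not the paper's API):
Disambiguate = Callable[[str, int], str]        # (Q, year t_k) -> DQ_k
AnswersEquivalent = Callable[[str, str], bool]  # (DQ_1, DQ_k) -> same answer?


def is_temporally_ambiguous(
    question: str,
    years: Sequence[int],
    disambiguate: Disambiguate,
    answers_equivalent: AnswersEquivalent,
) -> bool:
    """Linear search: compare DQ_1 with DQ_k for t_2, t_3, ..., t_k in order."""
    if len(years) < 2:
        return False  # a single year cannot yield two differing interpretations
    dq_1 = disambiguate(question, years[0])
    for year in years[1:]:
        dq_k = disambiguate(question, year)
        if not answers_equivalent(dq_1, dq_k):
            return True  # answer equivalence is "No" -> temporally ambiguous
    return False  # every DQ_k matched DQ_1 -> temporally unambiguous


# Toy stand-ins so the sketch runs without any model (purely illustrative).
def toy_disambiguate(question: str, year: int) -> str:
    return f"{question} (as of {year})"


def year_of(dq: str) -> int:
    """Recover the year from a toy disambiguated question."""
    return int(dq.rsplit(" ", 1)[-1].rstrip(")"))


def make_toy_equivalence(change_year: int) -> AnswersEquivalent:
    """Pretend the true answer switches from 'A' to 'B' at change_year."""
    def answer(dq: str) -> str:
        return "A" if year_of(dq) < change_year else "B"
    return lambda dq1, dqk: answer(dq1) == answer(dqk)


if __name__ == "__main__":
    years = list(range(2000, 2025))  # a time frame such as 2000-2024
    q = "Who holds the title?"
    print(is_temporally_ambiguous(q, years, toy_disambiguate, make_toy_equivalence(2020)))  # True
    print(is_temporally_ambiguous(q, years, toy_disambiguate, make_toy_equivalence(2100)))  # False
```

Keeping the two components as injected callables means the same loop can be paired with any model or prompt; only the search order changes between strategies.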

Abstract

Detecting and answering ambiguous questions has been a challenging task in open-domain question answering. Ambiguous questions have different answers depending on their interpretation and can take diverse forms. Temporally ambiguous questions are one of the most common types of such questions. In this paper, we introduce TEMPAMBIQA, a manually annotated temporally ambiguous QA dataset consisting of 8,162 open-domain questions derived from existing datasets. Our annotations focus on capturing temporal ambiguity to study the task of detecting temporally ambiguous questions. We propose a novel approach that uses diverse search strategies based on disambiguated versions of the questions. We also introduce and test competitive non-search baselines for detecting temporal ambiguity using zero-shot and few-shot approaches.

Paper Structure

This paper contains 17 sections, 1 equation, 2 figures, and 22 tables.

Figures (2)

  • Figure 1: Overview of different search strategies for detecting temporally ambiguous questions. The Disambiguation Component generates questions DQ1 and DQk, referred to as Q1 and Q2 in the prompts, respectively. The Answer Equivalence Testing Component compares them and classifies Q as temporally ambiguous if the answer equivalence (Ak) is "No". If it is "Yes", the search proceeds to find the next valid year k' within the defined time range and generates the next disambiguated question DQk' to continue the classification process. If no valid k' is found, Q is classified as temporally unambiguous. A valid year k' is one that falls within the specified time range (e.g., 2000-2024).
  • Figure 2: Efficiency of various search strategies.
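The search strategies in Figure 1 differ mainly in the order in which candidate years k' are visited. The sketch below is a hedged illustration, not the paper's code: `disambiguate` and `answers_equivalent` are hypothetical stand-ins for the LLM components, `skip_order` encodes an assumed reading of Skip-List (2) as "visit every second year first, then fill in the skipped ones", and the number of equivalence checks serves as a rough proxy for cost. DAC is omitted because its exact splitting rule is not described here.

```python
import random
from typing import Callable, List, Sequence, Tuple

Disambiguate = Callable[[str, int], str]        # hypothetical: (Q, year) -> DQ
AnswersEquivalent = Callable[[str, str], bool]  # hypothetical: (DQ_1, DQ_k) -> same answer?


def linear_order(years: Sequence[int]) -> List[int]:
    """Visit t_2, t_3, ..., t_k in chronological order."""
    return list(years[1:])


def skip_order(years: Sequence[int], step: int = 2) -> List[int]:
    """Visit every `step`-th later year first, then the skipped ones.

    Assumed reading of 'Skip-List (step)'. The fill-in pass keeps the final
    label identical to linear search; only the number of checks changes.
    """
    later = list(years[1:])
    jumped = later[step - 1::step]
    jumped_set = set(jumped)
    return jumped + [y for y in later if y not in jumped_set]


def random_order(years: Sequence[int], seed: int = 0) -> List[int]:
    """Visit the later years in a seeded random permutation."""
    later = list(years[1:])
    random.Random(seed).shuffle(later)
    return later


def detect(
    question: str,
    years: Sequence[int],
    order: List[int],
    disambiguate: Disambiguate,
    answers_equivalent: AnswersEquivalent,
) -> Tuple[bool, int]:
    """Return (is_ambiguous, number of equivalence checks) for a visiting order."""
    dq_1 = disambiguate(question, years[0])
    checks = 0
    for year in order:  # each `year` plays the role of the next valid k'
        checks += 1
        if not answers_equivalent(dq_1, disambiguate(question, year)):
            return True, checks
    return False, checks


# Toy stand-ins (purely illustrative, no model involved).
def toy_disambiguate(question: str, year: int) -> str:
    return f"{question} (as of {year})"


def toy_equivalence(change_year: int) -> AnswersEquivalent:
    """Pretend the answer flips from 'A' to 'B' at change_year."""
    def answer(dq: str) -> str:
        year = int(dq.rsplit(" ", 1)[-1].rstrip(")"))
        return "A" if year < change_year else "B"
    return lambda dq1, dqk: answer(dq1) == answer(dqk)


if __name__ == "__main__":
    years = list(range(2000, 2025))
    eq = toy_equivalence(2020)
    for name, order in [("linear", linear_order(years)),
                        ("skip-2", skip_order(years)),
                        ("random", random_order(years))]:
        print(name, detect("q", years, order, toy_disambiguate, eq))
```

On this toy question, whose answer changes in 2020, the skip order reaches a differing year after 10 checks versus 20 for the linear order, while an unambiguous question costs all 24 checks under every order. Real savings depend on where in the time range the answer changes; Figure 2 reports the measured efficiency of the paper's strategies.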