Detecting Temporal Ambiguity in Questions
Bhawna Piryani, Abdelrahman Abdallah, Jamshid Mozafari, Adam Jatowt
TL;DR
The paper tackles temporal ambiguity in open-domain QA by introducing TempAmbiQA, a manually annotated dataset of 8,162 open-domain questions labeled for temporal ambiguity. It proposes a three-component framework (Disambiguation, Answer Equivalence Testing, and Search) that decides whether a question is temporally ambiguous by generating disambiguated variants over a time frame $T = \{t_1, t_2, \ldots, t_k\}$ and comparing their answers. The authors benchmark zero-shot and few-shot LLMs, plus a fine-tuned BERT, and compare multiple search strategies (Linear, Skip-List, Random, DAC). They report that Skip-List (2) is efficient while remaining competitive in accuracy, and that Qwen-110B often performs best overall. The dataset and findings aim to advance temporal IR/QA systems by enabling explicit detection of temporally ambiguous questions and by guiding the development of time-aware QA methods.
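The three components described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The question template in `disambiguate`, the string-match equivalence test, the `skip_order` visiting pattern, and the toy oracle `toy_answer_fn` are all assumptions introduced here for illustration. In practice the answer function would be a QA model or LLM call, and the equivalence test would likely be more lenient than exact matching.

```python
from typing import Callable, List, Sequence, Tuple


def disambiguate(question: str, year: int) -> str:
    """Disambiguation: attach an explicit year (hypothetical template)."""
    return f"{question.rstrip('?')} in {year}?"


def answers_equivalent(a: str, b: str) -> bool:
    """Answer Equivalence Testing: a naive normalized string match (assumption)."""
    return a.strip().lower() == b.strip().lower()


def linear_order(n: int) -> List[int]:
    """Visit every time point in the frame, in order."""
    return list(range(n))


def skip_order(n: int, step: int = 2) -> List[int]:
    """Visit every `step`-th time point (an assumed reading of a skip strategy)."""
    return list(range(0, n, step))


def is_temporally_ambiguous(
    question: str,
    time_frame: Sequence[int],
    answer_fn: Callable[[str], str],
    order: Sequence[int],
) -> Tuple[bool, int]:
    """Search: stop as soon as two time points yield non-equivalent answers.

    Returns (ambiguous?, number of answer_fn calls made).
    """
    reference = None
    calls = 0
    for i in order:
        answer = answer_fn(disambiguate(question, time_frame[i]))
        calls += 1
        if reference is None:
            reference = answer  # first answer seen becomes the reference
        elif not answers_equivalent(reference, answer):
            return True, calls  # answers differ across time: ambiguous
    return False, calls


def toy_answer_fn(disambiguated_question: str) -> str:
    """Hypothetical oracle standing in for a QA model; the data is made up."""
    year = int(disambiguated_question.rstrip("?").split()[-1])
    if disambiguated_question.startswith("Who won the cup"):
        return "Team A" if year < 2003 else "Team B"
    return "Paris"


years = list(range(2000, 2006))  # T = {2000, ..., 2005}
```

In this toy setting, the skipping order reaches a verdict with fewer calls to the answer function than the linear order. That is the kind of efficiency trade-off the search strategies are meant to explore, although a coarser order can miss an answer change that falls entirely between visited time points.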
Abstract
Detecting and answering ambiguous questions has been a challenging task in open-domain question answering. Ambiguous questions have different answers depending on their interpretation and can take diverse forms. Temporally ambiguous questions are one of the most common types of such questions. In this paper, we introduce TEMPAMBIQA, a manually annotated temporally ambiguous QA dataset consisting of 8,162 open-domain questions derived from existing datasets. Our annotations focus on capturing temporal ambiguity to study the task of detecting temporally ambiguous questions. We propose a novel approach that uses diverse search strategies based on disambiguated versions of the questions. We also introduce and test competitive non-search baselines for detecting temporal ambiguity using zero-shot and few-shot approaches.
