Question Answering as Programming for Solving Time-Sensitive Questions

Xinyu Zhu; Cheng Yang; Bei Chen; Siheng Li; Jian-Guang Lou; Yujiu Yang

TL;DR

The paper tackles the challenge of time-sensitive factual question answering, where answers depend on temporal constraints. It introduces QAaP, a two-phase approach that represents questions and context as structured code and solves QA tasks through programming, reinforced by Check and Match verification to mitigate LLM hallucinations. Empirical results across TimeQA, TempQuestions, and TimeQuestions show QAaP achieving substantial gains over strong baselines and approaching supervised SoTA, highlighting the effectiveness of code-based representations and verification in temporal reasoning. The work suggests a promising direction for robust, constraint-aware QA systems and points to future work on broader constraint types and human-in-the-loop verification.

Abstract

Question answering plays a pivotal role in human daily life because it involves our acquisition of knowledge about the world. However, due to the dynamic and ever-changing nature of real-world facts, the answer can be completely different when the time constraint in the question changes. Recently, Large Language Models (LLMs) have shown remarkable intelligence in question answering, but our experiments reveal that such time-sensitive questions still pose a significant challenge to existing LLMs. This can be attributed to the LLMs' inability to perform rigorous reasoning based on surface-level text semantics. To overcome this limitation, rather than requiring LLMs to directly answer the question, we propose a novel approach where we reframe the $\textbf{Q}$uestion $\textbf{A}$nswering task $\textbf{a}$s $\textbf{P}$rogramming ($\textbf{QAaP}$). Concretely, by leveraging modern LLMs' superior capability in understanding both natural language and programming language, we endeavor to harness LLMs to represent diversely expressed text as well-structured code and select the best matching answer from multiple candidates through programming. We evaluate our QAaP framework on several time-sensitive question answering datasets and achieve decent improvement, up to $14.5$% over strong baselines. Our code and data are available at https://github.com/TianHongZXY/qaap
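
The idea of representing a question and its candidate answers as structured code, then selecting the answer by running a program, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the dictionary schema, the function names `check`, `overlap_days`, and `match`, and the example facts are hypothetical and are not taken from the QAaP repository. The sketch assumes the LLM has already produced the structured question and the candidate facts; only the programmatic selection step is shown.

```python
from datetime import date

# Hypothetical structured form of the question
# "Which team did X play for from 2010 to 2012?"
question = {
    "subject": "X",
    "relation": "played for",
    "object": None,  # the unknown answer
    "time": {"start": date(2010, 1, 1), "end": date(2012, 12, 31)},
}

# Hypothetical candidate facts, as an LLM might extract them from context.
candidates = [
    {"subject": "X", "relation": "played for", "object": "Team A",
     "time": {"start": date(2005, 1, 1), "end": date(2009, 12, 31)}},
    {"subject": "X", "relation": "played for", "object": "Team B",
     "time": {"start": date(2010, 1, 1), "end": date(2013, 12, 31)}},
    {"subject": "X", "relation": "played for", "object": "",  # malformed
     "time": {"start": date(2011, 1, 1), "end": date(2011, 12, 31)}},
]


def check(candidate):
    """Assumed validity check: non-empty answer and a well-formed interval."""
    obj = candidate.get("object")
    time = candidate.get("time") or {}
    start, end = time.get("start"), time.get("end")
    if not obj:
        return False
    if not isinstance(start, date) or not isinstance(end, date):
        return False
    return start <= end


def overlap_days(a, b):
    """Number of days shared by two closed date intervals (0 if disjoint)."""
    start = max(a["start"], b["start"])
    end = min(a["end"], b["end"])
    return max((end - start).days + 1, 0)


def match(question, candidates):
    """Assumed matching rule: pick the valid candidate whose interval
    overlaps the question's time constraint the most."""
    valid = [c for c in candidates if check(c)]
    if not valid:
        return None
    best = max(valid, key=lambda c: overlap_days(question["time"], c["time"]))
    if overlap_days(question["time"], best["time"]) == 0:
        return None
    return best["object"]


answer = match(question, candidates)
print(answer)  # Team B
```

Scoring by interval overlap is only one plausible matching criterion. The point of the sketch is that once text is in a structured form, the temporal comparison is done by ordinary code rather than by the model's free-form reasoning.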

Paper Structure (26 sections, 1 equation, 6 figures, 7 tables)

Introduction
Related Work
LLMs augmented with tools
Reasoning with LLMs
Temporal reasoning
Method
Task definition
Represent all as codes
Choose answer through programming
Experiments
Experimental setup
Datasets
Baselines
Implementation details
Main results
...and 11 more sections

Figures (6)

Figure 1: An example of a time-sensitive factual question from TimeQA: (a) illustrates the conventional process of question answering with LLMs, and (b) presents the proposed approach QAaP. The temporal information is colored with light blue and the potential answers are in green.
Figure 2: The whole framework of QAaP. Relevant and irrelevant temporal information is highlighted in light blue and blue, respectively. The correct answer is green; otherwise, it is red. Text related to potential answers is in bold.
Figure 3: Examples of correct and incorrect answers obtained with different methods on TimeQA. Interference answers or wrong parts are highlighted in red, and time information not related to the question is highlighted in blue. The correct answer is in green and the relevant time is in light blue. Text that implies potential answers is in bold.
Figure 4: TimeQA prompt part 1.
Figure 5: TimeQA prompt part 2.
...and 1 more figure
