
Let the Agent Search: Autonomous Exploration Beats Rigid Workflows in Temporal Question Answering

Xufei Lv, Jiahui Yang, Yifu Gao, Linbo Qiao, Houde Liu

TL;DR

We propose AT2QA, an autonomous, training-free agent for temporal question answering that iteratively queries the temporal knowledge graph through a general search tool for dynamic retrieval.

Abstract

Temporal Knowledge Graph Question Answering (TKGQA) demands multi-hop reasoning under temporal constraints. Prior approaches based on large language models (LLMs) typically rely on rigid, hand-crafted retrieval workflows or costly supervised fine-tuning. We show that simply granting an off-the-shelf LLM autonomy, that is, letting it decide what to do next, already yields substantial gains even in a strict zero-shot setting. Building on this insight, we propose AT2QA, an autonomous, training-free agent for temporal question answering that iteratively interacts with the temporal knowledge graph via a general search tool for dynamic retrieval. Experiments on MultiTQ demonstrate large improvements: AT2QA achieves 88.7% Hits@1 (+10.7% over prior SOTA), including a +20.1% gain on challenging multi-target queries, showing that agentic autonomy can decisively outperform fine-tuning for temporal question answering. Code and the full set of sampled trajectories are available at https://github.com/AT2QA-Official-Code/AT2QA-Official-Code
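
For readers unfamiliar with the metric, the following is a minimal sketch of how a Hits@1 score could be computed. It assumes that each question has a set of acceptable gold answers and that a prediction counts as a hit when the single top-ranked answer is in that set; the function name `hits_at_1` and the toy data are illustrative and are not taken from the AT2QA code base.

```python
from typing import Sequence, Set


def hits_at_1(predictions: Sequence[str], gold_answers: Sequence[Set[str]]) -> float:
    """Fraction of questions whose top-1 prediction is in the gold answer set.

    Assumption: a question may have several acceptable answers, and matching
    any one of them counts as a hit.
    """
    if len(predictions) != len(gold_answers):
        raise ValueError("predictions and gold_answers must have the same length")
    if not predictions:
        return 0.0
    correct = sum(1 for pred, gold in zip(predictions, gold_answers) if pred in gold)
    return correct / len(predictions)


# Toy data (hypothetical): three questions, two answered correctly.
toy_predictions = ["2012-03", "Iran", "China"]
toy_gold = [{"2012-03"}, {"Iraq"}, {"China", "Japan"}]
toy_score = hits_at_1(toy_predictions, toy_gold)
```

On the toy data, two of the three top-1 predictions fall inside their gold sets, so `toy_score` is 2/3.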

Paper Structure (36 sections, 1 equation, 8 figures, 4 tables)


Figures (8)

  • Figure 1: Comparison between AT2QA and existing methods. (a) Traditional Embedding Methods rely on static vector representations, lacking semantic understanding. (b) Static LLM Workflows decompose questions through rigid, predefined pipelines; an initial retrieval failure inevitably cascades through the subsequent steps (Error Propagation) due to the absence of autonomy. (c) AT2QA empowers the LLM as a fully autonomous agent. Through iterative environment interaction, the agent can dynamically verify evidence and self-correct its reasoning trajectory, effectively overcoming the bottleneck of static workflows.
  • Figure 2: Performance comparison on the MultiTQ benchmark. In a zero-shot setting, the autonomous LLM agent surpasses existing supervised baselines, highlighting the efficacy of unlocking the model's inherent decision-making capabilities.
  • Figure 3: Pass@k analysis of our method. In a zero-shot setting, our method achieves >84% Pass@1 accuracy. For difficult queries that initially fail, repeated sampling ($k=10$) successfully retrieves the correct answer, suggesting the reasoning capability is present but dormant.
  • Figure 4: The overview of our proposed framework AT2QA. Top: At inference, an LLM agent repeatedly queries a Search tool to interact with the TKG environment until sufficient evidence is collected. Inputs include a system prompt, the question, and few-shot demonstrations. Bottom: The few-shot library is selected from candidate trajectories via training-free GRPO-style rule editing with rule-based rewards.
  • Figure 5: Cumulative Density of First Gold-Fact Position for complex multiple-target queries. The distribution provides quantitative proof of the agent's self-validation (left) and self-correction (right) capabilities.
  • ...and 3 more figures
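
The inference loop described in the Figure 4 caption (an agent repeatedly querying a Search tool until it has enough evidence) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Fact` quadruple layout, the `search` filter, the `run_agent` loop, and the scripted policy that stands in for the LLM are hypothetical and do not reproduce the AT2QA implementation or its prompts.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Fact:
    """One temporal fact: (subject, relation, object, ISO date)."""
    subject: str
    relation: str
    obj: str
    time: str


@dataclass(frozen=True)
class SearchAction:
    """A search request; fields left as None are unconstrained."""
    subject: Optional[str] = None
    relation: Optional[str] = None
    obj: Optional[str] = None
    before: Optional[str] = None  # keep only facts strictly earlier than this date


@dataclass(frozen=True)
class AnswerAction:
    answer: str


Action = Union[SearchAction, AnswerAction]
History = List[Tuple[SearchAction, List[Fact]]]


def search(tkg: List[Fact], q: SearchAction) -> List[Fact]:
    """Hypothetical general search tool: filter facts, return them oldest first."""
    def matches(f: Fact) -> bool:
        return ((q.subject is None or f.subject == q.subject)
                and (q.relation is None or f.relation == q.relation)
                and (q.obj is None or f.obj == q.obj)
                and (q.before is None or f.time < q.before))
    return sorted((f for f in tkg if matches(f)), key=lambda f: f.time)


def run_agent(question: str, tkg: List[Fact],
              policy: Callable[[str, History], Action],
              max_steps: int = 8) -> Optional[str]:
    """Let the policy choose each next step until it answers or runs out of steps."""
    history: History = []
    for _ in range(max_steps):
        action = policy(question, history)
        if isinstance(action, AnswerAction):
            return action.answer
        history.append((action, search(tkg, action)))
    return None  # step budget exhausted without an answer


def scripted_policy(question: str, history: History) -> Action:
    """Stand-in for the LLM: resolve the anchor date, then look before it."""
    if not history:
        return SearchAction(subject="A", relation="sign_agreement", obj="B")
    if len(history) == 1:
        anchor = history[0][1]
        if not anchor:
            return AnswerAction("unknown")
        return SearchAction(subject="A", relation="visit", before=anchor[0].time)
    visits = history[1][1]
    return AnswerAction(visits[-1].obj if visits else "unknown")


TOY_TKG = [
    Fact("A", "visit", "C", "2010-01-05"),
    Fact("A", "visit", "D", "2011-06-20"),
    Fact("A", "sign_agreement", "B", "2012-03-01"),
    Fact("A", "visit", "E", "2013-02-11"),
]
QUESTION = "Whom did A last visit before A signed an agreement with B?"
result = run_agent(QUESTION, TOY_TKG, scripted_policy)
```

In this toy run the policy first retrieves the agreement date, then searches for visits before that date and answers with the most recent one, so `result` is `"D"`. The point of the sketch is only the control flow: the next action is chosen from the accumulated search results rather than from a fixed pipeline, which is the contrast with static workflows drawn in the Figure 1 caption.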