Measuring temporal effects of agent knowledge by date-controlled tool use

R. Patrick Xian; Qiming Cui; Stefan Bauer; Reza Abbasi-Asl

TL;DR

The paper investigates how the temporal dynamics of external information influence LLM agents that use web-search tools ($\mathcal{T}_t$), and it introduces date-controlled tools and the SciBreak dataset. It shows how the masking ratio $\gamma \in \{0.5,0.75\}$ and a time-aware tool selection framework influence abstract completion across GPT-3.5, GPT-4-turbo, and GPT-4o, with CoT prompting mitigating temporal degradation for high-capacity models. The results reveal that temporal shifts in external resources can degrade reliability, but appropriate model choice and temporal reasoning strategies can alleviate these effects. The study highlights implications for the design, evaluation, and reproducibility of temporally aware agent systems.

Abstract

Temporal progression is an integral part of knowledge accumulation and update. Web search is frequently adopted as grounding for agent knowledge, yet an improper configuration affects the quality of the agent's responses. Here, we assess agent behavior using distinct date-controlled tools (DCTs) as a stress test to measure the knowledge variability of large language model (LLM) agents. We demonstrate the temporal effects of an LLM agent as a writing assistant, which uses web search to complete scientific publication abstracts. We show that the temporality of the search engine translates into tool-dependent agent performance, but that this dependence can be alleviated through base model choice and explicit reasoning instructions such as chain-of-thought prompting. Our results indicate that agent design and evaluation should take a dynamical view and implement measures to account for the temporal influence of external resources to ensure reliability.
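To make the setup concrete, the following is a minimal Python sketch of what a date-controlled tool and a masked-abstract input could look like. It is not the authors' implementation. The `Document` type, the in-memory corpus, the keyword matching, and the function names are hypothetical, and masking the trailing fraction of words is an assumed scheme, since the text above only states that a masking ratio $\gamma$ is used.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    """Hypothetical stand-in for a search result with a publication date."""
    title: str
    published: date
    text: str


def make_date_controlled_search(corpus, cutoff):
    """Build a search function that only sees documents dated on or before `cutoff`."""
    visible = [doc for doc in corpus if doc.published <= cutoff]

    def search(query):
        terms = query.lower().split()
        # Naive keyword match over the visible documents; a real tool would
        # instead pass a date restriction to a search engine.
        return [doc for doc in visible
                if any(term in doc.text.lower() for term in terms)]

    return search


def mask_abstract(abstract, gamma):
    """Hide the trailing fraction `gamma` of the words (assumed masking scheme)."""
    words = abstract.split()
    keep = len(words) - int(gamma * len(words))
    return " ".join(words[:keep])


corpus = [
    Document("A", date(2021, 5, 1), "protein folding prediction"),
    Document("B", date(2024, 3, 1), "protein design breakthrough"),
]
search_2022 = make_date_controlled_search(corpus, date(2022, 12, 31))
search_2024 = make_date_controlled_search(corpus, date(2024, 12, 31))
titles_2022 = [doc.title for doc in search_2022("protein")]
titles_2024 = [doc.title for doc in search_2024("protein")]
prompt_text = mask_abstract("one two three four five six seven eight", 0.75)
```

Under these assumptions, the same query returns different evidence depending on the cutoff date, which is the kind of tool-dependent variability the abstract describes; the agent would then be asked to complete `prompt_text` using whichever tool it is given.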
