Learning to Wait: Synchronizing Agents with the Physical World

Yifei She, Ping Zhang, He Liu, Yanmin Jia, Yang Jing, Zijun Liu, Peng Sun, Xiangbin Li, Xiaohe Hu

TL;DR

The paper tackles the Temporal Gap between agent actions and delayed feedback in real-world, asynchronous environments. It argues that an Agent-side approach, which extends Code-as-Action to the temporal domain and leverages semantic priors with In-Context Learning, can predict precise waiting durations and synchronize the agent's internal clock with the physical world, avoiding costly environment-side polling. The approach is validated in a simulated Kubernetes-like setting with Gamma-distributed latencies, where Inter-Episode History Feedback enables the LLMs to progressively calibrate their timing. Results show that LLMs can learn to align their Cognitive Timeline with environmental latency, with model-dependent dynamics, suggesting that this temporal awareness is a learnable capability essential for autonomous, self-evolving agents in open-ended environments.

Abstract

Real-world agentic tasks, unlike synchronous Markov Decision Processes (MDPs), often involve non-blocking actions with variable latencies, creating a fundamental Temporal Gap between action initiation and completion. Existing environment-side solutions, such as blocking wrappers or frequent polling, either limit scalability or dilute the agent's context window with redundant observations. In this work, we propose an Agent-side Approach that empowers Large Language Models (LLMs) to actively align their Cognitive Timeline with the physical world. By extending the Code-as-Action paradigm to the temporal domain, agents utilize semantic priors and In-Context Learning (ICL) to predict precise waiting durations (time.sleep(t)), effectively synchronizing with the asynchronous environment without exhaustive checking. Experiments in a simulated Kubernetes cluster demonstrate that agents can precisely calibrate their internal clocks to minimize both query overhead and execution latency, validating that temporal awareness is a learnable capability essential for autonomous evolution in open-ended environments.
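
The contrast between environment-side polling and agent-side waiting can be illustrated with a small simulation. The sketch below is only an illustration on a simulated clock, not the paper's implementation: the function names `periodic_check` and `sleep_then_check`, the fallback polling interval, and the example numbers are all assumptions introduced here. In a real agent the single wait would be issued as time.sleep(t_sleep).

```python
import math


def periodic_check(t_true, interval):
    """Poll every `interval` seconds until the task is done.

    Returns (number_of_checks, time_of_confirmation) on a simulated clock.
    """
    # The first check happens at t = interval; at least one check is always made.
    checks = max(1, math.ceil(t_true / interval))
    return checks, checks * interval


def sleep_then_check(t_true, t_sleep, fallback_interval):
    """Wait for a predicted duration, then check once.

    If the task is not finished yet, fall back to polling every
    `fallback_interval` seconds (a hypothetical policy, assumed here).
    Returns (number_of_checks, time_of_confirmation).
    """
    t = t_sleep
    checks = 1
    while t < t_true:  # prediction was too short: keep polling
        t += fallback_interval
        checks += 1
    return checks, t


if __name__ == "__main__":
    # Hypothetical task that finishes after 30 simulated seconds.
    print(periodic_check(30, 2))        # (15, 30): many redundant queries
    print(sleep_then_check(30, 28, 2))  # (2, 30): slight underestimate
    print(sleep_then_check(30, 35, 2))  # (1, 35): overestimate adds latency
```

Under these assumed numbers, an underestimate costs extra checks while an overestimate costs extra waiting time, which is the trade-off between query overhead and execution latency described above.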

Paper Structure

This paper contains 18 sections, 1 equation, 5 figures, 3 tables.

Figures (5)

  • Figure 1: The Temporal Alignment Problem. The asynchronous nature of real-world environments creates a discrepancy between the Physical Timeline and the Agent's Cognitive Timeline. (a) Periodic Check forces alignment at prohibitively high query costs. (b) General Agent Tasks (e.g., Coding, Math) have obscured this challenge, as the agent's generation time naturally spans the task duration ($T_{\text{act}} \approx T_{\text{true}}$), leaving no temporal gap to manage. (c) Our Approach actively predicts an optimal $T_{\text{sleep}}$ to synchronize the agent's internal clock with physical latency, minimizing misalignment without redundant queries.
  • Figure 2: A comparison of Regret Scores for four representative LLMs. The score quantifies the agent's efficiency, where a lower value indicates better performance.
  • Figure 3: Analysis of the temporal prediction error ($T_{\text{confirm}} - T_{\text{true}}$) over episodes. The annotation $N=1$ on a data point indicates that the model checked only once in that episode.
  • Figure 4: Probability density functions of the simulated task latencies. We model $T_{\text{true}}$ using Gamma distributions to capture the multi-stage nature of asynchronous operations.
  • Figure 5: Temporal Adaptation Dynamics of kimi-k2-0905. (a) The significant reduction in Regret Score demonstrates the agent's rapid convergence towards an optimal checking strategy. (b) The Time Difference metric reveals how the agent minimizes the Temporal Gap, effectively synchronizing its Cognitive Timeline with the asynchronous physical latency. Annotations ($N=x$) indicate the check count for specific episodes.
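
The quantities named in the captions above ($T_{\text{true}}$ drawn from a Gamma distribution, and the prediction error $T_{\text{confirm}} - T_{\text{true}}$) can be sketched in a few lines. This is a rough illustration under stated assumptions, not the paper's setup: the shape and scale values, the initial guess of 10 seconds, and the running-mean predictor are hypothetical stand-ins (the paper's agent is an LLM calibrating through in-context history, not a numeric average), and confirmation is idealised as happening at whichever comes later, the end of the sleep or the end of the task.

```python
import random


def run_episodes(n, shape, scale, seed=0):
    """Simulate n episodes with Gamma-distributed task latency.

    Returns (history_of_true_latencies, prediction_errors).
    """
    rng = random.Random(seed)
    history = []  # latencies observed in earlier episodes
    errors = []   # T_confirm - T_true for each episode
    for _ in range(n):
        # Gamma(shape, scale) latency; its mean is shape * scale.
        t_true = rng.gammavariate(shape, scale)
        # Stand-in predictor (an assumption): mean of past latencies,
        # with a hypothetical default guess for the first episode.
        t_sleep = sum(history) / len(history) if history else 10.0
        # Idealised confirmation time: never earlier than task completion.
        t_confirm = max(t_sleep, t_true)
        errors.append(t_confirm - t_true)
        history.append(t_true)
    return history, errors


if __name__ == "__main__":
    history, errors = run_episodes(20, shape=4.0, scale=5.0, seed=1)
    for episode, err in enumerate(errors, start=1):
        print(f"episode {episode:2d}: T_confirm - T_true = {err:6.2f} s")
```

An error of zero here means the sleep ended before the task did, so the idealised agent confirmed exactly at completion; a positive error is time spent waiting after the task had already finished.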