Learning When to Act: Interval-Aware Reinforcement Learning with Predictive Temporal Structure

Davide Di Gioia

Abstract

Autonomous agents operating in continuous environments must decide not only what to do, but when to act. We introduce a lightweight adaptive temporal control system that learns the optimal interval between cognitive ticks from experience, replacing ad hoc biologically inspired timers with a principled learned policy. The policy state is augmented with a predictive hyperbolic spread signal (a "curvature signal" for short) derived from hyperbolic geometry: the mean pairwise Poincaré distance among n sampled futures embedded in the Poincaré ball. High spread indicates a branching, uncertain future and drives the agent to act sooner; low spread signals predictability and permits longer rest intervals. We further propose an interval-aware reward that explicitly penalises inefficiency relative to the chosen wait time, correcting a systematic credit-assignment failure of naive outcome-based rewards in timing problems. We additionally introduce a joint spatio-temporal embedding (ATCPG-ST) that concatenates independently normalised state and position projections in the Poincaré ball; spatial trajectory divergence provides an independent timing signal unavailable to the state-only variant (ATCPG-SO). This extension raises mean hyperbolic spread (kappa) from 1.88 to 3.37 and yields a further 5.8 percent efficiency gain over the state-only baseline. Ablation experiments across five random seeds demonstrate that (i) learning is the dominant efficiency factor (54.8 percent over no-learning), (ii) hyperbolic spread provides a significant complementary gain (26.2 percent over geometry-free control), (iii) the combined system achieves a 22.8 percent efficiency gain over the fixed-interval baseline, and (iv) adding spatial position information to the spread embedding yields an additional 5.8 percent.
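The spread signal described in the abstract, the mean pairwise Poincaré distance among sampled futures, can be illustrated with a minimal sketch. The distance formula is the standard one for the unit Poincaré ball. The function names, and the use of the exponential map at the origin (`to_ball`) to place raw future vectors inside the ball, are assumptions made for illustration; this excerpt does not specify the paper's actual embedding or normalisation.

```python
import itertools
import math


def poincare_distance(u, v, eps=1e-9):
    """Standard distance between two points inside the unit Poincaré ball."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(u, v))
    norm_u_sq = sum(a * a for a in u)
    norm_v_sq = sum(b * b for b in v)
    # Clamp the denominator so points numerically on the boundary do not divide by zero.
    denom = max((1.0 - norm_u_sq) * (1.0 - norm_v_sq), eps)
    return math.acosh(1.0 + 2.0 * diff_sq / denom)


def to_ball(x, eps=1e-9):
    """Hypothetical embedding: exponential map at the origin, x -> tanh(|x|) x / |x|."""
    norm = math.sqrt(sum(a * a for a in x))
    if norm < eps:
        return [0.0] * len(x)
    scale = math.tanh(norm) / norm
    return [scale * a for a in x]


def hyperbolic_spread(futures):
    """Mean pairwise Poincaré distance among the embedded sampled futures."""
    points = [to_ball(f) for f in futures]
    pairs = list(itertools.combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(poincare_distance(u, v) for u, v in pairs) / len(pairs)


# Illustrative inputs only: tightly clustered futures versus diverging futures.
tight = [[0.10, 0.00], [0.11, 0.00], [0.09, 0.01]]
wide = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
identical = [[0.3, 0.2]] * 4
```

In this sketch, `hyperbolic_spread(identical)` is zero and `hyperbolic_spread(wide)` exceeds `hyperbolic_spread(tight)`, matching the reading that high spread marks a branching future and low spread a predictable one.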

Paper Structure (68 sections, 4 theorems, 25 equations, 1 figure, 6 tables)

Introduction
Related Work
Decentralised orchestration and emergent synchronisation.
Temporal abstraction in RL.
Reward shaping for timing.
Hyperbolic representation and uncertainty.
Structural blind spots in agent orchestration.
Autonomous cognitive loops and self-directed timing.
Problem Formulation
Adaptive Cognitive Pacing as a Contextual Bandit
State Representation
Learned Pacing Policy
Linear Policy
Online Weight Update
Interpretation of the update rule.
...and 53 more sections

Key Result

Proposition 1

Under the online linear update eq:reinforce with naive reward $\tilde{r}_t = \Delta w_t$ and feature $f_t > 0$, the update $\theta_f \mathrel{+}= \alpha \tilde{r}_t f_t$ decreases $\theta_f$ when $\tilde{r}_t < 0$ (overload), producing shorter intervals when the agent is fatigued, opposite to the de
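The sign argument in Proposition 1 can be checked numerically with a small sketch. The update line follows the proposition as stated. The linear interval rule `base + theta_f * feature`, the learning rate, and all numeric values are hypothetical choices for illustration, not taken from the paper.

```python
def naive_update(theta_f, reward, feature, lr=0.1):
    """Online linear update from Proposition 1: theta_f += alpha * r * f."""
    return theta_f + lr * reward * feature


def interval(theta_f, feature, base=10.0):
    """Hypothetical linear pacing rule: a larger weight on the feature means a longer wait."""
    return base + theta_f * feature


theta_before = 0.5       # illustrative starting weight
fatigue = 0.8            # feature f_t > 0
overload_reward = -1.0   # naive reward r_t < 0 (overload)

theta_after = naive_update(theta_before, overload_reward, fatigue)
wait_before = interval(theta_before, fatigue)
wait_after = interval(theta_after, fatigue)
```

Because the reward is negative and the feature positive, `theta_after` falls below `theta_before`, so under the assumed linear rule `wait_after` is shorter than `wait_before` for the same fatigue level.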

Figures (1)

Figure 1: A general adaptive cognitive pacing loop. While instantiated here in reinforcement learning, the same structure applies to agents that dynamically allocate computation, planning, or tool use.

Theorems & Definitions (11)

Proposition 1: Reward-direction failure
proof
Definition 1: Predictive Hyperbolic Spread
Remark 1
Proposition 2: Zero curvature for identical futures
proof
Proposition 3: Three-regime amplification
proof
Remark 2
Proposition 4: Spatial monotonicity, non-saturated regime
...and 1 more
