Speculative Actions: A Lossless Framework for Faster Agentic Systems
Naimeng Ye, Arnav Ahuja, Georgios Liargkovas, Yunan Lu, Kostis Kaffes, Tianyi Peng
TL;DR
The paper addresses the latency bottleneck in agent-environment loops, where every action waits on a strictly sequential API call. It introduces Speculative Actions, a lossless framework that pairs a fast Speculator with a slower, authoritative Actor: the Speculator guesses likely next actions so that follow-up work can start in parallel, and only Actor-confirmed actions are committed, so correctness is preserved. A formal analysis shows that, under reasonable assumptions, the end-to-end latency ratio converges to $1 - \frac{p}{1+p} \cdot \frac{\alpha}{\alpha+\beta}$ as $T \to \infty$, implying substantial speedups in ideal conditions; the framework also supports multi-step and uncertainty-aware extensions. Empirically, the approach yields meaningful time savings across chess, e-commerce dialogues, multi-hop web search, and an OS-tuning scenario, with practical guidance on model selection and safety mechanics. The work offers a general design principle (opportunistic parallelism in environment interactions) that moves toward real-time, scalable agentic systems, and it suggests future directions in hierarchical speculator-actor designs and reinforcement-learning-informed speculation.
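To get a feel for the limiting latency ratio quoted above, the short sketch below simply evaluates the expression $1 - \frac{p}{1+p} \cdot \frac{\alpha}{\alpha+\beta}$ for a few inputs. The function name `latency_ratio` and the sample values are illustrative assumptions, not taken from the paper. This summary does not define $p$, $\alpha$, or $\beta$, so the code treats them only as numeric inputs, assuming $p \in [0, 1]$ and $\alpha, \beta > 0$.

```python
def latency_ratio(p: float, alpha: float, beta: float) -> float:
    """Evaluate 1 - p/(1+p) * alpha/(alpha+beta), the limiting ratio from the TL;DR.

    The parameter ranges below are assumptions made for this sketch only.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("this sketch assumes p lies in [0, 1]")
    if alpha <= 0.0 or beta <= 0.0:
        raise ValueError("this sketch assumes alpha and beta are positive")
    return 1.0 - (p / (1.0 + p)) * (alpha / (alpha + beta))


if __name__ == "__main__":
    # Purely illustrative inputs; none of these values come from the paper.
    for p, alpha, beta in [(0.25, 1.0, 1.0), (0.55, 1.0, 1.0), (1.0, 10.0, 1.0)]:
        print(f"p={p}, alpha={alpha}, beta={beta} -> ratio={latency_ratio(p, alpha, beta):.4f}")
```

Under these assumed ranges, $\frac{p}{1+p} \le \frac{1}{2}$ and $\frac{\alpha}{\alpha+\beta} < 1$, so the expression stays above $\frac{1}{2}$: the ratio improves as $p$ grows and as $\alpha$ grows relative to $\beta$.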
Abstract
Despite growing interest in AI agents across industry and academia, their execution in an environment is often slow, hampering training, evaluation, and deployment. For example, a game of chess between two state-of-the-art agents may take hours. A critical bottleneck is that agent behavior unfolds sequentially: each action requires an API call, and these calls can be time-consuming. Inspired by speculative execution in microprocessors and speculative decoding in LLM inference, we propose speculative actions, a lossless framework for general agentic systems that predicts likely actions using faster models, enabling multiple steps to be executed in parallel. We evaluate this framework in three agentic environments (gaming, e-commerce, and web search) and in a "lossy" extension for an operating systems environment. In all cases, speculative actions achieve substantial accuracy in next-action prediction (up to 55%), translating into significant reductions in end-to-end latency. Moreover, performance can be further improved through stronger guessing models, top-K action prediction, multi-step speculation, and uncertainty-aware optimization, opening a promising path toward deploying low-latency agentic systems in the real world.
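The core idea of the abstract, guessing the slow step's result with a fast model and starting the next step early, can be illustrated with a toy `asyncio` loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `actor`, `speculator`, `run_sequential`, and `run_speculative` are hypothetical names, the "environment" is just an integer counter, and the sleeps stand in for API latency.

```python
import asyncio


async def actor(state: int) -> int:
    """Slow, authoritative step (hypothetical stand-in for an expensive API call)."""
    await asyncio.sleep(0.05)
    return state + 1


async def speculator(state: int) -> int:
    """Fast guess of the actor's result (hypothetical); deliberately wrong
    whenever state is a multiple of 3, to exercise the mismatch path."""
    await asyncio.sleep(0.005)
    return state + 1 if state % 3 else state + 2


async def run_sequential(state: int, steps: int) -> list[int]:
    """Baseline: one actor call at a time."""
    trace = []
    for _ in range(steps):
        state = await actor(state)
        trace.append(state)
    return trace


async def run_speculative(state: int, steps: int) -> list[int]:
    """Overlap the next actor call with the current one when the guess is right."""
    trace = []
    pending = None  # (guessed_state, task already computing actor(guessed_state))
    while len(trace) < steps:
        if pending is not None and pending[0] == state:
            actor_task = pending[1]  # guess was right: reuse the pre-launched call
        else:
            if pending is not None:
                pending[1].cancel()  # guess was wrong: discard speculative work
            actor_task = asyncio.create_task(actor(state))
        # While the actor runs, guess its result and pre-launch the following step.
        guess = await speculator(state)
        pending = (guess, asyncio.create_task(actor(guess)))
        state = await actor_task  # only the actor's own output is ever committed
        trace.append(state)
    if pending is not None:
        pending[1].cancel()  # drop the speculation left over after the last step
    return trace
```

Calling `asyncio.run(run_speculative(0, 6))` yields the same trace as `asyncio.run(run_sequential(0, 6))`, because a guess is used only to decide which work to start early and never replaces the actor's output. That property is what "lossless" refers to in this toy setting; a wrong guess costs only the discarded speculative call.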
