Toward Systems Foundations for Agentic Exploration
Jiakai Xu, Tianle Zhou, Eugene Wu, Kostis Kaffes
TL;DR
The paper addresses the need for fast, correct agentic exploration in LLM-powered agents operating in stateful environments. It contrasts pass@k baselines with real exploration and introduces three primitives for state restoration (prefix replay, snapshot/restore, and backtracking) and argues for a native forking paradigm with domain-specific hooks, where restoration latency can be reduced toward $O(1)$ and replay costs grow as $O(\text{len(prefix)})$. Benchmark results across six tools show latencies ranging from about $1.445$ s for CRIU on 2 GiB to over $12$ s for some Hybrid/Container-based approaches, with AWS VM snapshots being especially slow, underscoring the need for faster, more complete state capture. The authors propose fork-aware APIs and versioned side effects to enable microsecond branching and robust multi-agent exploration in production environments involving databases, browsers, and cloud APIs.
Abstract
Agentic exploration, letting LLM-powered agents branch, backtrack, and search across many execution paths, demands systems support well beyond today's pass-at-k resets. Our benchmark of six snapshot/restore mechanisms shows that generic tools such as CRIU or container commits are not fast enough even in isolated testbeds, and they crumble entirely in real deployments where agents share files, sockets, and cloud APIs with other agents and human users. In this talk, we pinpoint three open fundamental challenges: fork semantics, which concerns how branches reveal or hide tentative updates; external side-effects, where fork awareness must be added to services or their calls intercepted; and native forking, which requires cloning databases and runtimes in microseconds without bulk copying.
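To make the contrast between prefix replay and snapshot/restore concrete, the following is a minimal sketch in Python. It is not the authors' implementation: `ToyEnv`, `restore_by_replay`, and `restore_by_snapshot` are hypothetical names, the environment is an in-memory dictionary, and `copy.deepcopy` stands in for a real snapshot mechanism such as CRIU. The sketch only illustrates that replay re-executes every action in the prefix for each branch, while restoring from a snapshot does not depend on the prefix length. Backtracking and external side effects are not modeled.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyEnv:
    """Hypothetical stateful environment: a key-value store mutated by actions."""
    state: dict = field(default_factory=dict)
    steps_executed: int = 0  # counts every action applied, including replayed ones

    def apply(self, action):
        key, value = action
        self.state[key] = value
        self.steps_executed += 1


def restore_by_replay(prefix):
    """Prefix replay: rebuild the state from scratch by re-running the prefix.

    The number of re-executed actions grows with len(prefix).
    """
    env = ToyEnv()
    for action in prefix:
        env.apply(action)
    return env


def restore_by_snapshot(snapshot):
    """Snapshot/restore: copy a saved state instead of re-running actions.

    Assumption: deepcopy stands in for a real checkpoint tool. Its cost
    depends on the state size, not on how many actions produced the state.
    """
    return ToyEnv(state=copy.deepcopy(snapshot))


# Shared prefix that every branch starts from.
prefix = [("a", 1), ("b", 2), ("c", 3)]
snapshot = copy.deepcopy(restore_by_replay(prefix).state)

# Two alternative next actions the agent wants to explore.
branches = [("d", 10), ("d", 20)]

replayed = []
for action in branches:
    env = restore_by_replay(prefix)   # re-executes all prefix actions
    env.apply(action)
    replayed.append(env)

forked = []
for action in branches:
    env = restore_by_snapshot(snapshot)  # no prefix actions re-executed
    env.apply(action)
    forked.append(env)

for r, f in zip(replayed, forked):
    print(r.state == f.state, r.steps_executed, f.steps_executed)
```

Both strategies reach the same per-branch state, but each replayed branch executes the whole prefix plus its own action, whereas each restored branch executes only its own action. In a real deployment the snapshot step itself is the expensive part, which is the cost the benchmark in the paper measures.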
