Toward Systems Foundations for Agentic Exploration

Jiakai Xu, Tianle Zhou, Eugene Wu, Kostis Kaffes

TL;DR

The paper addresses the need for fast, correct agentic exploration in LLM-powered agents operating in stateful environments. It contrasts pass@k baselines with real exploration and introduces three primitives for state restoration (prefix replay, snapshot/restore, and backtracking). It argues for a native forking paradigm with domain-specific hooks, where restoration latency can be reduced toward $O(1)$, whereas replay costs grow as $O(\text{len(prefix)})$. Benchmark results across six tools show latencies ranging from about $1.445$ s for CRIU on 2 GiB to over $12$ s for some Hybrid/Container-based approaches, with AWS VM snapshots being especially slow, underscoring the need for faster, more complete state capture. The authors propose fork-aware APIs and versioned side effects to enable microsecond branching and robust multi-agent exploration in production environments involving databases, browsers, and cloud APIs.

Abstract

Agentic exploration, letting LLM-powered agents branch, backtrack, and search across many execution paths, demands systems support well beyond today's pass-at-k resets. Our benchmark of six snapshot/restore mechanisms shows that generic tools such as CRIU or container commits are not fast enough even in isolated testbeds, and they crumble entirely in real deployments where agents share files, sockets, and cloud APIs with other agents and human users. In this talk, we pinpoint three open fundamental challenges: fork semantics, which concerns how branches reveal or hide tentative updates; external side-effects, where fork awareness must be added to services or their calls intercepted; and native forking, which requires cloning databases and runtimes in microseconds without bulk copying.
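The fork-semantics and native-forking challenges can be made concrete with a toy copy-on-write store. The sketch below is only an illustration under our own assumptions, not the authors' design: the `Branch` class and its `root`, `fork`, `read`, and `write` methods are hypothetical names, and the state is a plain Python dictionary rather than a database or runtime. Forking allocates an empty overlay instead of copying the data, and tentative writes stay hidden from sibling branches and from the parent.

```python
from collections import ChainMap


class Branch:
    """Hypothetical copy-on-write view over shared state (illustration only)."""

    def __init__(self, layers):
        # layers[0] is this branch's private overlay; later layers are inherited.
        self._view = ChainMap(*layers)

    @classmethod
    def root(cls, initial):
        return cls([{}, dict(initial)])

    def fork(self):
        # No state data is copied: the existing layers are shared as-is.
        frozen = self._view.maps
        child = Branch([{}] + frozen)
        # The parent also gets a fresh overlay, so its later writes
        # land in a layer the child does not reference.
        self._view = ChainMap({}, *frozen)
        return child

    def read(self, key, default=None):
        # Lookup walks the overlays from newest to oldest.
        return self._view.get(key, default)

    def write(self, key, value):
        # ChainMap writes always go to the first (private) mapping.
        self._view[key] = value


root = Branch.root({"balance": 100})
a = root.fork()
b = root.fork()
a.write("balance", 50)      # tentative update, visible only inside branch a
root.write("note", "late")  # written after the forks, hidden from a and b
```

In this sketch every fork deepens the chain of overlays by one, so reads slow down as branches nest. A real system would also have to decide when a branch's updates are committed, merged, or discarded, which is exactly the fork-semantics question the abstract raises.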

Paper Structure

This paper contains 8 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: An LLM agent (orange) explores by taking different actions, creating a branching tree in which every node represents a distinct state of the environment. Prefix replay (teal) starts from the root and replays all recorded commands; snapshot/restore (red) restores the checkpointed state of the target directly; backtracking (purple) walks through every intermediate node on the way to the target node.
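The three restoration strategies in the caption can be sketched on a toy environment. Everything below is an assumption made for illustration: `ToyEnv`, its `apply`, `undo`, and `reset` methods, and the `steps` counter used as a cost proxy are hypothetical and do not come from the paper. We also assume that backtracking undoes actions up to the common ancestor and then applies the remaining actions toward the target, which is one plausible reading of "goes through all intermediate nodes".

```python
import copy


class ToyEnv:
    """Hypothetical stateful environment whose state is the list of applied actions."""

    def __init__(self):
        self.state = []
        self.steps = 0  # count of apply/undo operations, used as a cost proxy

    def apply(self, action):
        self.state.append(action)
        self.steps += 1

    def undo(self):
        self.state.pop()
        self.steps += 1

    def reset(self):
        self.state = []


def prefix_replay(env, target_path):
    """Reset to the root and re-run every recorded action: O(len(prefix)) steps."""
    env.reset()
    for action in target_path:
        env.apply(action)


def snapshot(env):
    """Capture the full state. This toy copies it; real systems aim to avoid bulk copies."""
    return copy.deepcopy(env.state)


def restore(env, snap):
    """Jump straight to a captured state without executing any action."""
    env.state = copy.deepcopy(snap)


def backtrack(env, current_path, target_path):
    """Undo back to the common ancestor, then apply actions down to the target."""
    common = 0
    for done, wanted in zip(current_path, target_path):
        if done != wanted:
            break
        common += 1
    for _ in range(len(current_path) - common):
        env.undo()
    for action in target_path[common:]:
        env.apply(action)


env = ToyEnv()
path_a = ["ls", "cd src", "make"]
path_b = ["ls", "cd docs"]

prefix_replay(env, path_a)      # executes 3 actions from the root
snap_a = snapshot(env)          # checkpoint of the node reached via path_a
backtrack(env, path_a, path_b)  # 2 undos plus 1 apply
restore(env, snap_a)            # back at path_a's node with no actions executed
```

The step counter shows the trade-off: replay pays for the whole prefix, backtracking pays for the distance between the two nodes, and restore executes no actions but depends on how cheaply the state can be captured and reinstated.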