EnCompass: Enhancing Agent Programming with Search Over Program Execution Paths

Zhening Li, Armando Solar-Lezama, Yisong Yue, Stephan Zheng

TL;DR

EnCompass introduces Probabilistic Angelic Nondeterminism (PAN) to decouple agent workflow from inference-time search strategies, enabling easy experimentation with strategies like beam search, best-of-N, refinement, and self-consistency within a unified framework. The Python-based EnCompass library compiles agent workflows into a searchable execution graph and exposes primitives, a decorator-based compiler, and multiple search algorithms, plus support for custom strategies. Case studies demonstrate that strategically applied search outperforms simpler baselines and reduces coding complexity compared to traditional state-machine approaches. Overall, the work provides a practical pathway to scalable inference-time planning in program-in-control agents and invites further exploration of powerful search-driven paradigms for reliable AI systems.

Abstract

We introduce a new approach to agent programming, the development of LLM-based agents. Current approaches to agent programming often entangle two aspects of agent design: the core workflow logic and the inference-time strategy (e.g., tree search). We introduce "probabilistic angelic nondeterminism" ("PAN"), a programming model that disentangles these two concerns, allowing the programmer to describe the agent workflow and independently experiment with different inference-time strategies by simply changing a few inputs. We provide an implementation of PAN in Python as the EnCompass framework, which uses a Python decorator to compile agent workflow programs into a search space. We present three case studies that demonstrate how the framework lets the programmer quickly improve the reliability of an agent and easily switch between different inference-time strategies, all with little additional coding.
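The separation described in the abstract can be illustrated with a toy sketch in plain Python. This is not the EnCompass API: the names `workflow`, `single_sample`, and `best_of_n`, the `choose` callback, and the sum-based score are all hypothetical stand-ins chosen for illustration. The point is only that the workflow marks where nondeterministic choices occur, while a separate strategy decides how to explore the resulting execution paths.

```python
import random
from typing import Callable, List, Sequence, Tuple

# All names here are hypothetical; this is NOT the EnCompass API.
Choose = Callable[[Sequence[int]], int]
Result = Tuple[List[int], int]


def workflow(choose: Choose) -> Result:
    """Toy agent workflow: three nondeterministic steps, then a score.

    Each call to `choose` stands in for an unreliable step such as an LLM call.
    The score (here just the sum) stands in for a real signal such as the
    number of unit tests passed.
    """
    steps = [choose([1, 2, 3]) for _ in range(3)]
    return steps, sum(steps)


def single_sample(wf: Callable[[Choose], Result], seed: int) -> Result:
    """Baseline strategy: follow one random execution path."""
    rng = random.Random(seed)
    return wf(rng.choice)


def best_of_n(wf: Callable[[Choose], Result], n: int, seed: int) -> Result:
    """Alternative strategy: sample n execution paths, keep the best-scoring one."""
    rng = random.Random(seed)
    runs = [wf(rng.choice) for _ in range(n)]
    return max(runs, key=lambda run: run[1])


if __name__ == "__main__":
    # The workflow is untouched; only the strategy and its inputs change.
    print(single_sample(workflow, seed=0))
    print(best_of_n(workflow, n=8, seed=0))
```

Switching from one strategy to the other requires no change to `workflow`, which is the property the abstract attributes to PAN. The actual framework compiles the workflow into a search space via a decorator rather than passing a callback, so this sketch conveys the idea, not the mechanism.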

Paper Structure

This paper contains 51 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: An EnCompass program specifies an agent workflow, which is compiled into a search space object, and inference-time scaling is accomplished through search over the nondeterministic execution paths of the agent workflow.
  • Figure 2: Results of using EnCompass to apply different inference-time scaling methods to the code repository translation agent. All error bars show standard errors of the mean over 5 runs. (a) A comprehensive hyperparameter search for ps0; (b) For ps1 to ps4, we apply global best-of-$N$ ("GBoN"), file-level local best-of-$N$ ("LBoN (c.)"), and beam search at the file and method level ("beam (c.) + beam (f.)") while controlling for cost.