HerAgent: Rethinking the Automated Environment Deployment via Hierarchical Test Pyramid

Xiang Li, Siyu Lu, Federica Sarro, Claire Le Goues, He Ye

TL;DR

HerAgent redefines automated environment deployment by introducing the Environment Maturity Hierarchy, which distinguishes Installability, Testability, and Runnability and treats execution evidence as the true success signal. The approach builds executable environments through a three-stage pipeline (Initial Bash File Generation, Test Pyramid Construction, and Interactive Environment Deployment) and uses a dual-loop repair mechanism to incrementally validate and repair the Bash File, guided by a formal state transition policy over maturity levels. Across four benchmarks and 14 languages, HerAgent achieves state-of-the-art effectiveness, including unique resolutions on many instances and strong gains in challenging C/C++ projects; ablation studies highlight the necessity of interactive feedback and hybrid repair for reaching full runnability. The work provides a principled, executable, and generalizable framework for automated environment configuration with practical impact on reproducibility, agent-based software engineering, and cross-language project deployment.
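The ordering of the three maturity levels can be illustrated with a short Python sketch. It is not taken from the paper: the `Maturity` enum, the extra `NONE` state, and the `maturity_level` helper are hypothetical names, and the sketch assumes the levels are cumulative, so a level counts only when every weaker level has also been verified by execution.

```python
from enum import IntEnum
from typing import Mapping


class Maturity(IntEnum):
    """Ordered maturity levels. NONE is a hypothetical 'nothing verified yet' state."""
    NONE = 0
    INSTALLABLE = 1
    TESTABLE = 2
    RUNNABLE = 3


def maturity_level(evidence: Mapping[Maturity, bool]) -> Maturity:
    """Return the highest level whose check, and every weaker check, passed.

    `evidence` maps a level to whether its execution check succeeded.
    Treating the levels as cumulative is an assumption of this sketch.
    """
    reached = Maturity.NONE
    for level in (Maturity.INSTALLABLE, Maturity.TESTABLE, Maturity.RUNNABLE):
        if not evidence.get(level, False):
            break  # a failed or missing check blocks all stronger levels
        reached = level
    return reached


# Dependencies install and tests run, but the main entry point fails.
level = maturity_level({
    Maturity.INSTALLABLE: True,
    Maturity.TESTABLE: True,
    Maturity.RUNNABLE: False,
})
print(level.name)
```

Under these assumptions, an environment whose entry point runs but whose installation check failed would still be reported as `NONE`, which reflects the idea that weaker signals alone do not establish success.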

Abstract

Automated software environment setup is a prerequisite for testing, debugging, and reproducing failures, yet remains challenging in practice due to complex dependencies, heterogeneous build systems, and incomplete documentation. Recent work leverages large language models to automate this process, but typically evaluates success using weak signals such as dependency installation or partial test execution, which do not ensure that a project can actually run. In this paper, we argue that environment setup success should be evaluated through executable evidence rather than a single binary signal. We introduce the Environment Maturity Hierarchy, which defines three success levels based on progressively stronger execution requirements, culminating in successful execution of a project's main entry point. Guided by this hierarchy, we propose HerAgent, an automated environment setup approach that incrementally constructs executable environments through execution-based validation and repair. We evaluate HerAgent on four public benchmarks, where it outperforms all related work, achieving up to 79.6% improvement due to its holistic understanding of project structure and dependencies. On complex C/C++ projects, HerAgent surpasses prior approaches by 66.7%. In addition, HerAgent uniquely resolves 11-30 environment instances across the benchmarks that no prior method can configure.

Paper Structure

This paper contains 27 sections, 5 equations, 8 figures, and 3 tables.

Figures (8)

  • Figure 1: The Environment Maturity Hierarchy and Ecosystem Reality. Left: The three-stage maturity model. Right: Command distribution across 659 JVM and 324 Python repositories in EnvBench (Eliseeva et al., 2025). The rose charts show the rich diversity of dependencies and test suites at the Installable, Testable, and Runnable levels.
  • Figure 2: Overview of HerAgent. The pipeline comprises: (1) Bash File Generation (yellow) to construct an initial script; (2) Test Pyramid Construction (blue) to retrieve and categorize test commands into the hierarchy; and (3) Interactive Environment Deployment (green), where a dual-loop repair mechanism iteratively validates and advances environment maturity.
  • Figure 3: An example of the Bash File template used in HerAgent, which consists of six steps. This Bash File can be executed directly in an isolated sandbox container.
  • Figure 4: Detailed process of Bash File Repair. The process iterates through: (1) initial execution to capture runtime errors; (2) analyzing the runtime errors; (3) generating a single candidate Bash command for repair; and (4) merging and integrating these commands into an updated Bash File for re-validation.
  • Figure 5: Venn diagram showing the complementarity and uniqueness of HerAgent on four benchmarks: EnvBench-Python, Repo2Run-Bench, ExecutionAgent-Bench, and Installamatic-Bench.
  • ...and 3 more figures
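
The four repair steps named in the Figure 4 caption can be sketched as a single loop in Python. This is an illustration under stated assumptions, not HerAgent's implementation: `repair_bash_file`, the `run` and `propose_fix` callables, the `max_rounds` budget, and the merge rule (insert the fix just before the last line of the script) are all hypothetical, and the dual-loop structure mentioned in the Figure 2 caption is not modeled. The executor and the fix generator below are stand-ins for a sandbox container and an LLM.

```python
from typing import Callable, List, Optional, Tuple

# (succeeded, error_text) as returned by a sandbox executor; hypothetical shape.
RunResult = Tuple[bool, str]


def repair_bash_file(
    lines: List[str],
    run: Callable[[List[str]], RunResult],
    propose_fix: Callable[[str], Optional[str]],
    max_rounds: int = 5,
) -> Tuple[List[str], bool]:
    """Execute, analyze, patch, and re-validate until success or the budget ends."""
    script = list(lines)
    ok, error = run(script)              # (1) initial execution captures errors
    rounds = 0
    while not ok and rounds < max_rounds:
        fix = propose_fix(error)         # (2)+(3) analyze, get one candidate command
        if fix is None or fix in script:
            break                        # no new candidate: stop repairing
        # (4) merge: assumed rule, place the fix before the last command
        script.insert(max(len(script) - 1, 0), fix)
        ok, error = run(script)          # re-validate the updated script
        rounds += 1
    return script, ok


# Stand-in executor: fails until cmake has been installed by the script.
def fake_run(script: List[str]) -> RunResult:
    if "apt-get install -y cmake" not in script:
        return False, "cmake: command not found"
    return True, ""


# Stand-in fix generator: maps one known error to one command.
def fake_fix(error: str) -> Optional[str]:
    return "apt-get install -y cmake" if "cmake" in error else None


fixed, ok = repair_bash_file(["cmake -S . -B build"], fake_run, fake_fix)
print(ok, fixed)
```

Passing the executor and the fix generator in as callables keeps the loop testable without a container or a model; a real system would replace both stand-ins.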