Beyond pip install: Evaluating LLM Agents for the Automated Installation of Python Projects

Louis Milliken, Sungmin Kang, Shin Yoo

TL;DR

The paper addresses a gap in automated environment management by evaluating LLM-based agents on the task of installing Python repositories. It introduces Installamatic, a two-stage agent that first searches a repository's documentation for installation instructions and then generates, and if necessary repairs, a Dockerfile; builds are run inside virtual machines for safety. On a 40-repository benchmark, the study reports a 21/40 success rate (stated as 55% in the abstract) and an average installation rate of 28.8%, and finds strong links between documentation quality (visibility, informativity, recall) and installation success. Key insights include the value of a repair loop, the cost of additional installation complexity, and the need for well-structured documentation and for evaluation signals other than test execution. The work offers practical guidance for developers and maintainers and lays groundwork for future repository-level environment-management agents.

Abstract

Many works have recently proposed the use of Large Language Model (LLM) based agents for performing 'repository-level' tasks, loosely defined as tasks whose scope is greater than a single file. This has led to speculation that the orchestration of these repository-level tasks could lead to software engineering agents capable of performing almost independently of human intervention. However, of the suite of tasks that would need to be performed by such an autonomous software engineering agent, we argue that one important task is missing: fulfilling project-level dependencies by installing other repositories. To investigate the feasibility of this repository-level installation task, we introduce a benchmark of repository installation tasks curated from 40 open-source Python projects, which includes a ground-truth installation process for each target repository. Further, we propose Installamatic, an agent which aims to perform and verify the installation of a given repository by searching for relevant instructions in the repository's documentation. Empirical experiments reveal that 55% of the studied repositories can be automatically installed by our agent in at least one out of ten attempts. Through further analysis, we identify the common causes of our agent's inability to install a repository, discuss the challenges faced in the design and implementation of such an agent, and consider the implications that such an agent could have for developers.
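
The two headline numbers measure different things, and a small sketch may make the distinction concrete. The example below assumes that "installed at least one out of ten times" counts a repository as a success if any of its ten attempts succeeds, and that the average installation rate is the mean of each repository's per-attempt success fraction; the second definition is an assumption, not something stated in the abstract. The repository names and outcomes are made up for illustration and are not results from the paper.

```python
from typing import Dict, List


def at_least_once_rate(outcomes: Dict[str, List[bool]]) -> float:
    """Fraction of repositories with at least one successful attempt."""
    return sum(any(trials) for trials in outcomes.values()) / len(outcomes)


def average_installation_rate(outcomes: Dict[str, List[bool]]) -> float:
    """Mean over repositories of the per-repository success fraction.

    Assumed definition: each repository contributes successes / attempts.
    """
    per_repo = [sum(trials) / len(trials) for trials in outcomes.values()]
    return sum(per_repo) / len(per_repo)


# Hypothetical outcomes: ten attempts per repository (True = installed).
outcomes = {
    "repo_a": [True] * 10,               # always installs
    "repo_b": [True] * 2 + [False] * 8,  # installs occasionally
    "repo_c": [False] * 10,              # never installs
    "repo_d": [False] * 9 + [True],      # installs once
}

print(f"at least once: {at_least_once_rate(outcomes):.1%}")        # 75.0%
print(f"average rate:  {average_installation_rate(outcomes):.1%}")  # 32.5%
```

Under these definitions a repository that installs only once in ten attempts counts fully toward the first metric but contributes just 0.1 to the second, which is why the at-least-once figure can sit well above the average installation rate.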

Paper Structure

This paper contains 28 sections, 3 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Inspection of repository contents
  • Figure 2: Diagrams of Installamatic's processes
  • Figure 3: Identifying causes of un-installable repositories
  • Figure 4: Successful install rate for each repository, with and without perfect recall of relevant documents (purple bars represent the overlap between these two metrics).
  • Figure 5: Evaluating the visibility of a repository's documentation and its average installation rate
  • ...and 2 more figures