Repo2Run: Automated Building Executable Environment for Code Repository at Scale
Ruida Hu, Chao Peng, Xinchen Wang, Junjielong Xu, Cuiyun Gao
TL;DR
Repo2Run introduces the first LLM-based agent specifically designed to automate the construction of executable testing environments for code repositories at scale. It employs a dual-environment architecture with an internal build container and an external helper, plus a rollback-enabled workflow and a Dockerfile synthesizer that replays successful builds as runnable Dockerfiles. Evaluated on a benchmark of 420 Python repositories, it achieves an 86.0% environment-building success rate and 100% Dockerfile viability, outperforming baselines by a wide margin (SWE-agent by 77.0%). The approach reduces manual effort in environment provisioning, enabling scalable creation of reproducible code execution environments and facilitating large-scale collection of software engineering data for modeling and research.
Abstract
Scaling up executable code data is important for improving language models' software engineering capability. Building a large number of executable code repositories is an intricate process that is labor-intensive, time-consuming, and dependent on expert knowledge, which limits the scalability of existing work based on running tests. The primary bottleneck lies in automatically building test environments for different repositories, an essential yet underexplored task. To bridge this gap, we introduce Repo2Run, the first LLM-based agent that aims to automate the building of executable test environments for any repository at scale. Specifically, given a code repository, Repo2Run iteratively builds the Docker image, runs unit tests using feedback from the build, and synthesizes the Dockerfile until the entire pipeline executes successfully. The resulting Dockerfile can then be used to create Docker container environments for running code and tests. We created a benchmark of 420 Python repositories with unit tests for evaluation. The results show that Repo2Run achieves an 86.0% success rate, outperforming SWE-agent by 77.0%. The resources of Repo2Run are available at https://github.com/bytedance/Repo2Run.
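The iterative build loop described above can be illustrated with a minimal Python sketch. It is not Repo2Run's implementation: the names `BuildSession`, `build_environment`, `run`, and `propose` are hypothetical, the `run` callable stands in for executing a command inside a build container, and `propose` stands in for the LLM agent choosing the next command. The sketch shows only the general idea: failed commands are discarded (modeling a rollback to the last good state), and only successful commands are replayed as `RUN` lines in the synthesized Dockerfile.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class BuildSession:
    """Records commands that succeeded so they can be replayed as a Dockerfile."""
    base_image: str
    run: Callable[[str], bool]  # hypothetical executor: True means the command succeeded
    succeeded: List[str] = field(default_factory=list)

    def try_command(self, command: str) -> bool:
        ok = self.run(command)
        if ok:
            self.succeeded.append(command)
        # A failed command is not recorded, which models rolling back
        # to the last known-good state (an assumption of this sketch).
        return ok

    def synthesize_dockerfile(self) -> str:
        lines = [f"FROM {self.base_image}"]
        lines += [f"RUN {cmd}" for cmd in self.succeeded]
        return "\n".join(lines) + "\n"


def build_environment(
    session: BuildSession,
    propose: Callable[[str], str],  # hypothetical stand-in for the LLM agent
    test_command: str,
    max_steps: int = 10,
) -> Optional[str]:
    """Loop until the tests run, then return the Dockerfile; None if out of steps."""
    feedback = ""
    for _ in range(max_steps):
        if session.run(test_command):
            return session.synthesize_dockerfile()
        command = propose(feedback)
        ok = session.try_command(command)
        feedback = f"{command}: {'ok' if ok else 'failed'}"
    return None
```

With a simulated executor in which the tests pass only after a required package is installed, a mistyped install command is dropped and the returned Dockerfile contains just the base image and the successful install step.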
