AutoDev: Automated AI-Driven Development

Michele Tufano, Anisha Agarwal, Jinu Jang, Roshanak Zilouchian Moghaddam, Neel Sundaresan

TL;DR

AutoDev presents a fully automated AI-driven development framework that empowers autonomous AI agents to perform end-to-end software engineering tasks within a secure, Docker-contained environment. By integrating a Conversation Manager, Tools Library, Agent Scheduler, and Evaluation Environment, AutoDev enables agents to edit code, build, test, retrieve context, and manage git operations without developer intervention. Empirical evaluation on the HumanEval dataset shows strong code generation (Pass@1 = 91.5%) and test generation (Pass@1 = 87.8%) performance, with high test coverage and measurable task efficiency. The work demonstrates a significant step toward secure, user-controlled AI-driven development and outlines concrete plans for IDE and CI/CD integrations to broaden real-world impact.

Abstract

The landscape of software development has witnessed a paradigm shift with the advent of AI-powered assistants, exemplified by GitHub Copilot. However, existing solutions are not leveraging all the potential capabilities available in an IDE such as building, testing, executing code, git operations, etc. Therefore, they are constrained by their limited capabilities, primarily focusing on suggesting code snippets and file manipulation within a chat-based interface. To fill this gap, we present AutoDev, a fully automated AI-driven software development framework, designed for autonomous planning and execution of intricate software engineering tasks. AutoDev enables users to define complex software engineering objectives, which are assigned to AutoDev's autonomous AI Agents to achieve. These AI agents can perform diverse operations on a codebase, including file editing, retrieval, build processes, execution, testing, and git operations. They also have access to files, compiler output, build and testing logs, static analysis tools, and more. This enables the AI Agents to execute tasks in a fully automated manner with a comprehensive understanding of the contextual information required. Furthermore, AutoDev establishes a secure development environment by confining all operations within Docker containers. This framework incorporates guardrails to ensure user privacy and file security, allowing users to define specific permitted or restricted commands and operations within AutoDev. In our evaluation, we tested AutoDev on the HumanEval dataset, obtaining promising results with 91.5% and 87.8% of Pass@1 for code generation and test generation respectively, demonstrating its effectiveness in automating software engineering tasks while maintaining a secure and user-controlled development environment.
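The guardrails mentioned in the abstract, where users define permitted or restricted commands, can be pictured with a small sketch. This is an illustration only: the `CommandPolicy` class, its fields, and the example command names are assumptions made here, not AutoDev's actual configuration format or API.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandPolicy:
    """Hypothetical allow/deny policy (not AutoDev's real configuration)."""
    allowed: frozenset = frozenset()
    denied: frozenset = frozenset()

    def permits(self, command_line: str) -> bool:
        """Return True only if the command name is allowed and not denied."""
        tokens = shlex.split(command_line)
        if not tokens:
            return False  # an empty command is never permitted
        name = tokens[0]
        if name in self.denied:
            return False  # an explicit restriction always wins
        return name in self.allowed


# Example policy: the agent may write, build, and test, but git is restricted.
policy = CommandPolicy(
    allowed=frozenset({"write", "build", "test", "git"}),
    denied=frozenset({"git"}),
)

print(policy.permits("test tests/test_example.py"))
print(policy.permits("git push origin main"))
```

In this sketch a denied entry overrides an allowed one, so a user can start from a broad allow list and carve out exceptions. Whether AutoDev resolves conflicts the same way is not stated in the abstract.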

Paper Structure

This paper contains 23 sections, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: AutoDev enables an AI Agent to achieve a given objective by performing several actions within the repository. The Eval Environment executes the suggested operations, providing the AI Agent with the resulting outcome. In the conversation, purple messages are from the AI agent, while blue messages are responses from the Eval Environment.
  • Figure 2: Overview of the AutoDev Framework: The user initiates the process by defining the objective to be achieved. The Conversation Manager initializes the conversation and settings. The Agent Scheduler orchestrates AI agents to collaborate on the task and forwards their commands to the Conversation Manager. The Conversation Manager parses these commands and invokes the Tools Library, which offers various actions that can be performed on the repository. Agents' actions are executed within a secure Docker environment, and the output is returned to the Conversation Manager, which incorporates it into the ongoing conversation. This iterative process continues until the task is successfully completed.
  • Figure 3: Cumulative number of commands used by AutoDev for an average task of Code and Test Generation
  • Figure 4: AutoDev in Test Generation scenario (part I)
  • Figure 5: AutoDev in Test Generation scenario (part II)
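The iterative loop described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `run_conversation`, the scripted agent, the tool names, and the message prefixes are all hypothetical, and the tools here run in-process rather than inside a Docker container as AutoDev does.

```python
from typing import Callable, Dict, List

Agent = Callable[[List[str]], str]   # reads the conversation, returns a command
Tool = Callable[[str], str]          # takes an argument string, returns output


def run_conversation(agent: Agent, tools: Dict[str, Tool],
                     objective: str, max_turns: int = 10) -> List[str]:
    """Hypothetical loop: the agent proposes a command, a tool runs it,
    and the output is appended to the conversation."""
    history = [f"objective: {objective}"]
    for _ in range(max_turns):
        command = agent(history)
        history.append(f"agent: {command}")
        name, _, argument = command.partition(" ")
        if name == "stop":           # the agent signals that the task is done
            break
        tool = tools.get(name)
        output = tool(argument) if tool else f"unknown command: {name}"
        history.append(f"environment: {output}")
    return history


def scripted_agent(history: List[str]) -> str:
    """Stand-in for a model: replays a fixed script, one command per turn."""
    script = ["write add.py", "test add.py", "stop"]
    turns_taken = sum(1 for message in history if message.startswith("agent:"))
    return script[turns_taken]


# Stand-in tools; real ones would edit files and run tests in a container.
tools: Dict[str, Tool] = {
    "write": lambda path: f"wrote {path}",
    "test": lambda path: f"ran tests for {path}",
}

transcript = run_conversation(scripted_agent, tools, "implement add")
for message in transcript:
    print(message)
```

The `max_turns` cap is a design choice of this sketch so that a misbehaving agent cannot loop forever. The caption itself only says the process continues until the task is completed.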