Unified Software Engineering Agent as AI Software Engineer

Leonhard Applis, Yuntong Zhang, Shanchao Liang, Nan Jiang, Lin Tan, Abhik Roychoudhury

TL;DR

This work introduces USEagent, a unified software engineering agent designed to orchestrate multiple SE tasks through a Meta-Agent that composes coarse-grained actions. To evaluate its capabilities, the authors build USEbench, a meta-benchmark unifying SWE-bench, SWT-bench, and REPOCOD, and compare USEagent against OpenHands CodeActAgent and AutoCodeRover across 1,271 tasks. Results show that USEagent achieves higher overall efficacy than general agents, with strong performance on SWE maintenance tasks and meaningful gains when retries are allowed (PASS@5). The results also reveal challenges such as edge-case handling and patch overfitting. The study highlights the potential of a unified AI Software Engineer and discusses future directions, including backtracking, back-end improvements, and cooperative human-AI development in software teams.

Abstract

The growth of Large Language Model (LLM) technology has raised expectations for automated coding. However, software engineering is more than coding and is concerned with activities including maintenance and evolution of a project. In this context, LLM agents, which use LLMs as reasoning engines to invoke external tools autonomously, have gained traction. But is an LLM agent the same as an AI software engineer? In this paper, we seek to understand this question by developing a Unified Software Engineering agent or USEagent. Unlike existing work that builds specialized agents for specific software tasks such as testing, debugging, and repair, our goal is to build a unified agent that can orchestrate and handle multiple capabilities. This gives the agent the promise of handling complex scenarios in software development such as fixing an incomplete patch, adding new features, or taking over code written by others. We envision USEagent as the first draft of a future AI Software Engineer that can be a team member in future software development teams involving both AI and humans. To evaluate the efficacy of USEagent, we build a Unified Software Engineering bench (USEbench) comprising myriad tasks such as coding, testing, and patching. USEbench is a judicious mixture of tasks from existing benchmarks such as SWE-bench, SWT-bench, and REPOCOD. In an evaluation on USEbench consisting of 1,271 repository-level software engineering tasks, USEagent shows improved efficacy compared to existing general agents such as OpenHands CodeActAgent. There exist gaps in the capabilities of USEagent for certain coding tasks, which provide hints for further developing the AI Software Engineer of the future.

Paper Structure

This paper contains 27 sections, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Concept: Meta-Agent abstracting over actions.
  • Figure 2: Overview of AutoCodeRover. AutoCodeRover composes a fixed workflow for program maintenance tasks.
  • Figure 3: Overview of the USEagent and workflow. The Meta-Agent chooses among available actions, provides the state, and retrieves an altered state until termination is chosen.
  • Figure 4: Comparison of SWT and SWE Step Distributions for USEagent on a significant subset of USEbench. Each bar shows how often different actions are invoked at a particular Meta-Agent step.
  • Figure 5: Average cost across Actions in USEagent.
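
The loop described in the Figure 3 caption (choose an action, pass it the state, receive an altered state, repeat until termination) can be illustrated with a minimal sketch. This is not the authors' implementation. The action names (`reproduce_issue`, `edit_code`, `run_tests`), the `TaskState` fields, and the rule-based `choose_action` policy are all hypothetical stand-ins; in USEagent the choice is made by an LLM-backed Meta-Agent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TaskState:
    """Hypothetical shared state handed back and forth between Meta-Agent and actions."""
    task: str
    notes: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)


# An action takes the current state and returns an altered state.
Action = Callable[[TaskState], TaskState]


def reproduce_issue(state: TaskState) -> TaskState:
    state.notes.append("issue reproduced")  # placeholder for real tooling
    return state


def edit_code(state: TaskState) -> TaskState:
    state.notes.append("patch drafted")  # placeholder for real tooling
    return state


def run_tests(state: TaskState) -> TaskState:
    state.notes.append("tests executed")  # placeholder for real tooling
    return state


# Hypothetical registry of coarse-grained actions.
ACTIONS: Dict[str, Action] = {
    "reproduce_issue": reproduce_issue,
    "edit_code": edit_code,
    "run_tests": run_tests,
}

TERMINATE = "terminate"


def choose_action(state: TaskState) -> str:
    """Stand-in for the LLM decision: pick the first action not yet used."""
    for name in ACTIONS:
        if name not in state.history:
            return name
    return TERMINATE


def run_meta_agent(task: str, max_steps: int = 10) -> TaskState:
    """Choose an action, apply it to the state, and repeat until termination."""
    state = TaskState(task=task)
    for _ in range(max_steps):
        name = choose_action(state)
        if name == TERMINATE:
            break
        state = ACTIONS[name](state)
        state.history.append(name)
    return state


if __name__ == "__main__":
    final = run_meta_agent("fix failing test")
    print(final.history)
```

The step budget (`max_steps`) is an assumption added so the sketch always halts. The design point it illustrates is that the workflow is not fixed in advance: the sequence of actions emerges from repeated decisions over the current state.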