Effective Strategies for Asynchronous Software Engineering Agents

Jiayi Geng, Graham Neubig

Abstract

AI agents have become increasingly capable at isolated software engineering (SWE) tasks such as resolving issues on GitHub. Yet long-horizon tasks involving multiple interdependent subtasks still pose challenges for both accuracy and timely completion. A natural approach to solving these long-horizon tasks in a timely manner is asynchronous multi-agent collaboration, where multiple agents work on different parts of the task at the same time. But effective application of multi-agent systems has proven surprisingly difficult: concurrent edits by multiple agents interfere with each other, dependencies are difficult to synchronize, and combining partial progress into a coherent whole is challenging. Human developers, on the other hand, have long relied on mature collaboration infrastructure to manage these challenges in large software projects. Inspired by these collaboration primitives, we introduce Centralized Asynchronous Isolated Delegation (CAID), a structured multi-agent coordination paradigm grounded in three core SWE primitives: centralized task delegation, asynchronous execution, and isolated workspaces. CAID constructs dependency-aware task plans through a central manager, executes subtasks concurrently in isolated workspaces, and consolidates progress via structured integration with executable test-based verification. In empirical evaluation, we find that CAID improves accuracy over single-agent baselines by 26.7% absolute on paper reproduction tasks (PaperBench) and 14.3% on Python library development tasks (Commit0). Through systematic analysis, we find that branch-and-merge is a central coordination mechanism for multi-agent collaboration, and that SWE primitives such as git worktree, git commit, and git merge enable it to be realized in a reliable and executable manner.
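The three primitives named in the abstract (centralized delegation, asynchronous execution, isolated workspaces) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: `run_engineer`, `verify`, and `manager` are hypothetical names, the engineers are stand-in coroutines rather than LLM agents, and a per-task list stands in for an isolated `git worktree` branch. No git commands are executed.

```python
import asyncio
from graphlib import TopologicalSorter


async def run_engineer(task: str, workspaces: dict[str, list[str]]) -> str:
    """Hypothetical engineer: writes only to its own isolated workspace."""
    # Stand-in for committing work on a separate branch/worktree.
    workspaces[task] = [f"implemented {task}"]
    await asyncio.sleep(0)  # yield control so other engineers can run
    return task


def verify(commits: list[str]) -> bool:
    """Placeholder for executable test-based verification."""
    return len(commits) > 0


async def manager(plan: dict[str, set[str]]) -> list[str]:
    """Hypothetical manager: delegates tasks whose dependencies are merged."""
    sorter = TopologicalSorter(plan)  # maps task -> set of prerequisite tasks
    sorter.prepare()
    workspaces: dict[str, list[str]] = {}
    merged: list[str] = []
    running: set[asyncio.Task] = set()
    while sorter.is_active():
        # Delegate every task that is currently unblocked.
        for task in sorter.get_ready():
            running.add(asyncio.create_task(run_engineer(task, workspaces)))
        # Integrate results as soon as any engineer finishes.
        done, running = await asyncio.wait(
            running, return_when=asyncio.FIRST_COMPLETED
        )
        for finished in done:
            task = finished.result()
            if not verify(workspaces[task]):
                raise RuntimeError(f"verification failed for {task}")
            merged.append(task)  # stand-in for merging the branch
            sorter.done(task)    # unblocks dependent tasks
    return merged


if __name__ == "__main__":
    # Hypothetical plan: "autograd" depends on "tensor", "nn" on "autograd".
    example_plan = {"tensor": set(), "autograd": {"tensor"},
                    "nn": {"autograd"}, "docs": set()}
    print(asyncio.run(manager(example_plan)))
```

In this sketch, independent tasks such as `tensor` and `docs` are delegated together, while `autograd` is delegated only after `tensor` has been verified and merged. The merge order among independent tasks is not fixed.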

Paper Structure (36 sections, 5 figures, 10 tables)

Figures (5)

  • Figure 2: CAID effectively utilizes iteration budgets. We compare the final score and the iteration utilization between single-agent runs with different iteration limits and CAID.
  • Figure 3: Effect of the number of engineer agents on runtime, pass rate, and cost for Commit0-Lite and PaperBench. We provide the single-agent baselines here for comparison.
  • Figure 4: Execution timelines on the minitorch repository for a single-agent run and two CAID runs. The bars in the Gantt plot indicate file-level implementation intervals and manager phases. The runs differ in which modules are assigned and actively developed, resulting in distinct execution trajectories and pass rates.
  • Figure 5: Runtime (s) vs. pass rate (%) on a subset of Commit0 under three coordination prompts: (1) Round-Manager Review: the manager reviews each round before integration; (2) Engineer Self-Verification: engineers verify locally without repeated managerial review; and (3) Efficiency-Prioritized: all agents are instructed to prioritize runtime efficiency.
  • Figure 6: Gantt plot on the simpy repository for CAID with different numbers of engineers, where $N=2,4,8$.