
MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution

Wei Tao, Yucheng Zhou, Yanlin Wang, Wenqiang Zhang, Hongyu Zhang, Yu Cheng

TL;DR

The paper tackles the challenge of resolving GitHub issues at the repository level, where context and code changes span multiple files. It introduces MAGIS, an LLM-based multi-agent framework with dedicated roles (Manager, Repository Custodian, Developer, QA Engineer) to plan, locate, modify, and review code changes collaboratively. Through SWE-bench experiments, MAGIS with GPT-4 achieves a 13.94% resolved rate, an eight-fold improvement over the direct GPT-4 baseline, and analysis highlights the critical roles of file/line localization and structured planning. The work demonstrates that decomposing repository-level tasks into coordinated agent activities significantly enhances automated software evolution workflows and offers a blueprint for integrating LLMs into real-world development pipelines.

Abstract

In software development, resolving the emergent issues within GitHub repositories is a complex challenge that involves not only the incorporation of new code but also the maintenance of existing code. Large Language Models (LLMs) have shown promise in code generation but face difficulties in resolving GitHub issues, particularly at the repository level. To overcome this challenge, we empirically study why LLMs fail to resolve GitHub issues and analyze the major factors. Motivated by the empirical findings, we propose a novel LLM-based Multi-Agent framework for GitHub Issue reSolution, MAGIS, consisting of four agents customized for software evolution: Manager, Repository Custodian, Developer, and Quality Assurance Engineer agents. This framework leverages the collaboration of various agents in the planning and coding process to unlock the potential of LLMs to resolve GitHub issues. In experiments, we employ the SWE-bench benchmark to compare MAGIS with popular LLMs, including GPT-3.5, GPT-4, and Claude-2. MAGIS can resolve 13.94% of GitHub issues, significantly outperforming the baselines. Specifically, MAGIS achieves an eight-fold increase in resolved ratio over the direct application of GPT-4, an advanced LLM.
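
To make the four-role division of labor concrete, here is a minimal sketch of how such a pipeline could be wired together. It is not the authors' implementation: every function name, data shape, and heuristic is hypothetical, the LLM calls are replaced by deterministic stand-ins, and the ordering of the steps (locate, plan, modify, review) is an assumption based only on the role descriptions above.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One unit of work assigned to a Developer (hypothetical shape)."""
    file_path: str
    description: str


@dataclass
class Change:
    """A proposed edit plus the QA verdict on it (hypothetical shape)."""
    task: Task
    patch: str
    approved: bool = False


def repository_custodian(issue: str, repo: dict[str, str]) -> list[str]:
    """Locate candidate files. Stand-in heuristic: a file is a candidate
    if any issue word longer than 3 characters occurs in its text."""
    keywords = [w.lower() for w in issue.split() if len(w) > 3]
    return [path for path, text in repo.items()
            if any(k in text.lower() for k in keywords)]


def manager(issue: str, files: list[str]) -> list[Task]:
    """Plan the work. Stand-in: one task per located file."""
    return [Task(path, f"Address in {path}: {issue}") for path in files]


def developer(task: Task) -> Change:
    """Produce a change. Stand-in: a placeholder where an LLM-written
    patch would go."""
    return Change(task, f"# proposed edit for {task.file_path}")


def qa_engineer(change: Change) -> bool:
    """Review a change. Stand-in: accept any non-empty patch that names
    the file it was assigned to."""
    return bool(change.patch) and change.task.file_path in change.patch


def resolve_issue(issue: str, repo: dict[str, str],
                  max_rounds: int = 2) -> list[Change]:
    """Run the assumed locate -> plan -> modify -> review loop."""
    files = repository_custodian(issue, repo)
    changes = []
    for task in manager(issue, files):
        change = developer(task)
        rounds = 1
        # Let the Developer retry while QA rejects, up to max_rounds attempts.
        while not qa_engineer(change) and rounds < max_rounds:
            change = developer(task)
            rounds += 1
        change.approved = qa_engineer(change)
        changes.append(change)
    return changes


if __name__ == "__main__":
    toy_repo = {
        "utils/dates.py": "def parse_date(value): return value",
        "utils/strings.py": "def slugify(text): return text",
    }
    for c in resolve_issue("parse_date crashes on empty input", toy_repo):
        print(c.task.file_path, c.approved)
```

The point of the sketch is the separation of concerns: localization, planning, editing, and review are distinct steps with narrow interfaces, so each stand-in could be swapped for an LLM-backed agent without changing the control flow.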

Paper Structure

This paper contains 39 sections, 4 equations, 19 figures, 6 tables, 3 algorithms.

Figures (19)

  • Figure 1: Comparison of the line locating coverage ratio across three LLMs. The vertical axis shows, for each group, the frequency of each coverage-ratio range; the horizontal axis shows the coverage ratio.
  • Figure 2: Overview of our framework, MAGIS. A detailed version is given in a separate figure (the detailed overview).
  • Figure 3: Comparison of recall scores between MAGIS (Ours) and BM25.
  • Figure 4: Distribution of the correlation score between the generated task description and the reference code change.
  • Figure 5: Comparison of line locating coverage between MAGIS (Ours) and baselines.
  • ...and 14 more figures