CodeV: Issue Resolving with Visual Data

Linhao Zhang, Daoguang Zan, Quanshun Yang, Zhirong Huang, Dong Chen, Bo Shen, Tianyu Liu, Yongshun Gong, Pengjie Huang, Xudong Lu, Guangtai Liang, Lizhen Cui, Qianxiang Wang

TL;DR

GitHub issue resolving has largely relied on textual signals, neglecting visual data that can convey crucial context. CodeV introduces a two-phase multimodal framework: it first converts the visual data in an issue into fine-grained descriptions and a structured summary, then uses this enriched representation to generate patches with LLMs, achieving strong gains over text-only baselines. The authors also provide Visual SWE-bench, a 133-instance benchmark spanning 11 repositories for evaluating visual issue resolving, and report substantial relative gains over Agentless, with robust performance across Vision-Language Models of varying sizes. This work highlights the practical value of visual data in software repair and offers a standardized benchmark for future multimodal approaches in code-related AI systems.

Abstract

Large Language Models (LLMs) have advanced rapidly in recent years, with their applications in software engineering expanding to more complex repository-level tasks. GitHub issue resolving is a key challenge among these tasks. While recent approaches have made progress on this task, they focus on textual data within issues, neglecting visual data. However, this visual data is crucial for resolving issues as it conveys additional knowledge that text alone cannot. We propose CodeV, the first approach to leveraging visual data to enhance the issue-resolving capabilities of LLMs. CodeV resolves each issue by following a two-phase process: data processing and patch generation. To evaluate CodeV, we construct a benchmark for visual issue resolving, namely Visual SWE-bench. Through extensive experiments, we demonstrate the effectiveness of CodeV, as well as provide valuable insights into leveraging visual data to resolve GitHub issues.
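The two-phase process (data processing, then patch generation) can be pictured with a minimal sketch. This is not the authors' implementation: the class names, function names, and prompt layout below are hypothetical, and the vision-language model and LLM are passed in as plain callables so that no particular API is assumed.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VisualIssue:
    """A GitHub issue: its text plus references to its embedded images."""
    text: str
    image_urls: List[str] = field(default_factory=list)


@dataclass
class ProcessedIssue:
    """Output of the data-processing phase (hypothetical container)."""
    text: str
    descriptions: List[str]
    summary: str


def process_issue(
    issue: VisualIssue,
    describe_image: Callable[[str, str], str],
    summarize: Callable[[str, List[str]], str],
) -> ProcessedIssue:
    """Phase 1: turn each image into a description, then build one summary.

    `describe_image` and `summarize` stand in for model calls; their
    signatures are assumptions made for this sketch.
    """
    descriptions = [describe_image(url, issue.text) for url in issue.image_urls]
    summary = summarize(issue.text, descriptions)
    return ProcessedIssue(issue.text, descriptions, summary)


def build_patch_prompt(processed: ProcessedIssue) -> str:
    """Combine issue text, image descriptions, and summary into one prompt."""
    parts = ["Issue:", processed.text, "", "Image descriptions:"]
    parts += [f"- {d}" for d in processed.descriptions]
    parts += ["", "Structured summary:", processed.summary]
    return "\n".join(parts)


def resolve_issue(
    issue: VisualIssue,
    describe_image: Callable[[str, str], str],
    summarize: Callable[[str, List[str]], str],
    generate_patch: Callable[[str], str],
) -> str:
    """Phase 2: hand the enriched, text-only representation to a patch generator."""
    processed = process_issue(issue, describe_image, summarize)
    return generate_patch(build_patch_prompt(processed))
```

Passing the models in as callables keeps the sketch independent of any vendor SDK and makes the point of the design visible: after phase 1 the issue is represented entirely as text, so the patch generator in phase 2 does not need to accept images itself.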

Paper Structure

This paper contains 31 sections, 11 figures, 3 tables.

Figures (11)

  • Figure 1: An example of a visual GitHub issue from https://github.com/plotly/plotly.py/issues/1944. The visual data illustrates that the label parameters (“time” and “day”) do not take effect.
  • Figure 2: Overview of CodeV.
  • Figure 3: Distribution of Visual SWE-bench task instances across 11 open-source GitHub repositories.
  • Figure 4: Venn diagrams of issues resolved from Visual SWE-bench.
  • Figure 5: Fine-grained description example for the instance https://github.com/mwaskom/seaborn/issues/3275, offering detailed insights into the visual data.
  • ...and 6 more figures