Prompting Is All You Need: Automated Android Bug Replay with Large Language Models

Sidong Feng, Chunyang Chen

TL;DR

The paper tackles automated Android bug reproduction from textual bug reports by leveraging large language models through prompt engineering. It presents AdbGPT, a lightweight two-phase approach (extraction of steps-to-reproduce (S2R) entities, followed by GUI-guided replay) that uses few-shot learning and chain-of-thought reasoning without any training. Empirical results show that AdbGPT reproduces 81.3% of bug reports and runs roughly 5x faster than the baselines, with positive feedback from a user study. The work demonstrates that LLMs can understand bug reports and guide replay on dynamic GUIs for practical software maintenance tasks.

Abstract

Bug reports are vital for software maintenance, allowing users to inform developers of the problems they encounter while using the software. As such, researchers have committed considerable resources toward automating bug replay to expedite software maintenance. Nonetheless, the success of current automated approaches is largely dictated by the characteristics and quality of bug reports, as they are constrained by the limitations of manually crafted patterns and pre-defined vocabulary lists. Inspired by the success of Large Language Models (LLMs) in natural language understanding, we propose AdbGPT, a new lightweight approach that automatically reproduces bugs from bug reports through prompt engineering, without any training or hard-coding effort. AdbGPT leverages few-shot learning and chain-of-thought reasoning to elicit human knowledge and logical reasoning from LLMs and accomplish bug replay in a manner similar to a developer. Our evaluations demonstrate the effectiveness and efficiency of AdbGPT, which reproduces 81.3% of bug reports in an average of 253.6 seconds, outperforming the state-of-the-art baselines and ablation studies. We also conduct a small-scale user study to confirm the usefulness of AdbGPT in enhancing developers' bug replay capabilities.
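
The abstract names two prompting techniques, few-shot learning and chain-of-thought reasoning, without showing what such a prompt looks like. The following is a minimal sketch of how a prompt of that kind could be assembled for the S2R extraction phase. It is not the authors' prompt: the `Example` class, the `build_s2r_prompt` function, the field labels, and the sample bug reports are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Example:
    """One hand-written demonstration (hypothetical structure, not from the paper)."""
    report: str     # the raw bug report text
    reasoning: str  # the intermediate reasoning shown to the model (chain of thought)
    steps: str      # the extracted steps to reproduce


def build_s2r_prompt(instruction: str, examples: list[Example], report: str) -> str:
    """Assemble a few-shot, chain-of-thought prompt as plain text."""
    parts = [instruction.strip(), ""]
    for i, ex in enumerate(examples, start=1):
        parts += [
            f"Example {i}",
            f"Bug report: {ex.report}",
            f"Reasoning: {ex.reasoning}",
            f"Steps: {ex.steps}",
            "",
        ]
    # Leave the final block open so the model continues with its own
    # reasoning before giving the steps.
    parts += [f"Bug report: {report.strip()}", "Reasoning:"]
    return "\n".join(parts)


# Illustrative usage with made-up data.
demo = Example(
    report="Open the app, tap Settings, then the screen goes blank.",
    reasoning="The report has two actions: opening the app and tapping Settings.",
    steps="1. [Tap] [Settings]",
)
prompt = build_s2r_prompt(
    "Extract the steps to reproduce from the bug report.",
    [demo],
    "Long press the note and choose Delete; the app crashes.",
)
print(prompt)
```

The few-shot part is the list of worked examples; the chain-of-thought part is the "Reasoning:" line that precedes each answer. The resulting string would then be sent to whichever LLM is in use.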

Paper Structure

This paper contains 28 sections, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: The process of prompt engineering.
  • Figure 2: The overview of AdbGPT.
  • Figure 3: Illustration of GUI encoding.
  • Figure 4: Examples of S2R extraction.
  • Figure 5: Examples of guided replay.