Feedback-Driven Automated Whole Bug Report Reproduction for Android Apps
Dingbang Wang, Yu Zhao, Sidong Feng, Zhaoxu Zhang, William G. J. Halfond, Chunyang Chen, Xiaoxia Sun, Jiangfan Shi, Tingting Yu
TL;DR
This paper presents ReBL, a feedback-driven, GPT-4–based system that automates bug report reproduction for Android apps by analyzing the entire bug report rather than relying on Step-to-Reproduce (S2R) entities. By integrating rich UI context, carefully designed prompts, multi-action support, and automated feedback, ReBL reproduces 90.63% of 96 bug reports with an average reproduction time of 74.98 seconds, outperforming three state-of-the-art baselines in both effectiveness and efficiency. ReBL handles both crash and non-crash bug reports, and an ablation study quantifies the contributions of S2R elimination, UI grouping, and the feedback mechanisms. These results suggest practical utility for developers and bug-tracking workflows. Future work includes supporting more non-crash symptoms and exploring static analysis to strengthen LLM reasoning in limited-information scenarios.
Abstract
Bug report reproduction is a challenging task in software development. This paper introduces ReBL, a novel feedback-driven approach that leverages GPT-4, a large language model (LLM), to automatically reproduce Android bug reports. Unlike traditional methods, ReBL does not rely on Step-to-Reproduce (S2R) entities. Instead, it uses the entire textual bug report and employs innovative prompts to enhance GPT-4's contextual reasoning. This approach is more flexible and context-aware than traditional step-by-step entity matching, which improves accuracy and effectiveness. Beyond crash reports, ReBL can also handle non-crash functional bug reports. Our evaluation on 96 Android bug reports (73 crash and 23 non-crash) shows that ReBL successfully reproduced 90.63% of them, averaging only 74.98 seconds per bug report. ReBL also outperformed three existing tools in both success rate and speed.
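The feedback-driven idea described above can be illustrated with a minimal sketch. It is not ReBL's implementation: every name here (`Step`, `FakeApp`, `build_prompt`, `reproduce`, `scripted_model`) is hypothetical, the device is a toy in-memory stand-in, and the model is a scripted function rather than GPT-4. The sketch only shows the loop structure: the whole bug report and the current UI context go into each prompt, the model may return several actions at once, and the outcome of each action is fed back into the next prompt.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    action: str  # e.g. "tap"
    target: str  # label of the widget the action applies to


@dataclass
class FakeApp:
    """Toy stand-in for a device: one screen of widgets, crash on one tap."""
    widgets: List[str]
    crash_on: str
    crashed: bool = False

    def ui_context(self) -> str:
        return ", ".join(self.widgets)

    def perform(self, step: Step) -> str:
        """Run one action and return feedback text for the next prompt."""
        if step.target not in self.widgets:
            return f"widget '{step.target}' not found on screen"
        if step.action == "tap" and step.target == self.crash_on:
            self.crashed = True
            return "app crashed"
        return f"{step.action} on '{step.target}' succeeded"


def build_prompt(report: str, ui: str, history: List[str], feedback: str) -> str:
    # The entire report is included verbatim; no S2R entities are extracted.
    return (
        f"Bug report:\n{report}\n\n"
        f"Widgets on screen: {ui}\n"
        f"Actions so far: {'; '.join(history) or 'none'}\n"
        f"Feedback from last action: {feedback}\n"
        "Reply with the next action(s)."
    )


def reproduce(report: str, app: FakeApp,
              ask_model: Callable[[str], List[Step]],
              max_rounds: int = 10) -> bool:
    """Return True if the crash is triggered within max_rounds model calls."""
    history: List[str] = []
    feedback = "none"
    for _ in range(max_rounds):
        prompt = build_prompt(report, app.ui_context(), history, feedback)
        for step in ask_model(prompt):  # the model may propose several actions
            feedback = app.perform(step)
            history.append(f"{step.action} {step.target}: {feedback}")
            if app.crashed:
                return True
            if "not found" in feedback:
                break  # stop this batch and let the model replan
    return False


def scripted_model(prompt: str) -> List[Step]:
    """Stand-in for the LLM: corrects itself once it sees failure feedback."""
    if "not found" in prompt:
        return [Step("tap", "Save")]
    return [Step("tap", "Submit")]


app = FakeApp(widgets=["Title", "Save"], crash_on="Save")
print(reproduce("App crashes when saving a note with an empty title.",
                app, scripted_model))
```

In this toy run the scripted model first proposes a widget that is not on screen, receives that failure as feedback, and corrects its action in the next round. A real system would replace `FakeApp` with a device driver and `scripted_model` with an LLM call, and would need a check other than a crash flag for non-crash bugs.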
