LLM-CompDroid: Repairing Configuration Compatibility Bugs in Android Apps with Pre-trained Large Language Models
Zhijie Liu, Yutian Tang, Meiyun Li, Xin Jin, Yunfei Long, Liang Feng Zhang, Xiapu Luo
TL;DR
This work targets configuration compatibility bugs in Android XML configurations, which cause rendering differences or crashes across API levels. It first assesses how well pretrained LLMs (GPT-3.5, GPT-4, Bard) detect and repair such bugs, revealing limited detection reliability but some repair potential for hard-to-fix cases. Building on these insights, the authors propose LLM-CompDroid, a hybrid framework that couples ConfFix-style dynamic analysis with LLM-driven repair and evaluation to produce robust patches. Empirical results show that LLM-CompDroid variants based on GPT-3.5 and GPT-4 outperform baselines including ConfFix and Lint in repair effectiveness, with high Correct and Correct@k scores and good stability, demonstrating the value of integrating traditional tools with LLMs for Android configuration robustness.
Abstract
XML configurations are integral to the Android development framework, particularly for UI display. However, these configurations can introduce compatibility bugs that cause divergent visual outcomes and crashes across Android API levels. In this study, we systematically investigate LLM-based approaches for detecting and repairing configuration compatibility bugs. Our findings reveal that LLMs have clear limitations in identifying and resolving these bugs. They also show potential for addressing complex, hard-to-repair issues that traditional tools struggle with. Building on these insights, we introduce the LLM-CompDroid framework, which combines the strengths of LLMs and traditional tools for bug resolution. Our experimental results show that LLM-CompDroid substantially improves bug resolution: LLM-CompDroid-GPT-3.5 and LLM-CompDroid-GPT-4 outperform the state-of-the-art tool, ConfFix, by at least 9.8% and 10.4%, respectively, on both the Correct and Correct@k metrics. This approach promises to improve the reliability and robustness of Android applications.
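To make the bug class concrete, here is a minimal, Lint-style static check. It is a simplified illustration only. It is not ConfFix's dynamic analysis and not the LLM-CompDroid pipeline. The sketch flags `android:` attributes in a layout that were introduced at an API level higher than the app's minimum SDK. Such attributes may be ignored on older devices, which is one way the rendering differences described above can arise. The function name `find_compat_issues` and the two-entry `ATTR_MIN_API` table are hypothetical. A real checker would derive attribute API levels from the Android SDK's own metadata.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical, deliberately tiny table: attribute name -> API level that introduced it.
# A real checker would load this from Android SDK metadata instead.
ATTR_MIN_API = {
    "elevation": 21,
    "tooltipText": 26,
}


def find_compat_issues(layout_xml: str, min_sdk: int) -> list[tuple[str, str, int]]:
    """Return (element tag, attribute, required API) for attributes newer than min_sdk."""
    root = ET.fromstring(layout_xml)
    prefix = "{" + ANDROID_NS + "}"  # ElementTree expands android:foo to "{uri}foo"
    issues = []
    for elem in root.iter():
        for qualified_name in elem.attrib:
            if not qualified_name.startswith(prefix):
                continue  # skip attributes outside the android: namespace
            attr = qualified_name[len(prefix):]
            required = ATTR_MIN_API.get(attr)
            if required is not None and required > min_sdk:
                issues.append((elem.tag, attr, required))
    return issues


LAYOUT = """<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent" android:layout_height="match_parent">
  <Button android:layout_width="wrap_content" android:layout_height="wrap_content"
      android:elevation="4dp" android:tooltipText="Save"/>
</LinearLayout>"""

if __name__ == "__main__":
    print(find_compat_issues(LAYOUT, min_sdk=21))
```

With `min_sdk=21`, only `tooltipText` (API 26) is reported. Lowering `min_sdk` to 19 also flags `elevation`. A repair tool, whether rule-based or LLM-driven, would then need to rewrite such attributes, for example by moving them into version-qualified resources, while preserving the intended UI.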
