LLM-CompDroid: Repairing Configuration Compatibility Bugs in Android Apps with Pre-trained Large Language Models

Zhijie Liu, Yutian Tang, Meiyun Li, Xin Jin, Yunfei Long, Liang Feng Zhang, Xiapu Luo

TL;DR

This work targets configuration compatibility bugs in Android XML configurations, which cause rendering differences or crashes across API levels. It first assesses how well pretrained LLMs (GPT-3.5, GPT-4, Bard) detect and repair such bugs, revealing limited detection reliability but some repair potential for hard-to-fix cases. Building on these insights, the authors propose LLM-CompDroid, a hybrid framework that couples ConfFix-style dynamic analysis with LLM-driven repair and evaluation to produce robust patches. Empirical results show that LLM-CompDroid variants based on GPT-3.5 and GPT-4 outperform baselines including ConfFix and Lint in repair effectiveness, with high Correct and Correct@k scores and good stability, demonstrating the value of integrating traditional tools with LLMs for Android configuration robustness.

Abstract

XML configurations are integral to the Android development framework, particularly in the realm of UI display. However, these configurations can introduce compatibility issues (bugs), resulting in divergent visual outcomes and system crashes across various Android API versions (levels). In this study, we systematically investigate LLM-based approaches for detecting and repairing configuration compatibility bugs. Our findings highlight certain limitations of LLMs in effectively identifying and resolving these bugs, while also revealing their potential in addressing complex, hard-to-repair issues that traditional tools struggle with. Leveraging these insights, we introduce the LLM-CompDroid framework, which combines the strengths of LLMs and traditional tools for bug resolution. Our experimental results demonstrate a significant enhancement in bug resolution performance by LLM-CompDroid, with LLM-CompDroid-GPT-3.5 and LLM-CompDroid-GPT-4 surpassing the state-of-the-art tool, ConfFix, by at least 9.8% and 10.4% in both Correct and Correct@k metrics, respectively. This innovative approach holds promise for advancing the reliability and robustness of Android applications, making a valuable contribution to the field of software development.

Paper Structure

This paper contains 33 sections, 1 equation, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Example of compatibility bug across Android API levels 22, 23, and 31.
  • Figure 2: Repaired result by GPT-4 for Music-Player-GO.
  • Figure 3: Results in heatmaps of identifying conflicting Android API levels by GPT-3.5, GPT-4, and Bard.
  • Figure 4: Overview of LLM-CompDroid framework.
  • Figure 5: Example with low impact level across Android API levels 22, 23, and 31.