Diagnosing and Resolving Android Applications Building Issues: An Empirical Study
Lakshmi Priya Bodepudi, Yutong Zhao, Ming Quan Fu, Yuanyuan Wu, Sen He, Yu Zhao
TL;DR
This study empirically analyzes build failures in 200 open-source Android projects on GitHub, written in Java and Kotlin, and identifies four primary root causes: environment issues, dependency and Gradle task errors, configuration problems, and syntax/API incompatibilities. It introduces a six-step diagnostic and repair workflow that fixed 102 of the 135 initially failing projects (75.56%), and it provides a public dataset of build traces for reproducibility and AI research. The work also evaluates GPT-5’s ability to assist in diagnosis and repair: on a stratified set of cases, the model suggested viable fixes 53.33% of the time, which shows both the promise and the current limitations of AI-assisted Android build maintenance. Project attributes such as language, age, and app size significantly influence build success, with Kotlin projects and newer, smaller projects tending to build more reliably.
Abstract
Building Android applications reliably remains a persistent challenge due to complex dependencies, diverse configurations, and the rapid evolution of the Android ecosystem. This study conducts an empirical analysis of 200 open-source Android projects written in Java and Kotlin to diagnose and resolve build failures. Through a five-phase process encompassing data collection, build execution, failure classification, repair strategy design, and LLM-assisted evaluation, we identified four primary types of build errors: environment issues, dependency and Gradle task errors, configuration problems, and syntax/API incompatibilities. Among the 135 projects that initially failed to build, our diagnostic and repair strategy enabled developers to resolve 102 cases (75.56%), significantly reducing troubleshooting effort. We further examined the potential of Large Language Models (LLMs), such as GPT-5, to assist in error diagnosis, achieving a 53.33% success rate in suggesting viable fixes. An analysis of project attributes revealed that build success is influenced by programming language, project age, and app size. These findings provide practical insights into improving Android build reliability and advancing AI-assisted software maintenance.
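The headline figures can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the abstract (200 projects, 135 initial failures, 102 repairs). The derived values for initially building projects and the overall post-repair build rate are not reported in the abstract; they assume that all 200 projects were attempted and that every project not counted as failing built successfully. The variable and function names are illustrative.

```python
# Counts taken from the abstract.
TOTAL_PROJECTS = 200
INITIALLY_FAILING = 135
REPAIRED = 102


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Reported fix rate: repaired projects over initially failing projects.
fix_rate = percent(REPAIRED, INITIALLY_FAILING)

# Derived values (assumption: the remaining projects built on the first attempt).
initially_building = TOTAL_PROJECTS - INITIALLY_FAILING
still_failing = INITIALLY_FAILING - REPAIRED
building_after_repair = initially_building + REPAIRED
overall_rate = percent(building_after_repair, TOTAL_PROJECTS)

print(f"fix rate: {fix_rate}%")
print(f"initially building: {initially_building}")
print(f"still failing after repair: {still_failing}")
print(f"building after repair: {building_after_repair} ({overall_rate}%)")
```

Under those assumptions, the fix rate matches the reported 75.56%, and 33 projects remain unresolved after repair.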
