Local Software Buildability across Java Versions (Registered Report)
Matúš Sulír, Jaroslav Porubän, Sergej Chodarev
TL;DR
This work addresses local build failures in open-source Java projects across multiple Java versions. It proposes a large-scale replication in which 2,500 GitHub projects are built in containers across JDK versions 6 through 23 using Maven, Gradle, or Ant, with outcomes analyzed via automated scripts and logs. Nine research questions cover buildability, cross-version compatibility, tool-specific performance, wrapper usage, and failure causes, and a Mann-Kendall trend test is used to detect whether failure rates shift across versions. The study aims to produce a publicly available replication package to inform build repair research, tooling decisions, and language-design considerations.
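To illustrate the trend test mentioned above, the following is a minimal sketch of the standard two-sided Mann-Kendall test with tie correction and a normal approximation, applied to one value per Java version (for example, the percentage of failing projects). The function name `mann_kendall` and its return format are assumptions made for illustration; the summary does not state which implementation or variant the study uses.

```python
import math
from collections import Counter


def mann_kendall(values):
    """Two-sided Mann-Kendall trend test (normal approximation).

    `values` is an ordered sequence, e.g. one failure rate per Java version.
    Returns (S, Z, p_value). Hypothetical helper, not the study's script.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S counts concordant minus discordant pairs over all i < j.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S, corrected for groups of tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    if var_s == 0:
        # All values identical: no evidence of any trend.
        return s, 0.0, 1.0

    # Continuity-corrected Z statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p_value
```

A positive Z with a small p-value would be consistent with failure rates growing as the Java version increases; a negative Z would point the other way.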
Abstract
Context: Downloading the source code of open-source Java projects and building them on a local computer using Maven, Gradle, or Ant is a common activity performed by researchers and practitioners. Multiple studies have found that about 40-60% of such attempts fail. Our experience in recent years suggests that the proportion of failed builds continues to rise. Objective: First, we would like to empirically confirm our hypothesis that the percentage of build-failing projects tends to grow with newer Java versions. Next, we propose nine supplementary research questions, related mainly to the proportions of failing projects, universal version compatibility, failures under specific JDK versions, success rates of build tools, wrappers, and failure reasons. Method: We will sample 2,500 random pure-Java projects from GitHub that have a build configuration file and fulfill basic quality criteria. We will try to automatically build every project in containers with Java versions 6 to 23 installed. Success or failure will be determined by exit codes, and the standard output and error streams will be saved. Most of the analysis will be performed automatically using reproducible scripts.
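The build step described in the Method can be sketched as follows: run a build command in a container, save both output streams, and classify the attempt by its exit code. This is only a sketch under stated assumptions. The helper names, the log file names, the mount point `/src`, the one-hour timeout, and any image name passed in are hypothetical; the abstract does not specify the container setup or the exact build commands.

```python
import subprocess
from pathlib import Path


def docker_build_command(project_dir, image, build_cmd):
    """Compose a `docker run` call that mounts the project and runs the build.

    `image` is a placeholder for a container image with one JDK version
    installed; `/src` is an assumed mount point inside the container.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{Path(project_dir).resolve()}:/src",
        "-w", "/src",
        image,
        *build_cmd,
    ]


def run_and_record(command, log_dir, timeout_s=3600):
    """Run `command`, save stdout and stderr to files, classify by exit code.

    The timeout is an assumption of this sketch; a timed-out build is
    recorded with exit_code None and counted as a failure.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    with open(log_dir / "stdout.log", "wb") as out, \
            open(log_dir / "stderr.log", "wb") as err:
        try:
            completed = subprocess.run(
                command, stdout=out, stderr=err, timeout=timeout_s
            )
            exit_code = completed.returncode
        except subprocess.TimeoutExpired:
            exit_code = None
    # A build counts as successful only when the process exits with code 0.
    return {"exit_code": exit_code, "success": exit_code == 0}
```

For example, `run_and_record(docker_build_command("some-project", "example/jdk:17", ["mvn", "-B", "package"]), "logs/some-project/17")` would attempt a Maven build in batch mode, where both the project path and the image name are made up. Keeping the raw logs next to the exit code allows failure reasons to be analyzed later without rerunning the builds.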
