An Empirical Study of Java Code Improvements Based on Stack Overflow Answer Edits
In-on Wiratsin, Chaiyong Ragkhitwetsagul, Matheus Paixao, Denis De Sousa, Pongpop Lapvikai, Peter Haddawy
TL;DR
This paper presents an empirical study of Java answer edits on Stack Overflow (SO) and their applicability to open-source software (OSS) projects. Using SOTorrent, GitHub data, and a revision-aware code clone search tool (Siamese+), the authors identify and validate code updates from SO answer revisions that can improve Java code in OSS. They analyze 140,840 accepted SO Java answers and 10,673 GitHub Java projects. They find that 6.91% of these SO answers were revised and that 49.30% of the latest SO code snippets are applicable to OSS. They also validate 391 useful updates (across 12 subtypes) as potentially beneficial; 11 of these were included in 4 merged pull requests (PRs). The work demonstrates the practical value of crowdsourced answer edits for software maintenance and automation, and lays the groundwork for revision-aware code-update recommendations in the GenAI era.
Abstract
Suboptimal code is prevalent in software systems. Developers often write low-quality code due to factors such as gaps in technical knowledge, insufficient experience, time pressure, management decisions, or personal circumstances. Once such code is integrated, its accumulation leads to significant maintenance costs and technical debt. Developers frequently consult external knowledge bases, such as API documentation and Q&A websites like Stack Overflow (SO), to aid their programming tasks. The crowdsourced, collaborative nature of SO has created a vast repository of programming knowledge, and its community-curated content evolves constantly as new answers are posted and existing ones are edited. In this paper, we present an empirical study of SO Java answer edits and their application to improving code in open-source projects. We use a modified code clone search tool that takes the version history of SO code snippets into account, and we apply it to open-source Java projects to identify outdated or unoptimized code and suggest improved alternatives. Analyzing 140,840 accepted Java answers from SOTorrent and 10,668 GitHub Java projects, we manually categorized the SO answer edits and created pull requests to open-source projects with the suggested code improvements. Our results show that 6.91% of accepted SO Java answers have more than one revision (with an average of 2.82 revisions). Moreover, 49.24% of the code snippets in the answer edits are applicable to open-source projects, and 11 of the 36 proposed bug fixes based on these edits were accepted by the GitHub project maintainers.
