Moving beyond Deletions: Program Simplification via Diverse Program Transformations
Haibo Wang, Zezhong Xing, Zheng Wang, Chengnian Sun, Shin Hwei Tan
TL;DR
This paper investigates developer-induced program simplification in open-source software to understand the diversity of transformations developers perform and the motivations behind them. It shows that 26 transformation types span seven categories, with deletion-only approaches covering only about 16 percent of cases, revealing substantial gaps for automation. Based on these insights, the authors introduce SimpT5, a deep-learning framework trained on the large SimpliBench dataset to generate simplified programs that are test-equivalent and semantics-preserving, aided by simplified line localization and quality checkers. Evaluations demonstrate that SimpT5 outperforms prior deletion- and refactoring-based baselines, offering greater transformation diversity and higher rates of compilable, test-equivalent outputs. The work provides a foundation for richer automated program simplification and outlines implications for IDE usability, dataset curation, and future research in code transformation.
Abstract
To reduce the complexity of software, developers manually simplify programs (known as developer-induced program simplification in this paper), reducing code size while preserving functionality. However, manual simplification is time-consuming and error-prone. To reduce manual effort, rule-based approaches (e.g., refactoring) and deletion-based approaches (e.g., delta debugging) can potentially be applied to automate developer-induced program simplification. However, as there has been little study of how developers simplify programs in open-source software (OSS) projects, it is unclear whether these approaches can be used effectively for developer-induced program simplification. Hence, we present the first study of developer-induced program simplification in OSS projects, focusing on the types of program transformations used, the motivations behind simplifications, and the set of program transformations covered by existing refactoring types. Our study of 382 pull requests from 296 projects reveals gaps in applying existing approaches to automate developer-induced program simplification, and outlines criteria for designing automatic program simplification techniques. Inspired by our study, and to reduce the manual effort in developer-induced program simplification, we propose SimpT5, a tool that can automatically produce simplified programs (semantically equivalent programs with reduced source lines of code). SimpT5 is trained on our collected dataset of 92,485 simplified programs with two heuristics: (1) simplified line localization, which encodes the lines changed in simplified programs, and (2) checkers that measure the quality of generated programs. Our evaluation shows that SimpT5 is more effective than prior approaches in automating developer-induced program simplification.
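To make the notion of a "simplified program" concrete, the following is a minimal sketch of a quality check in the spirit of the checkers mentioned above. It is not SimpT5's implementation: the function names `count_sloc` and `is_simplified`, the toy `is_even` programs, and the use of Python are all assumptions made for illustration. The sketch accepts a candidate only if it has fewer source lines than the original and produces the same outputs on a given set of test inputs (test equivalence, which is weaker than full semantic equivalence).

```python
from typing import Any, Dict, Iterable, Tuple


def count_sloc(source: str) -> int:
    """Rough SLOC proxy: count lines that are neither blank nor pure '#' comments."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def is_simplified(
    original_src: str,
    candidate_src: str,
    func_name: str,
    test_inputs: Iterable[Tuple[Any, ...]],
) -> bool:
    """Hypothetical checker: candidate must be shorter and test-equivalent."""
    # Criterion 1: the candidate must reduce source lines of code.
    if count_sloc(candidate_src) >= count_sloc(original_src):
        return False

    # Criterion 2a: both programs must load (a stand-in for "compilable").
    orig_ns: Dict[str, Any] = {}
    cand_ns: Dict[str, Any] = {}
    try:
        exec(original_src, orig_ns)
        exec(candidate_src, cand_ns)
    except Exception:
        return False
    if func_name not in orig_ns or func_name not in cand_ns:
        return False

    # Criterion 2b: both programs must agree on every test input.
    for args in test_inputs:
        try:
            if orig_ns[func_name](*args) != cand_ns[func_name](*args):
                return False
        except Exception:
            return False
    return True


# Toy example (assumed, not from the paper's dataset).
ORIGINAL = """def is_even(n):
    if n % 2 == 0:
        return True
    else:
        return False
"""

SIMPLIFIED = """def is_even(n):
    return n % 2 == 0
"""

WRONG = """def is_even(n):
    return True
"""

TESTS = [(0,), (1,), (7,), (10,)]
```

Here `is_simplified(ORIGINAL, SIMPLIFIED, "is_even", TESTS)` accepts the shorter rewrite, while `WRONG` is rejected because it disagrees with the original on odd inputs even though it is shorter. Note that this transformation replaces a conditional with an expression rather than deleting lines, which is the kind of non-deletion simplification the study highlights.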
