What Makes a Good TODO Comment?
Haoye Wang, Zhipeng Gao, Tingting Bi, John Grundy, Xinyu Wang, Minghui Wu, Xiaohu Yang
TL;DR
The paper examines TODO comment quality in open-source software by analyzing 2,863 TODOs from top Java GitHub repositories, establishing criteria for high-quality Task and Notice TODOs, and demonstrating that nearly half are low-quality and that low-quality TODOs have longer lifecycles. It combines manual labeling, thematic analysis, lifecycle tracking, and a two-stage CodeBERT-based classifier that automatically identifies TODO form and quality with strong performance. The findings reveal substantial project-level variability in TODO quality, meaningful differences in lifecycle and resolution between high- and low-quality TODOs, and practical implications for tooling. Overall, the work provides a structured framework for defining, measuring, and improving TODO quality to reduce technical debt and improve maintenance efficiency.
Abstract
Software development is a collaborative process that involves various interactions among individuals and teams. TODO comments in source code play a critical role in managing and coordinating diverse tasks during this process. However, this study finds that a large proportion of TODO comments in open-source projects are left unresolved or take a long time to be resolved. About 46.7% of TODO comments in open-source repositories are of low quality (e.g., TODOs that are ambiguous, lack information, or are useless to developers). This highlights the need for better TODO practices. In this study, we investigate four aspects of the quality of TODO comments in open-source projects: (1) the prevalence of low-quality TODO comments; (2) the key characteristics of high-quality TODO comments; (3) how TODO comments of different quality are managed in practice; and (4) the feasibility of automatically assessing TODO comment quality. Examining 2,863 TODO comments from the top 100 GitHub Java repositories, we propose criteria to identify high-quality TODO comments and provide insights into their optimal composition. We also discuss the lifecycle of TODO comments of varying quality. Finally, we construct deep learning-based methods that show promising performance in identifying the quality of TODO comments, potentially enhancing development efficiency and code quality.
