Understanding Practitioners' Expectations on Clear Code Review Comments
Junkai Chen, Zhenhao Li, Qiheng Mao, Xing Hu, Kui Liu, Xin Xia
TL;DR
The paper addresses the problem of unclear code review comments (CRCs) by defining three core clarity attributes: Relevance, Informativeness, and Expression (RIE). It combines a systematic literature review, a practitioner survey, and manual labeling of CRCs in open-source projects, and introduces ClearCRC, which automatically evaluates CRC clarity using three backbone model families. The study finds that 28.8% of CRCs lack clarity in at least one attribute, with Informativeness often deficient, and shows that pre-trained language models outperform the other backbones in ClearCRC, with reasonable generalization to new datasets. The findings yield actionable guidelines for writing clear CRCs, emphasize the role of data quality in automated CRC generation, and come with replication data to support further research on CRC quality assessment.
Abstract
The code review comment (CRC) is pivotal in the process of modern code review. It provides reviewers with the opportunity to identify potential bugs, offer constructive feedback, and suggest improvements. Clear and concise CRCs facilitate communication between developers and are crucial to the correct understanding of the identified issues and proposed solutions. Despite the importance of CRC clarity, there is still a lack of guidelines on what constitutes good clarity and how to evaluate it. In this paper, we conduct a comprehensive study on understanding and evaluating the clarity of CRCs. We first derive a set of attributes related to the clarity of CRCs, namely the RIE attributes (i.e., Relevance, Informativeness, and Expression), as well as their corresponding evaluation criteria, based on our literature review and a survey with practitioners. We then investigate the clarity of CRCs in open-source projects written in nine programming languages and find that a large portion (i.e., 28.8%) of the CRCs lack clarity in at least one of the attributes. Finally, we explore the potential of automatically evaluating the clarity of CRCs by proposing ClearCRC. Experimental results show that ClearCRC with pre-trained language models is promising for effective evaluation of the clarity of CRCs, achieving a balanced accuracy of up to 73.04% and an F1 score of up to 94.61%.
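The abstract reports both balanced accuracy and F1, and the two can diverge sharply when most comments belong to one class. The following is a minimal sketch of how the two metrics are commonly computed for a binary clear/unclear label, not the paper's evaluation code. The toy labels, the function names, and the choice of "clear" (1) as the positive class are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for a binary labeling."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth == positive and pred != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the recall on the positive class and the recall on the negative class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    recall_pos = tp / (tp + fn) if (tp + fn) else 0.0
    recall_neg = tn / (tn + fp) if (tn + fp) else 0.0
    return (recall_pos + recall_neg) / 2


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall on the positive class."""
    tp, fp, _tn, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 1 = clear, 0 = unclear. Eight clear comments, two unclear.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
# The classifier finds every clear comment but misses one of the two unclear ones.
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]

print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.4f}")  # 0.7500
print(f"F1 (clear class):  {f1_score(y_true, y_pred):.4f}")           # 0.9412
```

In this toy case a single missed unclear comment halves the recall on the minority class, so balanced accuracy drops to 0.75 while F1 on the majority class stays above 0.94. A gap of this kind is what one would expect whenever clear comments heavily outnumber unclear ones.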
