MESIA: Understanding and Leveraging Supplementary Nature of Method-level Comments for Automatic Comment Generation

Xinglu Pan; Chenxiao Liu; Yanzhen Zou; Tao Xie; Bing Xie

TL;DR

The paper introduces MESIA (Mean Supplementary Information Amount), a metric that quantifies how much information a method-level code comment adds beyond the method signature. It defines MESIA using Shannon self-information, validates it on the TL-CodeSum dataset, and shows that it aligns strongly with human judgments. Through experiments with seq2seq, Transformer, and CodeT5 models, it demonstrates that large-MESIA comments are hard to generate and that reducing the share of small-MESIA comments in the training data improves large-MESIA generation, albeit sometimes at the expense of conventional BLEU scores. The work highlights both the importance and the difficulty of producing informative, supplementary comments, and it suggests future directions for evaluation and data construction to better support developers in a hybrid method comprehension workflow.
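
This summary does not spell out the MESIA formula, so the Python sketch below is only an illustration under stated assumptions. Word probabilities P(w) are estimated as relative word frequencies over the comment corpus, self-information is $I(w) = -\log_2 P(w)$, a comment word counts as supplementary if it is neither a stop word nor a word of the split method signature, and the summed self-information of those words is averaged over all words of the comment. The stop-word list, the tokenizer, the handling of unseen words, and the averaging denominator are assumptions, not the paper's exact definition. The helper `supplementary_words` also yields the kind of leftover-word statistic analyzed in Figures 2 and 3.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual list may differ.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "on", "for", "from", "by", "with",
              "is", "are", "be", "and", "or", "this", "that", "it", "if"}


def split_identifier(text):
    """Split text and camelCase/snake_case identifiers into lower-case words."""
    words = []
    for part in re.split(r"[^A-Za-z0-9]+", text):
        # Acronyms before a capitalized word ("HTTPRequest"), capitalized or
        # lower-case words, remaining all-caps runs, and digit runs.
        words += re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part)
    return [w.lower() for w in words]


def word_probabilities(comments):
    """Estimate P(w) as relative word frequency over the comment corpus (assumption)."""
    counts = Counter(w for c in comments for w in split_identifier(c))
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}


def supplementary_words(comment, signature):
    """Comment words left after removing stop words and split-signature words."""
    sig_words = set(split_identifier(signature))
    return [w for w in split_identifier(comment)
            if w not in STOP_WORDS and w not in sig_words]


def mesia(comment, signature, probs):
    """Hypothetical MESIA: summed self-information I(w) = -log2 P(w) of the
    supplementary words, averaged over all words of the comment."""
    n_words = len(split_identifier(comment))
    if n_words == 0:
        return 0.0
    floor = min(probs.values())  # unseen words get the rarest observed probability
    info = sum(-math.log2(probs.get(w, floor))
               for w in supplementary_words(comment, signature))
    return info / n_words


if __name__ == "__main__":
    corpus = ["Get the file name.",
              "Load config from the path, validating each entry against the schema."]
    probs = word_probabilities(corpus)
    print(mesia(corpus[0], "public String getFileName()", probs))  # 0.0
    print(mesia(corpus[1], "void loadConfig(String path)", probs))
```

In this toy corpus, "Get the file name." only restates `getFileName`, so every word is removed and its score is 0.0, while the second comment keeps "validating", "each", "entry", "against", and "schema" as supplementary words and scores higher.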

Abstract

Code comments are important for developers in program comprehension. When comprehending and reusing a method, developers expect code comments to provide supplementary information beyond the method signature. However, the extent of such supplementary information varies widely across code comments. In this paper, we raise awareness of the supplementary nature of method-level comments and propose a new metric named MESIA (Mean Supplementary Information Amount) to assess the extent of supplementary information that a code comment can provide. With the MESIA metric, we conduct experiments on a popular code-comment dataset and three common types of neural approaches for generating method-level comments. Our experimental results yield several findings that demonstrate the value of this work. (1) Small-MESIA comments make up around 20% of the dataset and mostly fall into only the WHAT comment category. (2) Large-MESIA comments, which provide various kinds of essential information, are difficult for existing neural approaches to generate. (3) We can improve the capability of existing neural approaches to generate large-MESIA comments by reducing the proportion of small-MESIA comments in the training set. (4) The retrained model can generate large-MESIA comments that convey meaningful, essential supplementary information for methods in the small-MESIA test set, but these comments receive lower BLEU scores in evaluation. These findings indicate that, with good training data, auto-generated comments can sometimes even surpass human-written reference comments, and that the lack of an appropriate ground truth for evaluation is an issue that future work on automatic comment generation needs to address.
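
Finding (3) amounts to a data-preparation step before retraining. The Python sketch below keeps every training sample at or above a MESIA threshold and randomly downsamples those below it. The threshold, the keep ratio, the random downsampling strategy, and the (code, comment, MESIA) tuple layout are hypothetical choices for illustration; the abstract does not state how the proportion was actually reduced.

```python
import random
from typing import List, Tuple

# Hypothetical sample layout: (method code, reference comment, MESIA value).
Sample = Tuple[str, str, float]


def small_mesia_fraction(samples: List[Sample], threshold: float) -> float:
    """Share of samples whose MESIA value is below the threshold."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s[2] < threshold) / len(samples)


def reduce_small_mesia(samples: List[Sample], threshold: float,
                       keep_ratio: float, seed: int = 0) -> List[Sample]:
    """Keep every large-MESIA sample but only keep_ratio of the small-MESIA ones."""
    if not 0.0 <= keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed for a reproducible training set
    small = [s for s in samples if s[2] < threshold]
    large = [s for s in samples if s[2] >= threshold]
    kept = rng.sample(small, k=round(len(small) * keep_ratio))
    result = large + kept
    rng.shuffle(result)
    return result


if __name__ == "__main__":
    # Toy data mirroring the roughly 20% small-MESIA share reported for the dataset.
    data = ([(f"m{i}", f"c{i}", 0.1) for i in range(20)]
            + [(f"m{i}", f"c{i}", 2.0) for i in range(20, 100)])
    reduced = reduce_small_mesia(data, threshold=1.0, keep_ratio=0.25)
    print(small_mesia_fraction(data, 1.0), small_mesia_fraction(reduced, 1.0))
```

Downsampling rather than dropping all small-MESIA samples keeps some short WHAT-style comments in the training set; setting `keep_ratio=0.0` would remove them entirely.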

Paper Structure (23 sections, 7 equations, 12 figures, 1 table)

Introduction
Motivation and Challenges
Preliminary and Definition
Study Design
  Research Questions
  Study Setup
    Dataset
    Neural Approaches
    Experimental Settings
Study Result
  RQ1. How well does MESIA reflect the relative extent of supplementary information in method-level comments?
  RQ2. What is the capability of existing neural approaches to generate code comments with different MESIA values?
  RQ3. How well can MESIA be used to improve existing neural approaches to generate large-MESIA comments?
Discussion
  Discussion of the MESIA Metric
...and 8 more sections

Figures (12)

Figure 1: Different code comments that provide different extents of supplementary information.
Figure 2: An analysis of the number of words that remain in a method comment after removing stop words and words in the method's split signature.
Figure 3: An analysis of the proportion of words that remain in a method comment after removing stop words and words in the method's split signature.
Figure 4: Distribution of the MESIA value of the code comments in the TL-CodeSum dataset.
Figure 5: Correlation between the MESIA values and the manual scores of each rater for the experimental code comments. MESIA and Score1 have a Spearman's $\rho$=0.9532 (p-value of 9.84e-53). MESIA and Score2 have a Spearman's $\rho$=0.9530 (p-value of 1.26e-42). MESIA and Score3 have a Spearman's $\rho$=0.9498 (p-value of 2.69e-51). We also calculate the Spearman's $\rho$ between the manual scores of different raters. Score1 and Score2 have a Spearman's $\rho$=0.9478 (p-value of 1.78e-50). Score1 and Score3 have a Spearman's $\rho$=0.9367 (p-value of 1.77e-46). Score2 and Score3 have a Spearman's $\rho$=0.9780 (p-value of 1.56e-68). All the results demonstrate a strong and significant correlation.
...and 7 more figures
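
Figure 5 reports agreement as Spearman's $\rho$, which is the Pearson correlation computed on ranks. The dependency-free Python sketch below computes $\rho$ using average ranks for tied values, since raters' integer scores often tie. `scipy.stats.spearmanr` computes the same coefficient and also returns a p-value, which this sketch omits. The MESIA values and scores in the usage lines are made-up toy numbers, not data from the paper.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    mesia_values = [0.0, 0.4, 1.3, 2.1, 3.5]  # toy MESIA values
    rater_scores = [1, 1, 2, 4, 5]            # toy manual scores with a tie
    print(spearman_rho(mesia_values, rater_scores))
```

Because only ranks matter, $\rho$ measures whether MESIA orders comments the same way the raters do, regardless of the scale of either quantity.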