Learning to Update Natural Language Comments Based on Code Changes
Sheena Panthaplackel, Pengyu Nie, Milos Gligoric, Junyi Jessy Li, Raymond J. Mooney
TL;DR
This work tackles the automatic updating of natural language code comments when the underlying code changes. It introduces a cross-modal edit model that represents both code changes and comment edits as sequences: a dual-encoder architecture encodes the inputs, and a decoder generates a comment-edit sequence that is refined with a reranking mechanism. Experiments on a GitHub Java dataset show that the edit-based approach outperforms baselines on edit-centric metrics and is rated favorably in human evaluation, though some cases still require more context or even generation from scratch. The study provides a practical framework for keeping documentation synchronized with code, and the authors release accompanying open-source software.
Abstract
We formulate the novel task of automatically updating an existing natural language comment based on changes in the body of code it accompanies. We propose an approach that learns to correlate changes across two distinct language representations, to generate a sequence of edits that are applied to the existing comment to reflect the source code modifications. We train and evaluate our model using a dataset that we collected from commit histories of open-source software projects, with each example consisting of a concurrent update to a method and its corresponding comment. We compare our approach against multiple baselines using both automatic metrics and human evaluation. Results reflect the challenge of this task and show that our model outperforms baselines with respect to making edits.
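To make the idea of "a sequence of edits that are applied to the existing comment" concrete, the following is a minimal Python sketch. It is not the paper's model or its edit vocabulary: the edit format (tuples of action, old span, new span), the helper names `compute_edits` and `apply_edits`, and the use of `difflib` to derive edits from an old and new comment are all assumptions made purely for illustration. In the paper's setting a learned model would predict such edits from the code change; here they are simply computed from a known target comment.

```python
from difflib import SequenceMatcher


def compute_edits(old_tokens, new_tokens):
    """Derive a token-level edit sequence that turns old_tokens into new_tokens.

    Each edit is (action, old_span, new_span), where action is one of
    'equal', 'replace', 'delete', or 'insert' (the difflib opcode tags).
    """
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    return [
        (tag, old_tokens[i1:i2], new_tokens[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


def apply_edits(old_tokens, edits):
    """Apply an edit sequence to old_tokens and return the updated tokens."""
    result = []
    pos = 0  # current position in the old comment
    for action, old_span, new_span in edits:
        # Every edit must line up with the old comment at the current position.
        if old_tokens[pos:pos + len(old_span)] != old_span:
            raise ValueError(f"edit does not match old comment at token {pos}")
        if action == "equal":
            result.extend(old_span)   # keep these tokens unchanged
        elif action in ("replace", "insert"):
            result.extend(new_span)   # emit the new tokens
        # 'delete' emits nothing
        pos += len(old_span)
    return result


# Hypothetical example: the method changed from computing a maximum to a minimum.
old_comment = "Returns the maximum value in the list".split()
new_comment = "Returns the minimum value in the list".split()

edits = compute_edits(old_comment, new_comment)
updated = apply_edits(old_comment, edits)
```

Framing the output as edits rather than a full new comment means that unchanged tokens are copied from the existing comment, so only the parts affected by the code change need to be produced.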
