Dopamin: Transformer-based Comment Classifiers through Domain Post-Training and Multi-level Layer Aggregation

Nam Le Hai, Nghi D. Q. Bui

TL;DR

Dopamin tackles automatic classification of code comments across multiple programming languages using a Transformer-based framework. It combines domain post-training on diverse comment data with hierarchical HSUM layer aggregation on a CodeBERT backbone, and uses an optimal checkpoint strategy to maximize per-category F1 without increasing inference time. The approach improves on the STACC baseline on the NLBSE'24 dataset, with notable gains in Java and Python; in Pharo, where training data is scarce, the improvements are selective. The work demonstrates effective cross-language knowledge transfer and provides practical, efficient tooling for comment quality assessment in multilingual codebases.
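
The summary above does not spell out how the checkpoint strategy works, so the following is only a minimal sketch of one plausible reading: for each comment category, keep the checkpoint with the highest validation F1. The function names, the checkpoint names (`epoch_1`, `epoch_2`), the toy labels, and the tie-breaking rule are all illustrative assumptions, not taken from the Dopamin source code.

```python
from typing import Dict, List


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 for one category; returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_best_checkpoints(
    y_true: Dict[str, List[int]],
    preds_by_checkpoint: Dict[str, Dict[str, List[int]]],
) -> Dict[str, str]:
    """Map each category to the checkpoint with the highest validation F1.

    Assumption: ties go to the checkpoint listed first.
    """
    best: Dict[str, str] = {}
    for category, labels in y_true.items():
        scored = [
            (f1_score(labels, preds[category]), name)
            for name, preds in preds_by_checkpoint.items()
        ]
        best_f1 = max(score for score, _ in scored)
        best[category] = next(name for score, name in scored if score == best_f1)
    return best


# Toy validation labels and predictions (hypothetical data).
y_true = {"summary": [1, 0, 1, 1], "usage": [0, 1, 0, 1]}
preds_by_checkpoint = {
    "epoch_1": {"summary": [1, 0, 1, 1], "usage": [0, 0, 0, 1]},
    "epoch_2": {"summary": [1, 1, 0, 1], "usage": [0, 1, 0, 1]},
}

best = select_best_checkpoints(y_true, preds_by_checkpoint)
print(best)
```

In this toy run the two categories end up with different checkpoints, which is the situation a per-category selection rule is meant to handle.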

Abstract

Code comments provide important information for understanding the source code. They can help developers understand the overall purpose of a function or class, as well as identify bugs and technical debt. However, an overabundance of comments is meaningless and counterproductive. As a result, it is critical to automatically filter out these comments for specific purposes. In this paper, we present Dopamin, a Transformer-based tool for dealing with this issue. Our model excels not only in presenting knowledge sharing of common categories across multiple languages, but also in achieving robust performance in comment classification by improving comment representation. As a result, it outperforms the STACC baseline by 3% on the NLBSE'24 Tool Competition dataset in terms of average F1-score, while maintaining a comparable inference time for practical use. The source code is publicly available at https://github.com/FSoft-AI4Code/Dopamin.

Paper Structure (19 sections, 2 equations, 1 figure, 4 tables)