CoCoSum: Contextual Code Summarization with Multi-Relational Graph Neural Network

Yanlin Wang, Ensheng Shi, Lun Du, Xiaodi Yang, Yuxuan Hu, Shi Han, Hongyu Zhang, Dongmei Zhang

TL;DR

CoCoSUM addresses the limitation of local-context-only code summarization by introducing two global contexts: intra-class context (class-name semantics) and inter-class context (UML class relationships), the latter encoded with a Multi-Relational Graph Neural Network (MRGNN). The architecture combines these signals with the conventional local encoders (code tokens and AST) through a decoder with two-level attention, and reports state-of-the-art results on CodeSearchNet and CoCoNet. Key contributions include the first explicit modeling of both global contexts in code summarization and the design of an MRGNN tailored to UML graphs, supported by ablations, generality tests, and a human evaluation. The results indicate that incorporating broader project-level context improves the quality of generated code summaries, with practical implications for software comprehension and maintenance.
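To make the idea of encoding UML relationships with a multi-relational graph network more concrete, the following is a minimal sketch of one round of relation-aware message passing. It is an R-GCN-style illustration, not the authors' MRGNN: the relation names, the mean aggregation, the ReLU, and every function and parameter name (`relational_layer`, `w_self`, `w_rel`) are assumptions made for this example.

```python
import numpy as np

def relational_layer(h, edges, w_self, w_rel):
    """One round of relation-aware message passing (illustrative sketch).

    h:      (num_classes, d_in) array, one feature row per class node
    edges:  dict mapping a relation name to a list of (src, dst) index pairs
    w_self: (d_in, d_out) weight applied to each node's own features
    w_rel:  dict mapping a relation name to its own (d_in, d_out) weight
    """
    out = h @ w_self
    for rel, pairs in edges.items():
        msg = np.zeros_like(out)
        deg = np.zeros(h.shape[0])
        for src, dst in pairs:
            # Each relation type transforms neighbour features with its own weight.
            msg[dst] += h[src] @ w_rel[rel]
            deg[dst] += 1
        # Mean-aggregate the messages received under this relation.
        has_neighbours = deg > 0
        msg[has_neighbours] /= deg[has_neighbours][:, None]
        out += msg
    return np.maximum(out, 0.0)  # ReLU


# Toy graph with three classes and two hypothetical UML relation types.
h = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
edges = {
    "inheritance": [(0, 1)],          # class 0 sends a message to class 1
    "association": [(0, 2), (1, 2)],  # classes 0 and 1 send messages to class 2
}
identity = np.eye(2)
updated = relational_layer(
    h, edges, identity, {"inheritance": identity, "association": identity}
)
```

Keeping a separate weight matrix per relation type is what lets such a layer treat, for example, inheritance differently from association; a single shared weight would collapse the UML graph into an untyped one.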

Abstract

Source code summaries are short natural language descriptions of code snippets that help developers better understand and maintain source code. There has been a surge of work on automatic code summarization to reduce the burden of writing summaries manually. However, most contemporary approaches mainly leverage the information within the boundary of the method being summarized (i.e., local context), and ignore the broader context that could assist with code summarization. This paper explores two global contexts, namely intra-class and inter-class contexts, and proposes the model CoCoSUM: Contextual Code Summarization with Multi-Relational Graph Neural Network. CoCoSUM first incorporates class names as the intra-class context to generate the class semantic embeddings. Then, relevant Unified Modeling Language (UML) class diagrams are extracted as inter-class context and are encoded into the class relational embeddings using a novel Multi-Relational Graph Neural Network (MRGNN). Class semantic embeddings and class relational embeddings, together with the outputs of the code token encoder and the AST encoder, are passed to a decoder equipped with a two-level attention mechanism to generate high-quality, context-aware code summaries. We conduct extensive experiments to evaluate our approach and compare it with other automatic code summarization models. The experimental results show that CoCoSUM is effective and outperforms state-of-the-art methods. Our source code and experimental data are available in the supplementary materials and will be made publicly available.
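The abstract does not spell out how the two-level attention combines the four encoder outputs. One plausible reading, sketched below under that assumption, is that the first level attends within each encoder's outputs to produce one context vector per source, and the second level attends across those context vectors. The dot-product scoring and all names (`attend`, `two_level_attention`, the `"tokens"` and `"ast"` keys) are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(query, keys):
    """Dot-product attention: returns weights over the rows of `keys`
    and the corresponding weighted sum of those rows."""
    weights = softmax(keys @ query)
    return weights, weights @ keys

def two_level_attention(query, encoder_outputs):
    """Fuse several encoders' outputs for one decoding step.

    query:           (d,) decoder state
    encoder_outputs: dict mapping an encoder name to an (n_i, d) array
    """
    names = list(encoder_outputs)
    # Level 1: attend within each encoder to get one context vector per source.
    contexts = np.stack([attend(query, encoder_outputs[n])[1] for n in names])
    # Level 2: attend across the per-source context vectors.
    source_weights, fused = attend(query, contexts)
    return dict(zip(names, source_weights)), fused


# Toy example with two of the four sources; a zero query gives uniform weights.
outputs = {
    "tokens": np.array([[1.0, 0.0], [3.0, 0.0]]),
    "ast": np.array([[0.0, 2.0]]),
}
weights, fused = two_level_attention(np.zeros(2), outputs)
```

In a trained model the query would be the decoder's hidden state, so the second-level weights would indicate how much each source (tokens, AST, class semantics, class relations) contributes to the next summary word.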

Paper Structure

This paper contains 29 sections, 18 equations, 6 figures, 9 tables.

Figures (6)

  • Figure 1: UML class diagram example. Left is the UML class diagram and right is the corresponding code.
  • Figure 2: The overview of CoCoSUM
  • Figure 3: Global context encoder based on our proposed MRGNN
  • Figure 4: Length distribution of code and summary
  • Figure 5: Comparison on different code lengths and summary lengths
  • ...and 1 more figure