Context-aware Code Summary Generation

Chia-Yi Su, Aakash Bansal, Yu Huang, Toby Jia-Jun Li, Collin McMillan

TL;DR

The paper addresses the challenge of code summarization by focusing on why a piece of code exists within a broader program, not just what it does. It introduces a context-aware approach that uses call context: it first summarizes the methods that call the target method, then conditions the target method's summary on those results. The approach is implemented with both commercial LLMs and a locally runnable 350M-parameter open-source model. Through a two-pronged evaluation (commercial-model experiments and an open-source distillation and fine-tuning pipeline on curated human exemplars), the authors show that call context improves the generated summaries and that a small, locally deployable model can outperform large commercial models on this task. The work demonstrates practical impact in privacy-preserving, cost-effective code documentation and provides a reproducible framework with release-ready artifacts for future research and tooling.

Abstract

Code summary generation is the task of writing natural language descriptions of a section of source code. Recent advances in Large Language Models (LLMs) and other AI-based technologies have helped make automatic code summarization a reality. However, the summaries these approaches write tend to focus on a narrow area of code. The result is summaries that explain what a function does internally, but lack a description of why the function exists or its purpose in the broader context of the program. In this paper, we present an approach for including this context in recent LLM-based code summarization. The input to our approach is a Java method and the project in which that method exists. The output is a succinct English description of why the method exists in the project. The core of our approach is a 350M-parameter language model we train, which can be run locally to ensure privacy. We train the model in two steps. First, we distill knowledge about code summarization from a large model; then we fine-tune the model using data from a study of human programmers who were asked to write code summaries. We find that our approach outperforms GPT-4 on this task.
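
The call-context idea described above (summarize the callers first, then condition the target method's summary on those caller summaries) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `summarize_with_call_context`, the prompt wording, the `max_callers` limit, and the `Summarizer` callable are all hypothetical, and the call graph is assumed to be available as a plain dictionary.

```python
from typing import Callable, Dict, List

# Hypothetical type: any function that maps a prompt string to generated text
# (for example, a wrapper around a local model or a commercial API).
Summarizer = Callable[[str], str]


def summarize_with_call_context(
    target: str,
    sources: Dict[str, str],
    callers: Dict[str, List[str]],
    summarize: Summarizer,
    max_callers: int = 3,
) -> str:
    """Summarize `target` after first summarizing the methods that call it.

    `sources` maps method names to Java source text; `callers` maps a method
    name to the names of methods that call it. Both are assumed inputs.
    """
    caller_notes = []
    for caller in callers.get(target, [])[:max_callers]:
        # Step 1: summarize each caller on its own.
        note = summarize("Summarize this Java method:\n" + sources[caller])
        caller_notes.append(f"- {caller}: {note}")

    # Step 2: condition the target summary on the caller summaries.
    context = "\n".join(caller_notes) if caller_notes else "- (no known callers)"
    prompt = (
        "Callers of the target method:\n" + context + "\n\n"
        "Target Java method:\n" + sources[target] + "\n\n"
        "In one sentence, explain why this method exists in the project."
    )
    return summarize(prompt)
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client, so the same control flow could wrap either a commercial model or a small local one.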

Paper Structure (31 sections, 5 figures, 4 tables)

This paper contains 31 sections, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Overview of code summarization processes. Area 1 is the typical process, with a neural model generating a summary from just the target method itself. Area 2 is an extended baseline in which a model with a very large input window sees the target method and the rest of the project. Areas 3a-c show our process, which we describe in the paper's approach section.
  • Figure 2: Overview of weight freezing. Solid lines denote trainable weights and dotted lines denote frozen weights.
  • Figure 3: The interface for our study. Each participant saw two pages for each Java method. The top image shows the first page and the bottom image shows the second page.
  • Figure 4: Bar charts showing the responses to Question 2 for each experiment. For example, in Experiment 3, there were approximately 275 samples for which participants preferred gemini-context versus about 125 for gpt4-base.
  • Figure 5: Tournament bracket showing experiment outcomes for each research question. The "winners" in these brackets are determined based on which model received more votes for Question 2 in the survey (see the paper's evaluation methodology section).
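
The winner rule described for Figures 4 and 5 (the model with more Question 2 votes advances) amounts to a simple vote tally. The sketch below is illustrative only: `pick_winner` is a hypothetical helper, the tie handling is an assumption not stated in the captions, and the example counts are the approximate values quoted in the Figure 4 caption rather than exact study data.

```python
from collections import Counter
from typing import Iterable, Tuple


def pick_winner(votes: Iterable[str]) -> Tuple[str, Counter]:
    """Return the model with the most votes, plus the full tally."""
    tally = Counter(votes)
    if not tally:
        raise ValueError("no votes recorded")
    ranked = tally.most_common(2)
    # Assumption: a tie is reported as an error rather than resolved silently.
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        raise ValueError("tie between top models")
    return ranked[0][0], tally


# Approximate counts taken from the Figure 4 caption for Experiment 3.
experiment_3 = ["gemini-context"] * 275 + ["gpt4-base"] * 125
winner, tally = pick_winner(experiment_3)
```

With these approximate counts, `winner` is `gemini-context`, matching the preference described in the Figure 4 caption.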