AUTOGENICS: Automated Generation of Context-Aware Inline Comments for Code Snippets on Programming Q&A Sites Using LLM

Suborno Deb Bappon, Saikat Mondal, Banani Roy

TL;DR

Code snippets on Stack Overflow frequently lack inline comments, hindering readability and reuse. The authors use LLMs, notably Gemini 1.5 Pro and GPT-4 for comparisons, to generate inline comments for 400 accepted Python/Java snippets and evaluate them via accuracy, adequacy, conciseness, and usefulness, supplemented by a practitioner survey. They then build AUTOGENICS, a browser plugin that extracts question context and filters noise to produce context-aware, noise-free inline comments, which outperform baseline LLMs on key metrics. The results suggest that context-aware automated commenting can significantly improve code understanding on Q&A sites and has potential for broad adoption and educator use.

Abstract

Inline comments in source code aid comprehension, reusability, and readability. However, code snippets in answers on Q&A sites like Stack Overflow (SO) often lack comments because answerers volunteer their time and skip comments or explanations due to time constraints. Existing studies show that these online code examples are difficult to read and understand, making it hard for developers (especially novices) to use them correctly and leading to misuse. Given these challenges, we introduce AUTOGENICS, a tool designed to integrate with SO and generate effective inline comments for code snippets in SO answers using large language models (LLMs). Our contributions are threefold. First, we randomly select 400 answer code snippets from SO and generate inline comments for them using LLMs. We then manually evaluate these comments' effectiveness using four key metrics: accuracy, adequacy, conciseness, and usefulness. Overall, LLMs demonstrate promising effectiveness in generating inline comments for SO answer code snippets. Second, we survey 14 active SO users to learn how they perceive the effectiveness of these inline comments. The survey results are consistent with our manual evaluation. However, according to our evaluation, LLM-generated comments are less effective for shorter code snippets, and LLMs sometimes produce noisy comments. Third, to address these gaps, AUTOGENICS extracts additional context from question texts and generates context-aware inline comments. It also optimizes comments by removing noise (e.g., comments on import statements and variable declarations). Evaluated with the same four metrics, AUTOGENICS-generated comments outperform those of standard LLMs. AUTOGENICS might (a) enhance code comprehension, (b) save time, and (c) improve developers' ability to learn and reuse code more accurately.
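
The noise-removal step mentioned in the abstract (dropping comments on import statements and variable declarations) can be illustrated with a minimal sketch. This is not the AUTOGENICS implementation: the function name `strip_noisy_comments`, the two regular expressions, and the rule that a "variable declaration" means a name assigned to a simple literal are all assumptions made for illustration. The sketch also assumes the snippet is Python and tokenizes cleanly.

```python
import io
import re
import tokenize

# Assumed heuristics (not from the paper): which lines count as noise-prone.
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+\w")
LITERAL_ASSIGN_RE = re.compile(
    r"""^\s*[A-Za-z_]\w*\s*=\s*(?:-?\d[\d_.]*|(['"]).*\1|\[\]|\{\}|None|True|False)$"""
)


def is_noise_target(code: str) -> bool:
    """True if the code is an import or a simple literal assignment."""
    return bool(IMPORT_RE.match(code) or LITERAL_ASSIGN_RE.match(code))


def strip_noisy_comments(source: str) -> str:
    """Remove comments attached to imports and simple variable declarations.

    Other comments are left untouched. Raises tokenize.TokenError (or a
    SyntaxError subclass) if the snippet cannot be tokenized.
    """
    lines = source.splitlines()

    # Map 0-based line index -> column where a real comment starts.
    # Using tokenize avoids mistaking '#' inside string literals for comments.
    comment_col = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comment_col[tok.start[0] - 1] = tok.start[1]

    def code_of(index: int) -> str:
        """Return the code portion of a line, without its comment."""
        if index >= len(lines):
            return ""
        col = comment_col.get(index)
        return (lines[index] if col is None else lines[index][:col]).rstrip()

    kept = []
    for i, line in enumerate(lines):
        if i not in comment_col:
            kept.append(line)  # no comment on this line
            continue
        code = code_of(i)
        if code:
            # Trailing comment: drop it only when the code is a noise target.
            kept.append(code if is_noise_target(code) else line)
        elif not is_noise_target(code_of(i + 1)):
            # Full-line comment: keep it unless it describes a noise target
            # on the immediately following line.
            kept.append(line)
    return "\n".join(kept)


SNIPPET = '''# Import the regex module
import re
count = 0  # Initialize count to zero
for word in words:
    # Count words that start with a capital letter
    if re.match(r"[A-Z]", word):
        count += 1  # Increment the counter
'''

if __name__ == "__main__":
    print(strip_noisy_comments(SNIPPET))
```

On the sample snippet, the comments on `import re` and `count = 0` are removed, while the comments explaining the condition and the counter update are preserved. A production filter would likely need language-specific rules for Java as well, which this sketch does not attempt.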

Paper Structure

This paper contains 19 sections, 6 figures, and 8 tables.

Figures (6)

  • Figure 1: An overview of the AUTOGENICS workflow.
  • Figure 2: A motivating example in which users requested a code explanation to validate its accuracy, contrasted with the same answer improved by AUTOGENICS-generated comments.
  • Figure 3: Research methodology for human-centric evaluation of inline code comment generation.
  • Figure 4: An overview of AUTOGENICS system architecture.
  • Figure 5: Interest levels and preferences for an automated inline comment-generating tool.
  • ...and 1 more figure