Resource-Efficient & Effective Code Summarization

Saima Afrin, Joseph Call, Khai-Nguyen Nguyen, Oscar Chaparro, Antonio Mastropaolo

TL;DR

Fine-tuning large code language models for code summarization is costly, which raises sustainability concerns. This work evaluates QLoRA, a fine-tuning method that combines 4-bit quantization with LoRA, on CodeLlama and DeepSeek-Coder for Python and Java using the CodeXGLUE Code-to-Text data. It compares QLoRA against full fine-tuning and analyzes memory usage and accuracy across model sizes. Results show that QLoRA achieves competitive or superior performance with substantially reduced memory usage; CodeLlama 34B often provides the best gains while memory profiles remain favorable. The findings support QLoRA as a practical, resource-efficient approach for Code-to-NL tasks and suggest that it partially generalizes to general-purpose LLMs, informing future sustainable deployment of bi-modal software engineering models.

Abstract

Code Language Models (CLMs) have demonstrated high effectiveness in automating software engineering tasks such as bug fixing, code generation, and code documentation. This progress has been driven by the scaling of large models, ranging from millions to trillions of parameters (e.g., GPT-4). However, as models grow in scale they become extremely resource-intensive, which raises sustainability concerns and highlights the need for efficient, environmentally conscious solutions. GreenAI techniques, such as QLoRA (Quantized Low-Rank Adaptation), offer a promising path toward sustainable use of large models because they enable resource-efficient fine-tuning. Previous research has shown the effectiveness of QLoRA in code-related tasks, particularly those that take natural language as input and produce code as output (NL-to-Code), such as code generation. However, no studies have explored its application to tasks that are fundamentally similar to NL-to-Code but operate in the opposite direction, such as code summarization. This leaves a gap in understanding how well QLoRA generalizes to Code-to-NL tasks, which are equally important for supporting developers in understanding and maintaining code. To address this gap, we investigate the extent to which QLoRA's capabilities in NL-to-Code tasks can be leveraged and transferred to code summarization, one representative Code-to-NL task. Our study evaluates two state-of-the-art CLMs (CodeLlama and DeepSeek-Coder) across two programming languages, Python and Java, tasking the models with generating descriptions of Python and Java methods. The results align with prior findings on QLoRA for source code generation, showing that QLoRA enables efficient fine-tuning of CLMs for code summarization.
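
To make the resource argument concrete, the following is a minimal back-of-the-envelope sketch of why combining 4-bit weight storage with low-rank adapters reduces fine-tuning cost. Every number in it (a 7B-parameter model, a 4096 x 4096 projection, rank 16) is a hypothetical chosen for illustration and is not taken from the paper. The estimate also ignores quantization constants, activations, gradients, and optimizer state, so it should be read as rough arithmetic rather than a measurement.

```python
# Illustrative arithmetic for QLoRA-style fine-tuning.
# All sizes below are hypothetical assumptions, not values from the paper.

def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Memory needed to store `num_params` weights at the given precision."""
    return num_params * bits_per_param / 8 / 2**30


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """LoRA learns the update as B (d_out x rank) @ A (rank x d_in),
    so only rank * (d_out + d_in) values are trained per adapted matrix."""
    return rank * (d_out + d_in)


base_params = 7_000_000_000                      # hypothetical 7B model
fp16_gib = weight_memory_gib(base_params, 16)    # frozen weights in 16-bit
int4_gib = weight_memory_gib(base_params, 4)     # frozen weights in 4-bit

full = 4096 * 4096                               # one hypothetical projection
adapter = lora_trainable_params(4096, 4096, rank=16)

print(f"16-bit weights: {fp16_gib:.1f} GiB, 4-bit weights: {int4_gib:.1f} GiB")
print(f"trainable share of one adapted matrix: {adapter / full:.4%}")
```

Under these assumptions, storing the frozen base weights in 4 bits takes a quarter of the 16-bit footprint, and the adapter trains under 1% of the values in each adapted matrix, which is the intuition behind the memory savings the abstract refers to.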

Paper Structure

This paper contains 21 sections, 3 figures, 5 tables.

Figures (3)

  • Figure 1: QLoRA fine-tuning with paged optimizers (Dettmers et al., 2024).
  • Figure 2: Semantically equivalent Java code summaries.
  • Figure 3: Examples of predictions made by CodeLlama 34B that have been labeled as meaningful code summaries.