Better Python Programming for all: With the focus on Maintainability
Karthik Shivashankar, Antonio Martini
TL;DR
This work addresses the gap in maintainability of Python code generated by code LLMs by fine-tuning models on a dedicated maintainability-focused dataset. It combines instruction tuning, parameter-efficient fine-tuning (PEFT) with QLoRA, and both an open and a closed model (WizardCoder13B and GPT-3.5) to generate refactored code that preserves functionality while reducing size and complexity, as measured by source lines of code ($SLOC$), cyclomatic complexity ($CC$), Halstead effort ($HE$), and maintainability index ($MI$). Evaluation uses CodeBERTScore to assess functional similarity and Radon-based metrics to assess maintainability, supplemented by human judgments from 11 Python-expert participants. The results show measurable improvements in the maintainability metrics and high functional similarity between generated and reference code, indicating that targeted maintainability objectives can be integrated into AI-assisted code generation. The replication package and dataset are publicly released for researchers and practitioners who aim to reduce technical debt and support sustainable software development with AI-assisted tooling.
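To make the size and complexity metrics concrete, the following is a minimal sketch that compares an original snippet with a refactored one. It is not the paper's evaluation pipeline and does not call Radon: it uses only the Python standard library, and the helper names `sloc` and `approx_cc`, as well as the two example snippets, are hypothetical. The complexity count is a simplified approximation of cyclomatic complexity (one plus the number of decision points), so its values may differ from those a dedicated tool reports.

```python
import ast

# Statement/expression nodes treated as one decision point each.
# This is a simplifying assumption, not Radon's exact rule set.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp, ast.Assert)


def sloc(source: str) -> int:
    """Count lines that are neither blank nor pure comments."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def approx_cc(source: str) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.comprehension):
            # One for the implicit loop, one per filtering `if` clause.
            complexity += 1 + len(node.ifs)
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths.
            complexity += len(node.values) - 1
        elif isinstance(node, _BRANCH_NODES):
            complexity += 1
    return complexity


# Hypothetical before/after pair with the same behaviour.
ORIGINAL = '''
def evens(xs):
    result = []
    for x in xs:
        if x % 2 == 0:
            result.append(x)
    return result
'''

REFACTORED = '''
def evens(xs):
    return [x for x in xs if x % 2 == 0]
'''

if __name__ == "__main__":
    for name, src in (("original", ORIGINAL), ("refactored", REFACTORED)):
        print(f"{name}: SLOC={sloc(src)}, approx CC={approx_cc(src)}")
```

In this pair the refactoring shortens the code while the approximate complexity stays the same, which illustrates why the metrics are reported together rather than relying on any single one.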
Abstract
This study aims to enhance the maintainability of code generated by Large Language Models (LLMs), with a focus on the Python programming language. As the use of LLMs for coding assistance grows, so do concerns about the maintainability of the code they produce. Previous research has mainly concentrated on the functional accuracy and testing success of generated code, overlooking maintainability. Our approach uses a specifically designed dataset for training and evaluating the model, ensuring a thorough assessment of code maintainability. At the heart of our work is the fine-tuning of an LLM for code refactoring, aimed at enhancing code readability, reducing complexity, and improving overall maintainability. Our evaluations indicate that an LLM fine-tuned to prioritize maintainability significantly improves the maintainability of the code it generates, suggesting a promising direction for the future of AI-assisted software development.
