Copilot-in-the-Loop: Fixing Code Smells in Copilot-Generated Python Code using Copilot

Beiqi Zhang; Peng Liang; Qiong Feng; Yujia Fu; Zengyang Li

TL;DR

This work investigates code smells in Copilot-generated Python code and evaluates Copilot Chat as a way to fix them. Using Pysmell, the authors build a dataset of 102 code smells detected in 311 Copilot-generated Python files collected from GitHub. They find a smell prevalence of 14.8%, with Multiply-Nested Container as the most common smell. They then test three Copilot Chat prompts for fixing the smells. The Specific Code Smell Fix Prompt achieves the highest average fixing rate (87.1%), while the more general prompts perform significantly worse; some smells, such as LLF and LC, are fixed at a rate of 100% under certain prompts. The results indicate that Copilot Chat is a promising tool for automatically remediating smells in AI-generated code, and that its effectiveness improves substantially when it is given detailed, smell-specific guidance. The authors point to future work on extending the study to other languages and on integrating Copilot Chat with complementary analysis tools.
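Per-smell and average fixing rates like those quoted above reduce to simple ratios. The sketch below assumes that a smell instance counts as fixed when it is no longer detected after Copilot Chat's suggested change is applied, and it shows one possible averaging scheme (pooling all instances); the paper's exact criteria and averaging may differ. The function names and the toy outcome list are hypothetical and are not the paper's data.

```python
from collections import defaultdict


def fixing_rates(outcomes: list[tuple[str, bool]]) -> dict[str, float]:
    """Return the fraction of instances fixed for each smell type.

    Each outcome is a (smell_type, fixed) pair. Assumption: `fixed` is True
    when the smell is no longer detected after the suggested change.
    """
    totals: dict[str, int] = defaultdict(int)
    fixed: dict[str, int] = defaultdict(int)
    for smell, was_fixed in outcomes:
        totals[smell] += 1
        fixed[smell] += int(was_fixed)
    return {smell: fixed[smell] / totals[smell] for smell in totals}


def overall_rate(outcomes: list[tuple[str, bool]]) -> float:
    """Pooled fixing rate over all instances (one possible averaging scheme)."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(was_fixed for _, was_fixed in outcomes) / len(outcomes)


# Toy outcomes for illustration only; these are not results from the paper.
toy = [("MNC", True), ("MNC", True), ("MNC", False), ("LLF", True)]
print(fixing_rates(toy))  # {'MNC': 0.6666666666666666, 'LLF': 1.0}
print(overall_rate(toy))  # 0.75
```

The choice of averaging matters: pooling instances gives 0.75 for the toy data, whereas averaging the two per-type rates would give about 0.83.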

Abstract

Python is one of the most popular dynamic languages, and code smells reduce the readability and maintainability of Python code. Recent advancements in Large Language Models (LLMs) have sparked growing interest in AI-enabled tools for both code generation and refactoring. GitHub Copilot is one such tool that has gained widespread usage. Copilot Chat, released in September 2023, is an interactive tool intended to support natural-language-powered coding. However, little attention has been given to code smells in Copilot-generated Python code or to Copilot Chat's ability to fix them. To this end, we built a dataset comprising 102 code smells in Copilot-generated Python code. Our aim is first to explore the occurrence of code smells in Copilot-generated Python code and then to evaluate the effectiveness of Copilot Chat in fixing these code smells using different prompts. The results show that 8 out of 10 types of code smells can be detected in Copilot-generated Python code, with Multiply-Nested Container being the most common. For these code smells, Copilot Chat achieves its highest fixing rate of 87.1%, showing promise in fixing Python code smells generated by Copilot itself. In addition, the effectiveness of Copilot Chat in fixing these smells can be improved by providing more detailed prompts.
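To make the most common smell concrete, the following is a minimal sketch of how a Multiply-Nested Container check might work using Python's standard `ast` module. It is not Pysmell's implementation: the threshold `MAX_NESTING_DEPTH = 3` and the helper names `container_depth` and `find_nested_containers` are hypothetical, and depth here counts container literals (lists, tuples, sets, dicts) along any path through the expression tree.

```python
import ast

# Hypothetical threshold; Pysmell's actual setting may differ.
MAX_NESTING_DEPTH = 3

# Literal container node types counted toward nesting depth.
CONTAINER_NODES = (ast.List, ast.Tuple, ast.Set, ast.Dict)


def container_depth(node: ast.AST) -> int:
    """Return the deepest chain of container literals rooted at `node`."""
    deepest_child = max(
        (container_depth(child) for child in ast.iter_child_nodes(node)),
        default=0,
    )
    return deepest_child + 1 if isinstance(node, CONTAINER_NODES) else deepest_child


def find_nested_containers(
    source: str, max_depth: int = MAX_NESTING_DEPTH
) -> list[tuple[int, int]]:
    """Return (line, depth) for each outermost container nested deeper than max_depth."""
    findings: list[tuple[int, int]] = []

    def visit(node: ast.AST, inside_container: bool) -> None:
        is_container = isinstance(node, CONTAINER_NODES)
        # Report only the outermost literal so inner levels are not double-counted.
        if is_container and not inside_container:
            depth = container_depth(node)
            if depth > max_depth:
                findings.append((node.lineno, depth))
        for child in ast.iter_child_nodes(node):
            visit(child, inside_container or is_container)

    visit(ast.parse(source), False)
    return findings


if __name__ == "__main__":
    sample = "grid = [[[[0, 1], [2, 3]]]]\nok = [[1, 2], [3, 4]]\n"
    print(find_nested_containers(sample))  # [(1, 4)]
```

Only the four-level literal on line 1 of the sample is reported; the two-level literal on line 2 stays under the assumed threshold.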
