
How Maintainable is Proficient Code? A Case Study of Three PyPI Libraries

Indira Febriyanti, Youmei Fan, Kazumasa Shimari, Kenichi Matsumoto, Raula Gaikovina Kula

TL;DR

This study investigates how code proficiency relates to maintainability risk in Python by analyzing 3,003 files from three PyPI libraries (fpdf2, mpmath, and PyG). PyCEFR labels proficiency levels ($C1$ and $C2$), and Radon computes cyclomatic complexity ($CC$). Lines at these proficiency levels are mapped to Safe (rank $A$) or Risky (rank $F$) categories, yielding 2,836 connections for analysis. Most proficient code is associated with low maintainability risk (Advance-Safe ~46.61%, Mastery-Safe ~50.25%), with small fractions of high-risk cases (Advance-Risky ~2.26%, Mastery-Risky ~0.85%). The study also highlights specific high-risk patterns such as simple list comprehensions, 'enumerate' calls, generator expressions, and 'super' calls. Overall, proficient code tends to be maintainable, but notable exceptions exist; these motivate guidance on when advanced constructs might hinder future maintenance and call for broader validation across more projects.
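A minimal sketch of the Safe/Risky mapping follows. It assumes Radon's documented cyclomatic-complexity ranks (A for CC 1 to 5, B for 6 to 10, C for 11 to 20, D for 21 to 30, E for 31 to 40, F for 41 and above), assumes that "Advance" and "Mastery" correspond to PyCEFR's C1 and C2 levels, and keeps only pairs ranked A or F. The function names and the (level, CC) input format are hypothetical; the paper's actual pipeline, including how proficiency lines are matched to complexity scores, may differ.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Radon-style rank thresholds for cyclomatic complexity (CC):
# A: 1-5, B: 6-10, C: 11-20, D: 21-30, E: 31-40, F: 41+
_RANK_UPPER_BOUNDS = [(5, "A"), (10, "B"), (20, "C"), (30, "D"), (40, "E")]

# Assumption: category names correspond to PyCEFR levels C1 and C2.
LEVEL_NAMES = {"C1": "Advance", "C2": "Mastery"}


def cc_rank(cc: int) -> str:
    """Map a cyclomatic complexity score to a rank letter from A to F."""
    if cc < 0:
        raise ValueError("complexity must be non-negative")
    for upper, rank in _RANK_UPPER_BOUNDS:
        if cc <= upper:
            return rank
    return "F"


def categorize(level: str, cc: int) -> Optional[str]:
    """Return e.g. 'Advance-Safe' or 'Mastery-Risky'; None if the pair is dropped."""
    name = LEVEL_NAMES.get(level)
    rank = cc_rank(cc)
    if name is None or rank not in ("A", "F"):
        return None  # only C1/C2 lines whose code ranks A or F are kept
    return f"{name}-{'Safe' if rank == 'A' else 'Risky'}"


def category_shares(pairs: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Percentage of each category among the kept (level, cc) pairs."""
    labels = (categorize(level, cc) for level, cc in pairs)
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100 * n / total for label, n in counts.items()}
```

For example, categorize("C2", 45) returns "Mastery-Risky", while a C1 line in code with CC 15 (rank C) is dropped because it is neither Safe nor Risky under this mapping.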

Abstract

Python is popular because it serves a wide audience of developers, data scientists, machine learning experts, and others. As with other programming languages, Python code can be written at levels ranging from beginner to advanced. However, like all software, code constantly needs to be maintained as bugs surface and new features are requested. Although the Zen of Python states that "Simple is better than complex," we hypothesize that more elegant and proficient code might be harder for developers to maintain. To study the relationship between code maintainability and code proficiency, we present an exploratory study of the complexity of Python code in three Python libraries. Specifically, we investigate the risk level of proficient code inside a file. As a starting point, we mined and collected the proficiency of code from three PyPI libraries totaling 3,003 files. We identified several instances of highly proficient code that was also high risk, such as simple list comprehensions, 'enumerate' calls, generator expressions, simple dictionary comprehensions, and the 'super' function. Our early results reveal that most proficient code presents a low maintainability risk, yet in some cases proficient code also poses a maintenance risk. We envision that this study will help developers identify when using proficient code might be detrimental to future code maintenance activities.
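The constructs named in the abstract can be located with Python's standard 'ast' module, as in the sketch below. This is not PyCEFR's or Radon's implementation: treating a comprehension as "simple" when it has a single 'for' clause and no 'if' filter, and the set of decision points counted by 'approx_cc', are simplifying assumptions loosely following McCabe's definition of cyclomatic complexity (1 plus the number of decision points). The helper names and the sample classes are hypothetical.

```python
import ast
from collections import Counter


def _is_simple(comp: ast.AST) -> bool:
    """Assumption: a 'simple' comprehension has one for-clause and no if-filter."""
    return len(comp.generators) == 1 and not comp.generators[0].ifs


def find_constructs(source: str) -> Counter:
    """Count occurrences of the proficient constructs named in the abstract."""
    found = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ListComp) and _is_simple(node):
            found["simple list comprehension"] += 1
        elif isinstance(node, ast.DictComp) and _is_simple(node):
            found["simple dict comprehension"] += 1
        elif isinstance(node, ast.GeneratorExp):
            found["generator expression"] += 1
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Direct calls such as enumerate(x) or super()
            if node.func.id in ("enumerate", "super"):
                found[f"{node.func.id} call"] += 1
    return found


def approx_cc(func: ast.AST) -> int:
    """Rough McCabe count: 1 + decision points (simplified relative to Radon)."""
    decisions = 0
    for node in ast.walk(func):
        if isinstance(node, (ast.If, ast.IfExp, ast.For, ast.While, ast.ExceptHandler)):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            decisions += len(node.values) - 1  # each extra and/or operand
        elif isinstance(node, ast.comprehension):
            decisions += 1 + len(node.ifs)  # the loop plus each filter
    return 1 + decisions


SAMPLE = '''
class Base:
    def __init__(self):
        self.items = []

class Child(Base):
    def __init__(self, rows):
        super().__init__()
        self.items = [r for r in rows]
        self.index = {k: v for k, v in enumerate(rows)}
        self.total = sum(len(r) for r in rows if r)
'''

if __name__ == "__main__":
    print(find_constructs(SAMPLE))
```

On the sample, each of the five constructs is found once, and 'approx_cc' scores 'Child.__init__' at 5 because every comprehension clause and filter adds a decision point even though the method has no explicit branch.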

Paper Structure

This paper contains 3 sections, 1 figure, and 2 tables.

Figures (1)

  • Figure 1: Correlation Scores of three PyPI libraries