Towards Identifying Code Proficiency through the Analysis of Python Textbooks

Ruksit Rojpaisarnkit, Gregorio Robles, Raula Gaikovina Kula, Dong Wang, Chaiyong Ragkhitwetsagul, Jesus M. Gonzalez-Barahona, Kenichi Matsumoto

TL;DR

The paper investigates whether the sequence of Python constructs taught in introductory textbooks aligns with competency levels assigned by pycefr. It employs regex-based mining to identify 94 constructs across 12 textbooks, followed by manual validation and quantitative distance analyses to assess alignment. Results show substantial, but imperfect, alignment, with larger disagreements for higher-level constructs and several cases where advanced concepts appear earlier than pycefr would suggest. These findings validate using textbooks as a basis for assessing developer proficiency while highlighting data-noise challenges and the need to refine pycefr classifications for practical maintenance and AI-assisted coding contexts.
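The regex-based mining and ordering comparison described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four patterns, the numeric levels in `ASSUMED_LEVEL`, the toy `pages`, and the helper names `first_appearance` and `out_of_order_pairs` are hypothetical and are not taken from pycefr or from the paper. The paper's own distance analysis is not reproduced here; the sketch only flags pairs where a construct assumed to be more advanced is introduced before a more basic one.

```python
import re

# Hypothetical patterns for a handful of constructs (not pycefr's actual rules).
PATTERNS = {
    "print": re.compile(r"\bprint\s*\("),
    "for_loop": re.compile(r"^\s*for\s+\w+.*\bin\b.*:", re.MULTILINE),
    "list_comprehension": re.compile(
        r"\[[^\[\]]+\bfor\b[^\[\]]+\bin\b[^\[\]]+\]"
    ),
    "lambda": re.compile(r"\blambda\b[^:]*:"),
}

# Placeholder levels (1 = most basic); assumed values, not pycefr's assignments.
ASSUMED_LEVEL = {
    "print": 1,
    "for_loop": 1,
    "list_comprehension": 3,
    "lambda": 4,
}


def first_appearance(pages):
    """Map each construct to the 1-based index of the first page matching it."""
    first = {}
    for number, text in enumerate(pages, start=1):
        for name, pattern in PATTERNS.items():
            if name not in first and pattern.search(text):
                first[name] = number
    return first


def out_of_order_pairs(first, levels):
    """List (early, late) pairs where the earlier construct has the higher level."""
    names = sorted(first)
    return [
        (a, b)
        for a in names
        for b in names
        if first[a] < first[b] and levels[a] > levels[b]
    ]


# Toy "textbook": each string stands in for the code found on one page.
pages = [
    'print("hello")',
    "squares = [n * n for n in range(5)]",
    "for n in squares:\n    print(n)",
    "double = lambda x: 2 * x",
]

first = first_appearance(pages)
print(first)
print(out_of_order_pairs(first, ASSUMED_LEVEL))
```

In this toy input the list comprehension appears on page 2, before the plain `for` loop on page 3, so the pair is flagged as out of order with respect to the assumed levels. Regex matching of this kind can also fire on prose or on code inside strings, which is one reason a manual validation step is useful.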

Abstract

Python, one of the most prevalent programming languages today, is widely used in domains including web development, data science, machine learning, and DevOps. Recent work has proposed a method for assessing Python competence levels, similar to how proficiency in natural languages is evaluated. The method assigns competence levels to Python constructs: for instance, it places simple 'print' statements at the most basic level and abstract base classes at the most advanced. The aim is to gauge the level of proficiency a developer needs to understand a piece of source code. This is particularly important for software maintenance and evolution tasks, such as debugging or adding new features. For example, in code review, the method could determine the competence level required of reviewers. However, categorizing Python constructs by proficiency level poses significant challenges. Prior attempts, which relied heavily on expert opinion and developer surveys, have led to considerable discrepancies. In response, this paper presents a new approach to identifying Python competency levels through the systematic analysis of introductory Python programming textbooks. By comparing the sequence in which these textbooks introduce Python constructs with the current state of the art, we uncover notable discrepancies in the order of introduction. Our study highlights this misalignment, demonstrating that pinpointing proficiency levels is not trivial. The insights from the study are a step toward establishing textbooks as a valuable source for evaluating developers' proficiency, particularly their ability to undertake maintenance and evolution tasks.

Paper Structure

This paper contains 12 sections, 7 figures, and 6 tables.

Figures (7)

  • Figure 1: Research Architecture
  • Figure 2: Distribution of the number of Python code constructs per textbook. Constructs are colored and stacked according to their competency level.
  • Figure 3: Distribution of the number of books in which Python constructs appear. Constructs are colored and stacked according to their competency level.
  • Figure 4: Distribution of code construct introduction in each level
  • Figure 5: zip and enumerate functions in book 4.
  • ...and 2 more figures