Assessing Python Style Guides: An Eye-Tracking Study with Novice Developers

Pablo Roberto, Rohit Gheyi, José Aldo Silva da Costa, Márcio Ribeiro

TL;DR

This study tackles the problem of empirically validating Python style guidelines by evaluating four PEP8 patterns through eye-tracking with 32 novice developers. Using a controlled Latin Square design, the authors measure task time, attempt count, fixation metrics, and regressions across compliant versus non-compliant code, complemented by interviews and surveys. Key findings include strong improvements in AOI time and fixation metrics for the Line Break Before Operator and Multiple Clauses guidelines when followed, yet weaker or mixed effects for White Space and Comparison to True, with subject preferences sometimes diverging from objective metrics. The work demonstrates that empirical data should inform guideline adoption and suggests extensions to education and IDE support, highlighting the practical impact of eye-tracking-informed style guidelines for novice programmers.

Abstract

The incorporation and adaptation of style guides play an essential role in software development, influencing code formatting, naming conventions, and structure to enhance readability and simplify maintenance. However, many of these guides lack empirical studies to validate their recommendations. Previous studies have examined the impact of code styles on developer performance, concluding that some styles have a negative impact on code readability. Still, more studies are needed that assess other perspectives, and combinations of these perspectives, on a common experimental basis. This study aimed to investigate, through eye-tracking, the impact of guidelines in style guides, with a special focus on the PEP8 guide in Python, recognized for its best practices. We conducted a controlled experiment with 32 Python novices, measuring time, the number of attempts, and visual effort through eye-tracking, using fixation duration, fixation count, and regression count for four PEP8 recommendations. Additionally, we conducted interviews to explore the subjects' difficulties and preferences with the programs. The results highlighted that not following the PEP8 Line Break Before Operator guideline increased the eye regression count by 70% in the code snippet where the guideline should have been applied. Most subjects preferred the version that adhered to the PEP8 guideline, and some found the left-aligned organization of operators easier to understand. The other evaluated guidelines revealed further nuances: for the True Comparison guideline, the PEP8-compliant version negatively impacted eye metrics, although subjects preferred the PEP8 suggestion. We recommend that practitioners select guidelines supported by experimental evaluations.
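
To make the four evaluated guidelines concrete, the following is a minimal sketch that pairs a PEP8-compliant form with a non-compliant form for each one. These snippets are illustrative assumptions and are not the programs shown to the study's participants. All function names are hypothetical, and reading "Multiple Clauses" as "compound statements on one line" and "White Space" as "extra spaces inside brackets" is an interpretation of the guideline names, not something the summary confirms.

```python
# Illustrative only: these are NOT the programs used in the study.
# Each pair computes the same result; only the style differs.


def total_compliant(gross, interest, dividends, deductions):
    # Line break before binary operators: operators align on the left.
    return (gross
            + interest
            + dividends
            - deductions)


def total_non_compliant(gross, interest, dividends, deductions):
    # Line break after binary operators: operators trail each line.
    return (gross +
            interest +
            dividends -
            deductions)


def sign_compliant(x):
    # One clause per line (assumed meaning of "Multiple Clauses").
    if x < 0:
        return -1
    return 1


def sign_non_compliant(x):
    # Condition and body share a line, which PEP8 discourages.
    if x < 0: return -1
    return 1


def scale_compliant(values, factor):
    # No extraneous whitespace immediately inside brackets.
    return [v * factor for v in values]


def scale_non_compliant(values, factor):
    # Extra spaces inside brackets, none around the operator.
    return [ v*factor for v in values ]


def label_compliant(is_active):
    # Test the boolean directly instead of comparing it to True.
    if is_active:
        return "active"
    return "inactive"


def label_non_compliant(is_active):
    # Explicit comparison to True. Equivalent for bool inputs only:
    # a truthy non-bool such as 2 would take the other branch here.
    if is_active == True:
        return "active"
    return "inactive"
```

Because both forms in each pair behave identically on the inputs of interest, any difference in reading time or regression count between them can be attributed to layout rather than logic, which is the kind of comparison the experiment relies on.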

Paper Structure (20 sections, 10 figures, 2 tables)


Figures (10)

  • Figure 1: Experiment Steps: Questionnaire, Warm-up, Task, Interview, and Survey.
  • Figure 2: Latin Square Structure. Each subject received four programs (P$_1$-P$_4$) from Program Set 1 (SP$_1$), which were PEP8 compliant (C). Additionally, the subject received four programs (P$_5$-P$_8$) from Program Set 2 (SP$_2$), which were PEP8 non-compliant (NC).
  • Figure 3: Programs presented to the participants.
  • Figure 4: Subjects' preferences between the PEP8 compliant and PEP8 non-compliant versions of each PEP8 guideline. We used the following acronyms: Strongly Prefers PEP8 compliant (SPC); Prefers PEP8 compliant (PC); Indifferent (I); Prefers PEP8 non-compliant (PNC); Strongly Prefers PEP8 non-compliant (SPNC).
  • Figure 5: Eye regression and reading progression of one subject in the AOI of the White Space pattern, for the PEP8 non-compliant version (left-hand side) and the PEP8 compliant version (right-hand side). The green region contains the portion of the code where the PEP8 pattern has been applied or not. Arrows pointing from right to left represent reading progression, and arrows pointing from left to right represent the eye returning to a previous region of the code. The numbers on the arrows represent the number of times each progression or regression occurred.
  • ...and 5 more figures