
An Exploratory Eye Tracking Study on How Developers Classify and Debug Python Code in Different Paradigms

Samuel W. Flint, Jigyasa Chauhan, Niloofar Mansoor, Bonita Sharif, Robert Dyer

TL;DR

The paper investigates how developers read and reason about Python code written in different paradigms (Functional, Object-Oriented, Procedural) using an exploratory eye-tracking study. Participants completed paradigm-classification and bug-localization tasks while their gaze was mapped to code tokens and analyzed with fixation, saccade, and linearity metrics, complemented by scarf-plot visualizations. Key findings include confusion between Functional and Procedural paradigms, longer classification times for Functional code, and paradigm-by-program effects on debugging performance and reading patterns; gaze behavior tended to cover multiple token types, not solely paradigm-relevant ones. These results inform education, research, and practice by highlighting cognitive-load implications of multi-paradigm code and suggesting avenues for language design, teaching, and further theory development in program comprehension for Python. The work provides a foundation for a theory of mixed-paradigm comprehension and emphasizes the value of eye-tracking as a fine-grained lens on developer thought processes in multi-paradigm languages.
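The linearity metrics mentioned above describe how closely a reader's gaze follows top-to-bottom, left-to-right order. Below is a minimal sketch of two such measures. It assumes fixations have already been mapped to (line, column) positions in the source and treats each pair of consecutive fixations as one saccade. The `Fixation` class and both function names are hypothetical and loosely modeled on metrics from the code-reading literature. The definitions are illustrative and may differ from the exact formulas used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixation:
    line: int         # 1-based source line the fixation was mapped to (assumed)
    col: int          # 0-based column within that line (assumed)
    duration_ms: int  # fixation duration; unused by these two metrics


def saccades(fixations):
    """Treat each pair of consecutive fixations as one saccade."""
    return list(zip(fixations, fixations[1:]))


def is_forward(a, b):
    """Forward = later in reading order (further down, or further right on the same line)."""
    return (b.line, b.col) > (a.line, a.col)


def regression_rate(fixations):
    """Share of all saccades that move backward in reading order."""
    pairs = saccades(fixations)
    if not pairs:
        return 0.0
    backward = sum(1 for a, b in pairs if (b.line, b.col) < (a.line, a.col))
    return backward / len(pairs)


def vertical_next_rate(fixations):
    """Share of forward saccades that stay on the same line or move exactly one line down."""
    forward = [(a, b) for a, b in saccades(fixations) if is_forward(a, b)]
    if not forward:
        return 0.0
    near = sum(1 for a, b in forward if b.line - a.line in (0, 1))
    return near / len(forward)


if __name__ == "__main__":
    gaze = [Fixation(1, 0, 200), Fixation(1, 5, 180), Fixation(2, 0, 220),
            Fixation(1, 3, 150), Fixation(4, 0, 300)]
    print(regression_rate(gaze), vertical_next_rate(gaze))
```

A perfectly linear reader would score a regression rate near 0 and a vertical-next rate near 1. Jumping back to a function definition, which is common when tracing calls, pushes both values away from those extremes.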

Abstract

Modern programming languages, such as Python, support language features from several paradigms, such as object-oriented, procedural, and functional. Research has shown that code written in some paradigms can be harder to comprehend, but to date, no research has looked at which paradigm-specific language features impact comprehension. To this end, this study seeks to uncover which paradigm-specific features impact comprehension and debugging of code, and how multi-paradigm code might affect a developer's ability to do so. We present an exploratory empirical eye-tracking study to investigate 1) how developers classify the predominant paradigm in Python code and 2) how the paradigm affects their ability to debug Python code. The goal is to uncover whether specific language features are looked at more often while classifying and debugging code with a predominant paradigm. Twenty-nine developers (primarily students) were recruited for the study, and each was given four classification and four debugging tasks in Python. Eye movements were recorded during all the tasks. The results indicate confusion between the Functional and Procedural paradigms when labeling code, but not for Object-Oriented code. Code written predominantly in the Functional paradigm also took the longest to complete. Changing the predominant paradigm did not affect the ability to debug the code, though developers rated their confidence lower for Functional code. We report significant differences in reading patterns during debugging, especially in the Functional code. During classification, results show that developers do not necessarily read paradigm-relevant token types.
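The question of which token types developers actually look at depends on mapping each fixation onto a source token. The sketch below shows one way to do that with Python's standard `tokenize` module. It assumes fixations are already expressed as (line, column) positions in the source file. It also uses hypothetical coarse categories (`keyword`, `identifier`, `op`, `number`, `string`) rather than the study's own token taxonomy. Fixations that land on whitespace are simply not counted.

```python
import io
import keyword
import tokenize
from collections import Counter


def token_spans(source):
    """Return (start, end, category) for each visible token; positions are (row, col)."""
    spans = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            # Hypothetical split: reserved words vs. everything else that is a name.
            category = "keyword" if keyword.iskeyword(tok.string) else "identifier"
        elif tok.type in (tokenize.OP, tokenize.NUMBER, tokenize.STRING):
            category = tokenize.tok_name[tok.type].lower()
        else:
            continue  # skip NEWLINE, INDENT, DEDENT, COMMENT, ENDMARKER, etc.
        spans.append((tok.start, tok.end, category))
    return spans


def tokens_fixated(source, fixations):
    """Count fixations per token category; each fixation is a (line, col) tuple."""
    spans = token_spans(source)
    counts = Counter()
    for pos in fixations:
        for start, end, category in spans:
            if start <= pos < end:  # tuple comparison handles multi-line tokens too
                counts[category] += 1
                break
    return counts


if __name__ == "__main__":
    code = "def cube(x):\n    return x ** 3\n"
    print(tokens_fixated(code, [(1, 1), (1, 5), (2, 5), (2, 16), (2, 12)]))
```

Counting per category is the simplest aggregate. Summing fixation durations instead of counts would be a one-line change if dwell time is the measure of interest.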

Paper Structure

This paper contains 48 sections, 11 figures, 14 tables.

Figures (11)

  • Figure 1: The Cube program for the bug localization task category, shown in all four paradigms. Note that each file had a two-line header stating the task ("Find a logical bug in the code") and the purpose of the code ("The following code prints the cubed values of a list of numbers").
  • Figure 2: Average time on task per paradigm for the classification tasks, also broken down by code size.
  • Figure 3: Scarf plots showing fixations over tokens for code classification task, with participants labeled on the $y$-axis (participant number and correctness) and time (in ms) on the $x$-axis: Functional paradigm, Small code length
  • Figure 4: Scarf plots showing fixations over tokens for code classification task, with participants labeled on the $y$-axis (participant number and correctness) and time (in ms) on the $x$-axis: Object-Oriented paradigm, Medium code length
  • Figure 5: Scarf plots showing fixations over tokens for code classification task, with participants labeled on the $y$-axis (participant number and correctness) and time (in ms) on the $x$-axis: Procedural paradigm, Large code length
  • ...and 6 more figures
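To make the paradigm labels concrete, here is a hypothetical illustration of how a "cube a list of numbers" program might look when written predominantly in each of the three named paradigms. These are not the study's stimuli, because Figure 1 is not reproduced in this summary. The fourth version shown in that figure is not described here, so it is omitted. The function and class names are invented for illustration.

```python
# Procedural: explicit loop that mutates an accumulator list.
def cube_procedural(numbers):
    result = []
    for n in numbers:
        result.append(n ** 3)
    return result


# Functional: no mutation; a higher-order function applied with a lambda.
def cube_functional(numbers):
    return list(map(lambda n: n ** 3, numbers))


# Object-oriented: data and the operation on it bundled in a class.
class Cuber:
    def __init__(self, numbers):
        self.numbers = list(numbers)

    def cubes(self):
        result = []
        for n in self.numbers:
            result.append(n ** 3)
        return result


if __name__ == "__main__":
    # Like the study's header describes, print the cubed values of a list of numbers.
    for value in Cuber([1, 2, 3]).cubes():
        print(value)
```

The procedural and object-oriented versions share the same loop body, so the class wrapper is the main visible cue for Object-Oriented code. Telling functional from procedural code depends on smaller cues such as `map` and `lambda`.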