An Exploratory Eye Tracking Study on How Developers Classify and Debug Python Code in Different Paradigms
Samuel W. Flint, Jigyasa Chauhan, Niloofar Mansoor, Bonita Sharif, Robert Dyer
TL;DR
The paper investigates how developers read and reason about Python code written in different paradigms (Functional, Object-Oriented, Procedural) using an exploratory eye-tracking study. Participants completed paradigm-classification and bug-localization tasks while their gaze was mapped to code tokens and analyzed with fixation, saccade, and linearity metrics, complemented by scarf-plot visualizations. Key findings include confusion between the Functional and Procedural paradigms, longer classification times for Functional code, and paradigm-by-program effects on debugging performance and reading patterns; gaze behavior tended to cover multiple token types, not solely paradigm-relevant ones. These results inform education, research, and practice by highlighting the cognitive-load implications of multi-paradigm code and by suggesting avenues for language design, teaching, and further theory development in Python program comprehension. The work provides a foundation for a theory of mixed-paradigm comprehension and emphasizes the value of eye tracking as a fine-grained lens on developer thought processes in multi-paradigm languages.
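The TL;DR mentions mapping gaze to code tokens and computing saccade and linearity metrics. As a rough illustration only, the sketch below computes three such measures from a sequence of token-mapped fixations. The `Fixation` record, the token-type labels, and the functions `regression_rate`, `vertical_next_text`, and `token_type_share` are hypothetical simplifications loosely modeled on linearity measures used in code-reading eye-tracking research; they are assumptions, not the paper's exact definitions or tooling.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixation:
    """One fixation already mapped to a source token (hypothetical format)."""
    line: int        # 1-based line of the fixated token
    column: int      # 0-based start column of the token
    token_type: str  # illustrative category, e.g. "def", "lambda", "identifier"


def saccades(fixations):
    """Pair consecutive fixations into (source, target) saccades."""
    return list(zip(fixations, fixations[1:]))


def position(f):
    # Reading-order key: top to bottom, then left to right.
    return (f.line, f.column)


def regression_rate(fixations):
    """Share of saccades that move backward in reading order."""
    pairs = saccades(fixations)
    if not pairs:
        return 0.0
    backward = sum(1 for a, b in pairs if position(b) < position(a))
    return backward / len(pairs)


def vertical_next_text(fixations):
    """Share of forward saccades that stay on the same line or move one line down."""
    forward = [(a, b) for a, b in saccades(fixations) if position(b) > position(a)]
    if not forward:
        return 0.0
    near = sum(1 for a, b in forward if b.line - a.line in (0, 1))
    return near / len(forward)


def token_type_share(fixations):
    """Fraction of fixations that landed on each token type."""
    counts = Counter(f.token_type for f in fixations)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical five-fixation trace over a three-line snippet.
    trace = [
        Fixation(1, 0, "def"),
        Fixation(1, 4, "identifier"),
        Fixation(2, 4, "lambda"),
        Fixation(1, 4, "identifier"),
        Fixation(3, 4, "return"),
    ]
    print(regression_rate(trace))     # 0.25
    print(vertical_next_text(trace))  # about 0.667
    print(token_type_share(trace))
```

In this made-up trace, one of the four saccades moves backward (a regression rate of 0.25), and two of the three forward saccades stay on the same or the next line. A per-token-type share like the last function is one simple way to ask whether readers dwell on paradigm-relevant tokens such as `lambda` or `class`.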
Abstract
Modern programming languages, such as Python, support language features from several paradigms, including object-oriented, procedural, and functional. Research has shown that code written in some paradigms can be harder to comprehend, but to date, no research has examined which paradigm-specific language features impact comprehension. To this end, this study seeks to uncover which paradigm-specific features impact the comprehension and debugging of code, and how multi-paradigm code might affect a developer's ability to perform these tasks. We present an exploratory empirical eye-tracking study to investigate 1) how developers classify the predominant paradigm in Python code and 2) how the paradigm affects their ability to debug Python code. The goal is to determine whether specific language features are looked at more often while classifying and debugging code written in a predominant paradigm. Twenty-nine developers (primarily students) were recruited for the study, and each was given four classification and four debugging tasks in Python. Eye movements were recorded during all tasks. The results indicate that developers confused the Functional and Procedural paradigms when labeling code, but not the Object-Oriented paradigm. Code written predominantly in the Functional paradigm also took the longest to complete. Changing the predominant paradigm did not affect developers' ability to debug the code, though developers rated themselves with lower confidence for Functional code. We report significant differences in reading patterns during debugging, especially for Functional code. During classification, results show that developers do not necessarily read paradigm-relevant token types.
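To make the classification task concrete, the sketch below writes one small, hypothetical computation (summing the squares of the even numbers in a list) three ways, each leaning on features commonly associated with one paradigm. These are illustrative examples only, not the study's stimuli, and the function and class names are made up. Even the object-oriented version uses comprehensions, a functional-style feature, which reflects how Python code often mixes paradigms and why the study asks for the predominant one.

```python
from functools import reduce


# Procedural: explicit loop, branching, and a mutable accumulator.
def sum_even_squares_procedural(numbers):
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total


# Functional: filter/map/reduce with lambdas and no mutated state.
def sum_even_squares_functional(numbers):
    evens = filter(lambda n: n % 2 == 0, numbers)
    squares = map(lambda n: n * n, evens)
    return reduce(lambda acc, x: acc + x, squares, 0)


# Object-oriented: data and behavior bundled in a class.
class NumberCollection:
    def __init__(self, numbers):
        self._numbers = list(numbers)

    def evens(self):
        # A list comprehension: a functional-style feature inside OO code.
        return [n for n in self._numbers if n % 2 == 0]

    def sum_even_squares(self):
        return sum(n * n for n in self.evens())


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6]
    print(sum_even_squares_procedural(data))          # 56
    print(sum_even_squares_functional(data))          # 56
    print(NumberCollection(data).sum_even_squares())  # 56
```

All three versions compute the same result (4 + 16 + 36 = 56 for the sample list); what differs is which language features a reader must notice to label the paradigm.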
