NRevisit: A Cognitive Behavioral Metric for Code Understandability Assessment
Gao Hao, Haytham Hijazi, Júlio Medeiros, João Durães, Chan Tong Lam, Paulo de Carvalho, Henrique Madeira
TL;DR
This work addresses the gap between static code complexity metrics and the real-time cognitive effort developers experience when understanding code. It introduces NRevisit, a dynamic metric derived from gaze revisits to code regions that are invisible to the programmer, with two variants: $C$ NRevisit and $CL$ NRevisit. In a controlled study with 35 programmers, using EEG-measured cognitive load as ground truth, NRevisit shows very high correlations with cognitive load ($r_s$ up to $0.986$) and outperforms static metrics in predictive models, especially non-linear ones such as neural networks and Gaussian process regression. The findings suggest that NRevisit could be integrated at low cost into IDEs and AI-assisted programming tools, enabling programmer-specific, real-time assessment of code understandability and targeted interventions for cognitively demanding regions.
Abstract
Measuring code understandability is both highly relevant and exceptionally challenging. This paper proposes a dynamic code understandability assessment method, which estimates a personalized code understandability score from the perspective of the specific programmer handling the code. The method dynamically divides the code unit under development or review into code regions (invisible to the programmer) and uses the number of revisits (NRevisit) to each region as the primary feature for estimating the code understandability score. This approach removes the uncertainty related to the concept of a "typical programmer" assumed by static software code complexity metrics and can be easily implemented using a simple, low-cost, and non-intrusive desktop eye tracker or even a standard computer camera. The metric was evaluated against cognitive load measured through electroencephalography (EEG) in a controlled experiment with 35 programmers. Results show a very high correlation, ranging from $r_s = 0.9067$ to $r_s = 0.9860$ (with $p$ nearly 0), between the scores obtained with different alternatives of NRevisit and the ground truth represented by the EEG measurements of programmers' cognitive load, demonstrating the effectiveness of our approach in reflecting the cognitive effort required for code comprehension. The paper also discusses possible practical applications of NRevisit, including its use in the context of AI-generated code, which is already widely used today.
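To make the idea of counting revisits per code region concrete, the following is a minimal Python sketch, not the implementation used in the paper. It assumes that the eye tracker output has already been mapped to a sequence of fixated source line numbers, that regions are given as inclusive line ranges, and that a "revisit" is any entry of the gaze into a region after the first one. The names `Region`, `region_of`, and `count_revisits` are hypothetical, and the paper's exact region-splitting rule and its $C$ and $CL$ variants are not reproduced here.

```python
from typing import List, Sequence, Tuple

# Assumption: a region is an inclusive (first_line, last_line) range.
Region = Tuple[int, int]


def region_of(line: int, regions: Sequence[Region]) -> int:
    """Return the index of the region containing `line`, or -1 if none."""
    for idx, (first, last) in enumerate(regions):
        if first <= line <= last:
            return idx
    return -1


def count_revisits(fixated_lines: Sequence[int],
                   regions: Sequence[Region]) -> List[int]:
    """Count, per region, how often the gaze re-enters it after leaving.

    Assumption: consecutive fixations inside the same region count as a
    single visit, and a fixation outside every region counts as leaving.
    """
    entries = [0] * len(regions)
    previous = -1  # region index of the previous fixation (-1 = none)
    for line in fixated_lines:
        current = region_of(line, regions)
        if current != -1 and current != previous:
            entries[current] += 1  # the gaze has just entered this region
        previous = current
    # The first entry is the initial visit; every later entry is a revisit.
    return [max(n - 1, 0) for n in entries]


# Hypothetical example: three regions and a short gaze trace.
regions = [(1, 5), (6, 10), (11, 20)]
gaze = [1, 2, 7, 8, 3, 12, 7, 2]
revisits = count_revisits(gaze, regions)
print(revisits)       # per-region revisit counts
print(sum(revisits))  # total revisits for the code unit
```

In this trace the gaze enters the first region three times, the second twice, and the third once, so the per-region counts are 2, 1, and 0. Per-region counts of this kind could then be compared with a cognitive load measurement, for example through a Spearman rank correlation, which is the kind of statistic the abstract reports.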
