Exploring the Impact of Code Style in Identifying Good Programmers

Rafed Muhammad Yasir, Ahmedul Kabir

TL;DR

The paper asks whether code style can identify good programmers, analyzing Google Code Jam (GCJ) solutions with 30 stylistic features using clustering and supervised learning. Although no particular style cluster correlates with high ability, supervised models, especially Balanced Random Forest, achieve modest predictive performance (recall ~0.65, macro-F1 ~0.511, AUC-ROC ~0.695), indicating a nontrivial signal in stylistic features. This work suggests a potential, albeit limited, role for code style in recruitment and quality assurance, and points to style-based guidelines and further methodological refinement as future directions. The findings motivate broader exploration of stylistic cues across languages and coding contexts to improve software quality assessment and hiring practices.

Abstract

Code style is an aesthetic choice exhibited in source code that reflects programmers' individual coding habits. This study is the first to investigate whether code style can be used as an indicator to identify good programmers. Data from Google Code Jam was chosen for conducting the study. A cluster analysis was performed to find whether a particular coding style could be associated with good programmers. Furthermore, supervised machine learning models were trained using stylistic features and evaluated using recall, macro-F1, AUC-ROC and balanced accuracy to predict good programmers. The results demonstrate that good programmers may be identified using supervised machine learning models, even though no particular style group could be identified as a good style.
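
To make the four evaluation metrics named in the abstract concrete, here is a minimal pure-Python sketch for a binary "good programmer" label. The labels, predictions, and scores are toy values invented for illustration and are not taken from the paper; the function names are likewise hypothetical, and the paper's own pipeline may compute these metrics differently (for example through a library).

```python
# Toy illustration of recall, macro-F1, balanced accuracy and AUC-ROC.
# All data below is made up; 1 = "good programmer", 0 = "other".

def class_counts(y_true, y_pred, label):
    """Return (tp, fp, fn) treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn

def recall(y_true, y_pred, label=1):
    tp, _, fn = class_counts(y_true, y_pred, label)
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1(y_true, y_pred, label):
    tp, fp, fn = class_counts(y_true, y_pred, label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so the minority class counts equally."""
    labels = sorted(set(y_true))
    return sum(f1(y_true, y_pred, c) for c in labels) / len(labels)

def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    labels = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in labels) / len(labels)

def auc_roc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Imbalanced toy set: 3 positives, 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05]

print("recall            ", round(recall(y_true, y_pred), 3))
print("macro-F1          ", round(macro_f1(y_true, y_pred), 3))
print("balanced accuracy ", round(balanced_accuracy(y_true, y_pred), 3))
print("AUC-ROC           ", round(auc_roc(y_true, scores), 3))
```

Macro-F1 and balanced accuracy both average over classes rather than over samples, which is why they are common choices when one class (here, the positives) is much smaller than the other.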

Paper Structure (12 sections, 3 equations, 14 figures, 4 tables)


Figures (14)

  • Figure 1: Style 1
  • Figure 2: Style 2
  • Figure 4: Qualification round - 565238
  • Figure 5: Qualification round - 563469
  • Figure 6: Qualification round - 563631
  • ...and 9 more figures