Follow Your Nose -- Which Code Smells are Worth Chasing?

Idan Amit, Nili Ben Ezra, Dror G. Feitelson

TL;DR

This study questions the assumed causality of code smells by evaluating whether fixing alert instances causally improves software quality or productivity. We operationalize causality through five properties (predictive power, monotonicity, co-change, robustness to developer, and robustness to file length) and test them against corrective commit probability (CCP) and commit duration across 31,687 Java files from 677 GitHub repositories, using CheckStyle alerts. Fewer than 20% of the 151 alerts satisfy the properties with respect to a target metric, and only a small set of robust alerts indicates potential causal influence. Overall precision and recall remain modest, which signals substantial noise and the need for careful prioritization. The findings imply that developers should focus on a concise subset of alerts (e.g., those related to simplicity, defensive programming, and abstraction), and that most alerts may reflect correlation rather than causation. This can guide more efficient code-quality interventions and future causal experiments.

Abstract

The common use case of code smells assumes causality: identify a smell, remove it, and by doing so improve the code. We empirically investigate their fitness for this use. We present a list of properties that code smells should have if they indeed cause lower quality. We evaluated the smells in 31,687 Java files from 677 GitHub repositories, all the repositories with 200+ commits in 2019. We measured the influence of smells on four metrics for quality, productivity, and bug detection efficiency. Out of 151 code smells computed by the CheckStyle smell detector, fewer than 20% were found to be potentially causal, and only a handful are rather robust. The strongest smells deal with simplicity, defensive programming, and abstraction. Files without the potentially causal smells are 50% more likely to be of high quality. Unfortunately, most smells are not removed, and developers tend to remove the easy ones and not the effective ones.
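Two of the properties named above, predictive power and monotonicity, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `FileRecord` fields, the "high quality" label, the bucket edges, and the definition of lift as the relative gain of P(high quality | smell absent) over the base rate are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class FileRecord:
    smell_count: int    # occurrences of one alert in the file (hypothetical field)
    high_quality: bool  # assumed label, e.g. file is in the best band of a quality metric


def precision_lift(files: Sequence[FileRecord]) -> Optional[float]:
    """Relative gain of P(high quality | smell absent) over P(high quality).

    A value of 0.5 would read as "smell-free files are 50% more likely
    to be of high quality". Returns None when the ratio is undefined.
    """
    if not files:
        return None
    base = sum(f.high_quality for f in files) / len(files)
    clean = [f for f in files if f.smell_count == 0]
    if not clean or base == 0:
        return None
    cond = sum(f.high_quality for f in clean) / len(clean)
    return cond / base - 1.0


def bucket_rates(files: Sequence[FileRecord],
                 edges: Sequence[int] = (1, 3)) -> List[float]:
    """High-quality rate per smell-count bucket (default buckets: 0, 1-2, 3+).

    Empty buckets are skipped. The edges are an arbitrary illustrative choice.
    """
    buckets: List[List[bool]] = [[] for _ in range(len(edges) + 1)]
    for f in files:
        idx = sum(f.smell_count >= e for e in edges)  # index of the bucket
        buckets[idx].append(f.high_quality)
    return [sum(b) / len(b) for b in buckets if b]


def is_monotone(files: Sequence[FileRecord],
                edges: Sequence[int] = (1, 3)) -> bool:
    """True if the high-quality rate never rises as the smell count grows."""
    rates = bucket_rates(files, edges)
    return all(a >= b for a, b in zip(rates, rates[1:]))


if __name__ == "__main__":
    # Invented toy data, not taken from the paper.
    toy = [
        FileRecord(0, True), FileRecord(0, True), FileRecord(0, True), FileRecord(0, False),
        FileRecord(1, True), FileRecord(2, False), FileRecord(1, False),
        FileRecord(4, False), FileRecord(5, False), FileRecord(3, False),
    ]
    print("lift:", precision_lift(toy))
    print("monotone:", is_monotone(toy))
```

Passing both checks shows only that an alert is consistent with a causal reading; the co-change and robustness properties listed in the TL;DR would still need to be checked separately.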

Paper Structure

This paper contains 26 sections, 3 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Solid arrows indicate the relations considered; dashed lines indicate examples of relations not considered. Legend: $a_{i}$ = alert, $c_{j}$ = concept; the remaining labels are Developer, Length, and Unknown.
  • Figure 2: The average CCP of median developers is very far from those in the extremes.
  • Figure 3: Median shown as a solid line, mean as a dashed line. The corrective commit ratio is used without MLE estimation of CCP, since the number of commits per file is small.