
Reproducing, Extending, and Analyzing Naming Experiments

Rachel Alpern, Ido Lazer, Issar Tzachor, Hanit Hakim, Sapir Weissbuch, Dror G. Feitelson

TL;DR

This work replicates and extends Feitelson et al.'s naming experiments by renaming variables in code rather than describing scenarios, using Hebrew descriptions to mitigate accessibility bias, and adding a deeper semantic analysis of labeled concepts. It demonstrates that name reuse remains low and diversity high across multiple experiments, and that the 3-step naming model improves external judgments of name quality whereas merely instructing developers to make names longer does not. The authors further classify naming concepts into universal, correlated, alternative, optional, and rare, revealing how concept selection shapes name structure and length. Practically, the results support targeted training on the 3-step model to enhance naming quality and suggest diagnostics for identifying naming pain points in real projects.

Abstract

Naming is very important in software development, as names are often the only vehicle of meaning about what the code is intended to do. A recent study on how developers choose names collected the names given by different developers for the same objects. This enabled a study of these names' diversity and structure, and the construction of a model of how names are created. We reproduce different parts of this study in three independent experiments. Importantly, we employ methodological variations rather than striving for an exact replication. If the same results are nevertheless obtained, this boosts our confidence in their validity, because it demonstrates that they do not depend on the specific methodology. Our results indeed corroborate those of the original study in terms of the diversity of names, the low probability of two developers choosing the same name, and the finding that experienced developers tend to use slightly longer names than inexperienced students. We explain name diversity by performing a new analysis of the names, classifying the concepts represented in them as universal (agreed upon), alternative (reflecting divergent views on a topic), or optional (reflecting divergent opinions on whether to include this concept at all). This classification enables new research directions concerning the considerations involved in naming decisions. We also show that explicitly using the model proposed in the original study to guide naming leads to the creation of better names, whereas the simpler approach of just asking participants to use longer and more detailed names does not.
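The "probability of two developers choosing the same name" (the P2hit metric plotted in Figure 3) can be estimated from the list of names that different participants gave to one variable. The sketch below shows one natural estimator, under stated assumptions: names are compared as exact, case-sensitive strings, and the two developers are distinct participants drawn without replacement, so the estimate is the sum over distinct names of n_i(n_i - 1), divided by N(N - 1). The original study's exact definition may differ. The function `p2hit` and the example names are hypothetical, not taken from the paper's data.

```python
from collections import Counter


def p2hit(names: list[str]) -> float:
    """Estimate the probability that two different participants, drawn at
    random without replacement, gave exactly the same name.

    Assumption (not from the paper): names are compared as exact,
    case-sensitive strings.
    """
    n = len(names)
    if n < 2:
        raise ValueError("need at least two names to form a pair")
    counts = Counter(names)
    # Ordered pairs of distinct participants who gave the same name.
    same_pairs = sum(c * (c - 1) for c in counts.values())
    # All ordered pairs of distinct participants.
    return same_pairs / (n * (n - 1))


if __name__ == "__main__":
    # Hypothetical names for one variable, not data from the study.
    example = ["points", "points", "score", "pointsPerAnswer"]
    print(p2hit(example))  # 2 matching ordered pairs out of 12, about 0.167
```

Using the sum of squared relative frequencies instead (sampling with replacement) gives a different value for small groups of participants: 0.375 for the same four names. Which variant is appropriate depends on how the metric is defined in the study being reproduced.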

Paper Structure

This paper contains 27 sections, 10 figures, and 6 tables.

Figures (10)

  • Figure 1: Examples of names and their analysis into concepts, for the GLOBAL_C constant in Experiment 1. This constant represents the points awarded for each correct answer in the quiz. The full results included many more names (including repetitions) and a few more rare concepts.
  • Figure 2: Cumulative distribution functions of focus and diversity of names given to all variables in both experiments. The dashed lines are the equivalent results from the original study (Figure 6 of feitelson22).
  • Figure 3: Cumulative distribution function of $P2hit$ for all variables in both experiments. The dashed line is the equivalent result from the original study (Figure 8 of feitelson22).
  • Figure 4: Histograms of variable name lengths for all variables in both experiments, with partitioning into words. Compare with Figure 4 of feitelson22.
  • Figure 5: CDFs of variable name lengths for experienced developers as opposed to inexperienced students. $N$ refers to the number of names given. Compare with Figure 5 of feitelson22.
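Figure 4 partitions name lengths into words, which requires splitting identifiers such as `pointsPerAnswer` or `POINTS_PER_ANSWER` into their component words. A minimal regex-based sketch follows, assuming names use camelCase, PascalCase, or snake_case conventions; it is not necessarily the tokenization the authors used. The helpers `split_words` and `name_length` are hypothetical, and length in characters is taken here as the full identifier length, including underscores.

```python
import re

# Hypothetical tokenizer (not the authors' code). Alternatives, tried in order:
#   1. a run of capitals followed by a capitalized word ("HTTP" in "HTTPServer")
#   2. an optional capital followed by lowercase letters ("points", "Per")
#   3. a run of capitals ("PER" in "POINTS_PER_ANSWER", "C" in "GLOBAL_C")
#   4. a run of digits
# Underscores match none of these, so they act as word separators.
_WORD_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_words(name: str) -> list[str]:
    """Split an identifier into lowercase words."""
    return [w.lower() for w in _WORD_RE.findall(name)]


def name_length(name: str) -> tuple[int, int]:
    """Return (length in characters, number of words) for one identifier."""
    return len(name), len(split_words(name))


if __name__ == "__main__":
    for n in ["pointsPerCorrectAnswer", "POINTS_PER_ANSWER", "HTTPServer", "GLOBAL_C"]:
        print(n, split_words(n), name_length(n))
```

Counting digit runs as separate words is a design choice of this sketch; an analysis that treats suffixes like `2` as part of the preceding word would drop the last alternative from the pattern.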