Revisiting Process versus Product Metrics: a Large Scale Analysis

Suvodeep Majumder, Pranav Mody, Tim Menzies

TL;DR

This paper asks whether conclusions from analytics in-the-small still hold when scaling up to analytics in-the-large for defect prediction. It conducts a large-scale study across 700 GitHub Java projects (722,471 commits), comparing process and product metrics using four learners with ensemble methods and two evaluation schemes. The findings largely confirm that process metrics provide stronger defect signals, while metric importance rankings shift at scale and single-model approaches falter, favoring ensemble predictors. The work emphasizes the need for large-scale validation of prior results and suggests integrating qualitative and quantitative methods to derive robust, scalable software analytics guidance.

Abstract

Numerous methods can build predictive models from software data. However, what methods and conclusions should we endorse as we move from analytics in-the-small (dealing with a handful of projects) to analytics in-the-large (dealing with hundreds of projects)? To answer this question, we recheck prior small-scale results (about process versus product metrics for defect prediction and the granularity of metrics) using 722,471 commits from 700 GitHub projects. We find that some analytics in-the-small conclusions still hold when scaling up to analytics in-the-large. For example, like prior work, we see that process metrics are better predictors for defects than product metrics (best process/product-based learners respectively achieve recalls of 98%/44% and AUCs of 95%/54%, median values). That said, we warn that it is unwise to trust metric importance results from analytics in-the-small studies since those change dramatically when moving to analytics in-the-large. Also, when reasoning in-the-large about hundreds of projects, it is better to use predictions from multiple models (since single model predictions can become confused and exhibit a high variance).

Paper Structure

This paper contains 19 sections, 2 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: Number of papers exploring the benefits of the process and product metrics for defect prediction. The papers in the intersection (rahman2013and, moser2008comparative, graves2000predicting, arisholm2010systematic, kamei2010revisiting, giger2012method) explore and compare both process and product metrics. Note that, before this EMSE paper, work that looked at both process and product metrics explored analytics in-the-small.
  • Figure 2: Differential Evolution based on Storn's DE optimizer.
  • Figure 3: Framework for this analysis.
  • Figure 4: Cross-validation recall and false alarm results for Process (P), Product (C), and Combined (P+C) metrics. The vertical box plots in these charts run from min to max, while the thick boxes highlight the 25th, 50th, and 75th percentiles. Each box plot is built using 700 GitHub projects, where each data point is the median result from 5-fold cross-validation repeated 5 times.
  • Figure 5: Cross-validation AUC and Popt20 results for Process (P), Product (C), and Combined (P+C) metrics. Same format as Figure 4.
  • ...and 9 more figures
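
The Figure 4 caption describes each data point as the median over 5-fold cross-validation repeated 5 times. A minimal sketch of that evaluation loop follows, assuming one numeric metric per row and binary defect labels. The threshold learner, function names, and toy data are hypothetical stand-ins for illustration; they are not the paper's learners or data.

```python
import random
from statistics import median


def recall_and_false_alarm(actual, predicted):
    """Return (recall, false-alarm rate) for binary labels, 1 = defective."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    return recall, false_alarm


def repeated_kfold(n_rows, folds=5, repeats=5, seed=1):
    """Yield (train, test) index lists: `repeats` shuffles, `folds` splits each."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_rows))
        rng.shuffle(order)
        for k in range(folds):
            test = order[k::folds]  # every folds-th row, starting at offset k
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


def train_threshold(xs, ys):
    """Toy learner (assumption): cut at the midpoint between the class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    if not pos or not neg:
        return float("inf")  # degenerate training set: predict "clean" always
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def evaluate_project(xs, ys, folds=5, repeats=5, seed=1):
    """Median recall and false alarm over all folds of all repeats."""
    recalls, false_alarms = [], []
    for train, test in repeated_kfold(len(xs), folds, repeats, seed):
        cut = train_threshold([xs[i] for i in train], [ys[i] for i in train])
        predicted = [1 if xs[i] > cut else 0 for i in test]
        r, pf = recall_and_false_alarm([ys[i] for i in test], predicted)
        recalls.append(r)
        false_alarms.append(pf)
    return median(recalls), median(false_alarms)


# Toy, well-separated data: 20 clean rows (0..19) and 20 defective rows (100..119).
xs = list(range(20)) + list(range(100, 120))
ys = [0] * 20 + [1] * 20
print(evaluate_project(xs, ys))
```

Repeating this per project and plotting the resulting medians would give one box-plot point per project, in the spirit of the caption. A stratified split would keep the defect ratio stable across folds, which this simple shuffle does not guarantee.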