No Silver Bullets: Why Understanding Software Cycle Time is Messy, Not Magic

John C. Flournoy, Carol S. Lee, Maggie Wu, Catherine M. Hicks

TL;DR

The paper investigates software delivery velocity by analyzing cycle time across 55,619 observations from 216 organizations using a Bayesian hierarchical Weibull model to separate within- and between-person variation. It jointly evaluates predictors such as coding days, total merged PRs, defect-ticket share, degree centrality, and comments per PR, introducing a novel collaboration metric while accounting for time- and organization-specific effects. Results show precise but modest associations with these factors and reveal substantial unexplained variability, indicating cycle time is a noisy, context-dependent signal that resists simple, individual-level interventions. The authors argue for systems-level thinking and longitudinal, multi-factor measurement to improve software delivery, and provide methodological guidance for analyzing complex operational metrics at scale, with implications for practitioners wary of over-interpreting single observations.
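One way to picture what separating within- and between-person variation means for a predictor is to split each observation into a person-level average and a deviation from that average. The sketch below is a generic illustration of that centering step, not the authors' code; the function name `split_within_between` and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def split_within_between(rows):
    """Split each (person_id, value) pair into between- and within-person parts.

    rows: list of (person_id, value) tuples (hypothetical input format).
    Returns a list of (person_id, person_mean, deviation_from_person_mean).
    """
    # Group values by person so each person's own average can be computed.
    by_person = defaultdict(list)
    for person_id, value in rows:
        by_person[person_id].append(value)

    # Between-person component: each person's mean across their observations.
    person_means = {p: mean(values) for p, values in by_person.items()}

    # Within-person component: how far each observation sits from that mean.
    return [(p, person_means[p], v - person_means[p]) for p, v in rows]


# Toy data: coding days per week for two hypothetical people.
example_rows = [("a", 2.0), ("a", 4.0), ("b", 5.0), ("b", 5.0)]
for person_id, between, within in split_within_between(example_rows):
    print(person_id, between, within)
```

In a hierarchical model, the two components can then enter as separate predictors, so that differences between people are not conflated with changes within one person over time.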

Abstract

Understanding factors that influence software development velocity is crucial for engineering teams and organizations, yet empirical evidence at scale remains limited. A more robust understanding of the dynamics of cycle time may help practitioners avoid pitfalls in relying on velocity measures while evaluating software work. We analyze cycle time, a widely-used metric measuring time from ticket creation to completion, using a dataset of over 55,000 observations across 216 organizations. Through Bayesian hierarchical modeling that appropriately separates individual and organizational variation, we examine how coding time, task scoping, and collaboration patterns affect cycle time while characterizing its substantial variability across contexts. We find precise but modest associations between cycle time and factors including coding days per week, number of merged pull requests, and degree of collaboration. However, these effects are set against considerable unexplained variation both between and within individuals. Our findings suggest that while common workplace factors do influence cycle time in expected directions, any single observation provides limited signal about typical performance. This work demonstrates methods for analyzing complex operational metrics at scale while highlighting potential pitfalls in using such measurements to drive decision-making. We conclude that improving software delivery velocity likely requires systems-level thinking rather than individual-focused interventions.
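The claim that any single observation carries limited signal about typical performance can be illustrated with a small simulation. The sketch below draws cycle times from a Weibull distribution and compares the share of draws that exceed twice the median with the closed-form value. The scale and shape values are illustrative assumptions, not estimates from the paper, and the helper names are hypothetical.

```python
import math
import random


def weibull_median(scale, shape):
    """Median of a Weibull distribution: scale * ln(2) ** (1 / shape)."""
    return scale * math.log(2) ** (1.0 / shape)


def tail_above_median_multiple(multiple, shape):
    """P(X > multiple * median) for a Weibull; it does not depend on scale."""
    return 0.5 ** (multiple ** shape)


def simulate_cycle_times(n, scale, shape, seed=0):
    """Draw n simulated cycle times; random.weibullvariate takes (scale, shape)."""
    rng = random.Random(seed)
    return [rng.weibullvariate(scale, shape) for _ in range(n)]


# Assumed, illustrative parameters (not taken from the paper).
SCALE_DAYS, SHAPE = 10.0, 1.2

draws = simulate_cycle_times(20_000, SCALE_DAYS, SHAPE)
median = weibull_median(SCALE_DAYS, SHAPE)
share_above_double = sum(d > 2 * median for d in draws) / len(draws)

print("theoretical median (days):", round(median, 2))
print("simulated share above 2x median:", round(share_above_double, 3))
print("closed-form share above 2x median:",
      round(tail_above_median_multiple(2, SHAPE), 3))
```

Under these assumed parameters, a noticeable fraction of individual tickets land far from the typical value even though nothing about the simulated person has changed, which is the intuition behind reading single observations cautiously.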

Paper Structure

This paper contains 40 sections, 2 equations, 18 figures.

Figures (18)

  • Figure 1: Organization sizes clustered around 130 users, with a long tail of larger organizations. Note that "users" generally refers to developers or other individuals creating and closing tickets.
  • Figure 2: Within-quarter month doesn't affect cycle time. Background pixels represent density of data, with darker colors indicating greater density. Lines are median posterior expectations, with 95% credible interval ribbons.
  • Figure 3: Slight reduction of cycle time across the year. Background hexagons represent density of data, with darker colors indicating greater density. Lines are median posterior expectations, with 95% credible interval ribbons.
  • Figure 4: More coding days are associated with shorter cycle times. Background hexagons represent density of data, with darker colors indicating greater density. Lines are median posterior expectations, with 95% credible interval ribbons.
  • Figure 5: More merged PRs are associated with shorter cycle times. Background hexagons represent density of data, with darker colors indicating greater density. Lines are median posterior expectations, with 95% credible interval ribbons.
  • ...and 13 more figures