No Silver Bullets: Why Understanding Software Cycle Time is Messy, Not Magic
John C. Flournoy, Carol S. Lee, Maggie Wu, Catherine M. Hicks
TL;DR
The paper investigates software delivery velocity by analyzing cycle time across 55,619 observations from 216 organizations, using a Bayesian hierarchical Weibull model to separate within-person from between-person variation. It jointly evaluates predictors such as coding days, total merged PRs, defect-ticket share, degree centrality, and comments per PR, introducing a novel collaboration metric while accounting for time- and organization-specific effects. The estimated associations with these factors are precise but modest, and substantial variability remains unexplained, indicating that cycle time is a noisy, context-dependent signal that resists simple, individual-level interventions. The authors argue for systems-level thinking and longitudinal, multi-factor measurement to improve software delivery. They also provide methodological guidance for analyzing complex operational metrics at scale and caution practitioners against over-interpreting single observations.
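To make the within-person versus between-person distinction concrete, the following is a minimal sketch of person-mean centering, one common way to split a predictor into a between-person part (each person's average) and a within-person part (the deviation from that average). It is an illustration only, not the authors' pipeline: the function name `split_within_between`, the developer IDs, and the coding-days values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def split_within_between(records):
    """Split each observation into a between-person and a within-person part.

    records: list of (person_id, value) pairs.
    Returns a list of (person_id, person_mean, deviation_from_person_mean).
    """
    by_person = defaultdict(list)
    for person, value in records:
        by_person[person].append(value)

    # Between-person component: each person's own average.
    person_means = {person: mean(values) for person, values in by_person.items()}

    # Within-person component: how far this observation sits from that average.
    return [
        (person, person_means[person], value - person_means[person])
        for person, value in records
    ]


# Hypothetical data: coding days per week for two developers (not from the paper).
records = [
    ("dev_a", 2.0), ("dev_a", 4.0),
    ("dev_b", 5.0), ("dev_b", 5.0), ("dev_b", 2.0),
]
decomposed = split_within_between(records)
for person, between, within in decomposed:
    print(f"{person}: between={between:.1f}, within={within:+.1f}")
```

In a hierarchical model, the two components can then enter as separate predictors, so that "people who code more days on average" and "weeks in which a given person codes more than usual" are not conflated.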
Abstract
Understanding factors that influence software development velocity is crucial for engineering teams and organizations, yet empirical evidence at scale remains limited. A more robust understanding of the dynamics of cycle time may help practitioners avoid pitfalls in relying on velocity measures when evaluating software work. We analyze cycle time, a widely used metric measuring time from ticket creation to completion, using a dataset of over 55,000 observations across 216 organizations. Through Bayesian hierarchical modeling that appropriately separates individual and organizational variation, we examine how coding time, task scoping, and collaboration patterns affect cycle time while characterizing its substantial variability across contexts. We find precise but modest associations between cycle time and factors including coding days per week, number of merged pull requests, and degree of collaboration. However, these effects are set against considerable unexplained variation both between and within individuals. Our findings suggest that while common workplace factors do influence cycle time in expected directions, any single observation provides limited signal about typical performance. This work demonstrates methods for analyzing complex operational metrics at scale while highlighting potential pitfalls in using such measurements to drive decision-making. We conclude that improving software delivery velocity likely requires systems-level thinking rather than individual-focused interventions.
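The claim that a single observation carries limited signal can be illustrated with the closed-form quantiles of a Weibull distribution, Q(p) = scale * (-ln(1 - p))^(1/shape). The sketch below is a hedged illustration, not a reproduction of the paper's fitted model: the `shape` and `scale` values are assumptions chosen for readability, not estimates reported by the authors.

```python
import math


def weibull_quantile(p, shape, scale):
    """Quantile function of a Weibull(shape, scale) distribution."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


# Illustrative parameters only (assumed, not taken from the paper).
shape = 1.0   # shape = 1 reduces the Weibull to an exponential distribution
scale = 10.0  # scale expressed in days

q10 = weibull_quantile(0.10, shape, scale)
median = weibull_quantile(0.50, shape, scale)
q90 = weibull_quantile(0.90, shape, scale)

print(f"10th percentile: {q10:.2f} days")
print(f"median:          {median:.2f} days")
print(f"90th percentile: {q90:.2f} days")
print(f"90th/10th ratio: {q90 / q10:.1f}")
```

Under these assumed parameters, the middle 80% of cycle times from one unchanged process spans more than an order of magnitude, so one unusually slow or fast ticket says little about the typical value.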
