The dynamics of leadership and success in software development teams

Lorenzo Betti; Luca Gallo; Johannes Wachs; Federico Battiston

TL;DR

This work examines how leadership changes and workload distribution relate to success in open-source software teams, addressing a gap left by studies that treat teams as static entities. Using fine-grained temporal data from Rust, JavaScript, and Python repositories, the authors find a highly unequal workload distribution: a lead developer emerges early and authors more than half of the commits regardless of team size, and stronger heterogeneity correlates with higher success as measured by stars and downloads. Lead developers also handle core coordination tasks such as merging pull requests and branches. Around 10% of repositories change their lead developer, more often when the original lead is inexperienced, and the change is associated with faster subsequent growth in success compared with similar repositories that kept their lead. For stars, the effect is largest for repositories that were performing poorly before the change and for those started by an experienced lead. The findings are replicated across ecosystems using a matching procedure and supplementary analyses, with implications for coordination, risk (truck factor), and organizational design in collaborative software development.

Abstract

From science to industry, teamwork plays a crucial role in knowledge production and innovation. Most studies consider teams as static groups of individuals, thereby failing to capture how the micro-dynamics of collaborative processes and organizational changes determine team success. Here, we leverage fine-grained temporal data on software development teams from three software ecosystems (Rust, JavaScript, and Python) to gain insights into the dynamics of online collaborative projects. Our analysis reveals an uneven workload distribution in teams, with stronger heterogeneity correlated with higher success, and the early emergence of a lead developer carrying out the majority of work. Moreover, we find that a sizeable fraction of projects experience a change of lead developer, with such a transition being more likely in projects led by inexperienced users. Finally, we show that leadership change is associated with faster success growth. Our work contributes to a deeper understanding of the link between team evolution and success in collaborative processes.
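
The uneven workload distribution mentioned in the abstract can be illustrated with a minimal sketch. It ranks developers by their fraction of commits and takes the top-ranked one as the lead. This is an assumption made for illustration: the paper's exact definition of a lead developer is not given in this summary, and the function name `commit_shares_by_rank` and the commit log below are hypothetical.

```python
from collections import Counter


def commit_shares_by_rank(commit_authors):
    """Return (author, fraction of commits) pairs, most active author first."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    # most_common() sorts authors by commit count in descending order
    return [(author, n / total) for author, n in counts.most_common()]


# Hypothetical commit log: one author name per commit (not data from the paper)
commits = ["ana"] * 6 + ["bo"] * 3 + ["cy"] * 1

shares = commit_shares_by_rank(commits)

# Assumption: the lead developer is simply the most active committer
lead, lead_share = shares[0]
print(lead, lead_share)
```

In this toy log the top-ranked developer authors more than half of the commits, which mirrors the pattern reported in Figure 1(a).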

Paper Structure

This paper contains 18 sections, 2 equations, 25 figures, and 2 tables.

Introduction
Results
Emergence of a lead developer
Characterization of lead developers' activity
Lead developers of repositories can change
Repositories that change the lead developer perform better after the change
Discussion
Methods
Data and selection of repositories
Detecting lead developer changes
Matching procedure
Additional characterization of the activity of lead developers
Lead developers have write access
Lead developers are responsible for merging pull requests and branches
Lead developers' position in communication channels
...and 3 more sections

Figures (25)

Figure 1: Workload distribution within teams and relationship with success. (a) Median fraction of commits authored by the $r$-th most active developer of a repository stratified by team size. The most active developer makes more than half of the total number of commits while other developers contribute substantially less, regardless of the size of the team. (b-c) Median number of stars (b) and downloads (c) as a function of the relative effective team size stratified by team size. The more heterogeneous the workload distribution in the team, the higher the success. The Spearman's rank test returns $p < 0.001$ for all team sizes. The number of stars and downloads are incremented by one unit. Error bars range from the 25th to the 75th percentile of the distributions.
Figure 2: Characterization of lead developers' activity. (a) Distribution of the inter-commit times and cumulative distribution of the number of commits close in time (inset) for lead and non-lead developers. Lead developers exhibit higher frequency of commits and longer streaks of consecutive commits. (b) Distribution of the number of repositories in which lead and non-lead developers are active. Lead developers are involved in a larger number of repositories. Box plots indicate median (middle line), 25th, 75th percentile (box) and 5th and 95th percentile (whiskers) as well as outliers (single points). (c) Distribution of the repository switch time of lead and non-lead developers. Lead developers tend to switch from one project to another on a daily to weekly basis. (d) Number of downloads across repositories' lifetime stratified by lead developers' experience (median and its 95% confidence interval). Repositories led by experienced developers are downloaded more compared to those led by inexperienced ones. Time is binned into trimesters.
Figure 3: Lead developers can change across the lifetime of repositories. (a) Cumulative percentage of repositories undergoing a lead developer change as a function of the number of years since their creation. Around 10% of repositories change their lead developer throughout their lifetime, with the majority occurring within the second and third year of activity. (b) Fraction of new commits authored by the old and new lead developer before and after the lead developer transition (mean and its 95% confidence interval). After the transition (vertical dashed line), contributions from the old lead developer diminish rapidly. (c) Percentage of lead-change repositories stratified by the previous experience of the old lead developer. Each point refers to repositories created at a specific year or later. Repositories led by inexperienced lead developers exhibit a significantly higher likelihood to change their lead developer compared to those led by experienced ones, according to Fisher's exact test. Significance levels are denoted as follows: * for $p < 0.05$, ** for $p < 0.01$, and *** for $p < 0.001$. Error bars refer to 95% confidence intervals of the estimated percentages (Wilson score interval).
Figure 4: Lead developer changes are associated with faster success growth. (a) Average effect of lead developer change $\Delta_t$ for stars. Repositories' success grows faster compared to similar repositories that did not undergo such a change. (b-c) Success growth $\Delta_t$ for stars stratified by (b) the success before the change of lead developer and (c) the experience of the old lead developer. Worst-performing repositories exhibit a large positive effect following the change, whereas top-performing ones are minimally affected. Repositories started by an experienced lead developer benefit more from the change than those initiated by an inexperienced one. Error bars in (a) and (c) correspond to 95% confidence intervals of the estimated quantities. In (b) box plots indicate median (middle line), 25th, 75th percentile (box) and 5th and 95th percentile (whiskers) as well as outliers (single points).
Figure S1: Distribution of commit types in the Rust dataset. Commits can implement different types of changes to the software and we can use commit messages as an indicator of the type of change being implemented. After standard text cleaning (removing file paths, URLs, email addresses, and words shorter than three characters, as well as applying stemming), we applied a keyword-based classifier on commit messages inspired by Hattori and Lanza (2008) to label commits as: forward engineering (e.g., "implement", "create"), reengineering (e.g., "optimize", "refactor"), corrective engineering (e.g., "fix", "bug"), and management (e.g., "clean", "documentation"). To account for conventions common in GitHub and Rust, we introduced two additional labels: "bump version" (Rust-specific jargon indicating the release of a new version of a package) and "merger" (merging a pull request or branch). In addition, we label commits that match none of these six classes as "unknown" and commits with an empty message as "empty". Among commits with an identifiable label, 74% relate to coding tasks (forward, reengineering, corrective) and 26% to management (merger, management, bump version).
...and 20 more figures
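
The keyword-based commit labelling described in the caption of Figure S1 could be sketched roughly as follows. This is not the authors' classifier. Only the quoted example keywords come from the caption; the keywords for "bump version" and "merger", the first-match-wins ordering, the cleaning regexes, and the use of prefix matching as a crude substitute for stemming are all assumptions.

```python
import re

# Label -> keyword stems. Insertion order sets the priority (an assumption).
# "bump" and "merge" are assumed keywords; the rest are the caption's examples.
KEYWORDS = {
    "bump version": ["bump"],
    "merger": ["merge"],
    "corrective engineering": ["fix", "bug"],
    "reengineering": ["optimize", "refactor"],
    "forward engineering": ["implement", "create"],
    "management": ["clean", "documentation"],
}


def clean(message):
    """Lowercase the message and drop URLs, emails, path-like tokens, short words."""
    text = message.lower()
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"\S+@\S+", " ", text)       # email addresses
    text = re.sub(r"\S*/\S*", " ", text)       # tokens with a slash (rough file-path filter)
    words = re.findall(r"[a-z]+", text)
    return [w for w in words if len(w) >= 3]


def classify(message):
    """Return the first label whose keyword stem prefixes a word of the message."""
    if not message.strip():
        return "empty"
    words = clean(message)
    for label, stems in KEYWORDS.items():
        # Prefix matching stands in for stemming, e.g. "refactored" -> "refactor"
        if any(w.startswith(s) for w in words for s in stems):
            return label
    return "unknown"


for msg in ["Fix bug in parser", "Merge pull request #12 from foo/bar", "wip", ""]:
    print(repr(msg), "->", classify(msg))
```

Prefix matching will mislabel some words (for example, "fixture" matches "fix"), so a real pipeline would likely use a proper stemmer and a fuller keyword list.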
