
Patterns in the Transition From Founder-Leadership to Community Governance of Open Source

Mobina Noori, Mahasweta Chakraborti, Amy X Zhang, Seth Frey

TL;DR

Open source governance transitions from founder-led to community-led models are increasingly common, yet poorly understood at scale. The authors develop a scalable pipeline that uses GOVERNANCE.md snapshots from 637 paired OSS projects (initial vs. latest) and extracts institutional elements (Roles, Actions, and Deontics) via an NLP-assisted Institutional Grammar (IG) framework. The elements are embedded with Sentence-BERT and clustered with BERTopic to obtain $H$ (entropy) and $K$ (distinct construct counts), plus $\Delta H$, $\Delta K$, and Jensen–Shannon divergence as a drift measure. Their results show broad maturation: counts of roles and actions grow, distributions become more balanced ($\Delta H>0$ for roles and actions), and deontics remain predominantly enabling. Distributional drift confirms structural evolution, and rarefaction-based robustness checks indicate that these changes are not artifacts of longer later documents. The study contributes a dataset of paired governance artifacts, a scalable NLP-IG pipeline, and quantitative metrics that reveal how governance institutions layer and coordinate as OSS projects scale, with ecosystem oversight increasingly formalized. Practically, the work provides a method to monitor governance evolution, informing tooling, policy, and platform design to support accountable, community-driven infrastructure.
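
The metrics named above can be sketched as follows. The sketch assumes that each snapshot has already been reduced to a list of cluster labels, one per role or action mention, and that entropy is measured in bits. The paper's exact log base, any smoothing, and how results are aggregated across the 637 projects are not stated in this section. All function names and label strings are hypothetical. "all_project" and "all_community" echo Figure 2, while the other two labels are invented for illustration.

```python
import math
from collections import Counter

def shares(labels):
    """Empirical distribution (share of mentions) over construct clusters."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def entropy(labels):
    """H: Shannon entropy in bits of the cluster-label distribution (base 2 is an assumption)."""
    return -sum(p * math.log2(p) for p in shares(labels).values())

def distinct_count(labels):
    """K: number of distinct constructs (clusters) mentioned in a snapshot."""
    return len(set(labels))

def jensen_shannon(labels_a, labels_b):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    p, q = shares(labels_a), shares(labels_b)
    support = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in support}

    def kl_to_m(dist):
        # KL(dist || m); m[k] > 0 wherever dist[k] > 0, so no division by zero.
        return sum(v * math.log2(v / m[k]) for k, v in dist.items())

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)

def drift(initial, latest):
    """Hypothetical helper: change metrics between a project's initial and latest snapshot."""
    return {
        "delta_H": entropy(latest) - entropy(initial),
        "delta_K": distinct_count(latest) - distinct_count(initial),
        "JSD": jensen_shannon(initial, latest),
    }

if __name__ == "__main__":
    initial = ["all_project", "all_project", "all_community", "steering_group"]
    latest = ["all_project", "all_community", "steering_group", "technical_committee"]
    print(drift(initial, latest))
```

In this toy pair, the latest snapshot spreads its mentions evenly over one more category. As a result, both $\Delta H$ and $\Delta K$ are positive. If SciPy is used instead, note that `scipy.spatial.distance.jensenshannon` returns the Jensen–Shannon *distance*, which is the square root of the divergence computed here.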

Abstract

Open digital public infrastructure needs community management to ensure accountability, sustainability, and robustness. Yet open-source projects often rely on centralized decision-making, and the determinants of successful community management remain unclear. We analyze 637 GitHub repositories to trace transitions from founder-led to shared governance. Specifically, we document trajectories to community governance by extracting institutional roles, actions, and deontic cues from version-controlled project constitutions (GOVERNANCE.md files). With a semantic parsing pipeline, we cluster these elements into broader role and action types. We find that roles and actions grow and that regulation becomes more balanced, reflecting increases in governance scope and differentiation over time. Rather than shifting tone, communities grow by layering and refining responsibilities. As transitions to community management mature, projects increasingly regulate ecosystem-level relationships and add definition to project oversight roles. Overall, this work offers a scalable pipeline for tracking the growth and development of community governance regimes away from open-source software's familiar default of founder ownership.
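
The extraction step can be illustrated with a deliberately simplified, single-clause sketch. This is not the authors' NLP-assisted parser. It splits one normalized rule into a role (the IG "Attribute"), a deontic cue, and an action (the IG "Aim"), then labels the cue's polarity. The deontic lexicon is restricted to the modal cues quoted in the Figure 4 caption ("can", "may", "must", "will", "should", "cannot", "must not"). Every function and type name here is hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Cues quoted in the Figure 4 caption; the paper's full lexicon may differ.
# "must not" precedes "must" so the regex alternation prefers the negated form.
DEONTICS = ["must not", "cannot", "must", "will", "should", "may", "can"]
ENABLING = {"can", "may"}
RESTRICTING = {"cannot", "must not"}

class Statement(NamedTuple):
    role: str     # who the rule addresses (IG "Attribute")
    deontic: str  # modal cue, lowercased
    action: str   # what is permitted, required, or forbidden (IG "Aim")

# Lazy role, then the first whitespace-delimited deontic cue, then the action.
_PATTERN = re.compile(
    r"^(?P<role>.+?)\s+(?P<deontic>"
    + "|".join(re.escape(d) for d in DEONTICS)
    + r")\s+(?P<action>.+?)\.?$",
    re.IGNORECASE,
)

def normalize(line: str) -> str:
    """Drop a leading markdown bullet, heading marker, or list number; strip bold/code marks; collapse spaces."""
    line = re.sub(r"^\s*(?:[-*+]|#+|\d+\.)\s+", "", line)
    line = line.replace("**", "").replace("`", "")
    return re.sub(r"\s+", " ", line).strip()

def parse_statement(line: str) -> Optional[Statement]:
    """Split a single-clause rule into (role, deontic, action), or return None if no cue is found."""
    match = _PATTERN.match(normalize(line))
    if match is None:
        return None
    return Statement(match["role"], match["deontic"].lower(), match["action"])

def polarity(deontic: str) -> str:
    """Classify a cue as enabling, restricting, or other (e.g. 'must', 'should')."""
    if deontic in RESTRICTING:
        return "restricting"
    if deontic in ENABLING:
        return "enabling"
    return "other"

if __name__ == "__main__":
    for rule in ["- **Maintainers** must not merge their own pull requests.",
                 "Any contributor can open an issue",
                 "## Overview"]:
        stmt = parse_statement(rule)
        print(stmt, polarity(stmt.deontic) if stmt else None)
```

Real governance prose often packs several clauses, conditions, and exceptions into one sentence. A pattern like this therefore only shows the shape of the structured output; broader coverage would require a dependency parser or a language model.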

Paper Structure

This paper contains 17 sections, 8 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Processing pipeline from raw governance files to structured institutional statements and analysis. These steps support measurement of change in count and concentration from initial to latest versions of version-controlled GOVERNANCE.md project constitutions. This diagram shows how governance text is normalized, parsed into roles, actions, and deontics, and clustered into institutional constructs.
  • Figure 2: Projects diversify the range of governance roles over time. Plots show the share of role mentions in initial versus latest governance snapshots. Early constitutions are dominated by broad categories such as "all_project" and "all_community," while later constitutions redistribute attention across more specialized roles (e.g., subcommittees, technical committees, and steering groups). This broadening reflects institutional development toward greater specialization and shared governance. The differences in the distributions of these types are small but significant.
  • Figure 3: Projects expand the catalog of governance actions over time. The plot shows the share of action mentions in initial versus latest governance snapshots. While high-level categories such as "choice" and "authority" remain prominent, later constitutions show a broader and more balanced distribution across action types, reflecting increased institutional complexity and scope. As above, the differences in the distributions of these types are small but significant.
  • Figure 4: Deontic composition in OSS governance remains broadly stable over time, although the entropy of the binary enabling/restricting distribution increases. Panel A shows the share of modal expressions ("can/may," "must/will," "should") in initial versus latest governance snapshots. The relative balance between permissive, obligatory, and advisory language changes little, suggesting that while projects diversify roles and actions, the prescriptive force of their rules remains largely constant. Panel B highlights the distribution of enabling versus restricting deontic statements. Enabling language ("can", "may") accounts for over 97% of references in both periods, while restricting terms ("cannot", "must not") remain a small minority and seem to decline further over time. Together, these patterns underscore the relative stability of governance constitutions, in which permissions dominate prohibitions.
  • Figure 5: The number of types of roles and actions invoked in constitutions increases significantly. Violin plots compare the number of categories of (a) Roles, (b) Actions, and (c) Deontics between initial and latest snapshots for each repository. Roles and actions both show significant increases.
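
Latest snapshots tend to be longer, so a raw increase in the number of categories could simply reflect more text. Rarefaction addresses this by comparing snapshots at a common number of mentions. The sketch below uses the classical closed-form expectation for sampling $n$ of $N$ mentions without replacement:

$E[K_n] = \sum_i \left[1 - \binom{N - N_i}{n} / \binom{N}{n}\right]$

Here $N_i$ is the count of category $i$. Whether the authors used this analytic form or repeated random subsampling is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter

def rarefied_k(labels, n):
    """Expected number of distinct categories in a random subsample of n mentions
    drawn without replacement: sum over categories of 1 - C(N - N_i, n) / C(N, n)."""
    counts = Counter(labels).values()
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the number of mentions")
    denominator = math.comb(total, n)
    # math.comb returns 0 when n > N - N_i, i.e. category i is certain to appear.
    return sum(1 - math.comb(total - c, n) / denominator for c in counts)

def compare_at_common_depth(initial, latest):
    """Rarefy both snapshots to the size of the smaller one before comparing K."""
    depth = min(len(initial), len(latest))
    return rarefied_k(initial, depth), rarefied_k(latest, depth)
```

If the latest snapshot still shows more categories at the smaller snapshot's depth, the increase is not explained by document length alone.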