Patterns in the Transition From Founder-Leadership to Community Governance of Open Source
Mobina Noori, Mahasweta Chakraborti, Amy X Zhang, Seth Frey
TL;DR
Transitions of open source governance from founder-led to community-led models are increasingly common, yet poorly understood at scale. The authors develop a scalable pipeline that takes paired GOVERNANCE.md snapshots (initial vs. latest) from 637 OSS projects and extracts institutional elements (Roles, Actions, and Deontics) via an NLP-assisted Institutional Grammar (IG) framework. The elements are embedded with Sentence-BERT and clustered with BERTopic to obtain entropy $H$ and distinct construct counts $K$, along with the changes $\Delta H$ and $\Delta K$ and Jensen–Shannon divergence as a drift measure. The results show broad maturation: counts of roles and actions grow, their distributions become more balanced ($\Delta H > 0$ for roles and actions), and deontics remain predominantly enabling. Distributional drift confirms structural evolution, and rarefaction-based robustness checks indicate that these changes are not artifacts of later documents being longer. The study contributes a dataset of paired governance artifacts, a scalable NLP-IG pipeline, and quantitative metrics that reveal how governance institutions layer and coordinate as OSS projects scale, with ecosystem oversight increasingly formalized. Practically, the work provides a method to monitor governance evolution, informing tooling, policy, and platform design to support accountable, community-driven infrastructure.
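To make the metrics concrete, here is a minimal sketch of how $\Delta K$, $\Delta H$, and Jensen–Shannon divergence could be computed once each extracted element has been assigned a cluster label. It is not the authors' implementation: the function names, the toy role labels, the base-2 logarithm, and the use of the divergence itself (rather than its square root, the Jensen–Shannon distance) are all assumptions made for illustration.

```python
import math
from collections import Counter


def distribution(labels, support):
    """Relative frequency of each cluster label over a shared support.

    Assumes `labels` is non-empty.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return [counts.get(k, 0) / total for k in support]


def shannon_entropy(p):
    """Shannon entropy H in bits; zero-probability terms contribute 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (bounded between 0 and 1)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # b is the mixture m, so b > 0 wherever a > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def governance_shift(initial_labels, latest_labels):
    """Compare the initial and latest snapshot of one project."""
    support = sorted(set(initial_labels) | set(latest_labels))
    p = distribution(initial_labels, support)
    q = distribution(latest_labels, support)
    return {
        "delta_K": len(set(latest_labels)) - len(set(initial_labels)),
        "delta_H": shannon_entropy(q) - shannon_entropy(p),
        "jsd": js_divergence(p, q),
    }


# Hypothetical role-cluster labels for one project (illustrative only).
initial = ["maintainer", "maintainer", "maintainer", "contributor"]
latest = [
    "maintainer", "contributor", "steering committee",
    "contributor", "maintainer", "steering committee",
]
result = governance_shift(initial, latest)
print(result)
```

In this toy case the latest snapshot adds one role type and spreads mentions evenly across three types, so both $\Delta K$ and $\Delta H$ are positive and the divergence is nonzero. That is the pattern the summary describes as growth plus rebalancing.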
Abstract
Open digital public infrastructure needs community management to ensure accountability, sustainability, and robustness. Yet open-source projects often rely on centralized decision-making, and the determinants of successful community management remain unclear. We analyze 637 GitHub repositories to trace transitions from founder-led to shared governance. Specifically, we document trajectories to community governance by extracting institutional roles, actions, and deontic cues from version-controlled project constitutions (GOVERNANCE.md files). With a semantic parsing pipeline, we cluster elements into broader role and action types. We find roles and actions grow, and regulation becomes more balanced, reflecting increases in governance scope and differentiation over time. Rather than shifting tone, communities grow by layering and refining responsibilities. As transitions to community management mature, projects increasingly regulate ecosystem-level relationships and add definition to project oversight roles. Overall, this work offers a scalable pipeline for tracking the growth and development of community governance regimes from open-source software's familiar default of founder-ownership.
