A Cartography of Open Collaboration in Open Source AI: Mapping Practices, Motivations, and Governance in 14 Open Large Language Model Projects

Johan Linåker, Cailean Osborne, Jennifer Ding, Ben Burtenshaw

TL;DR

The paper maps how open collaboration unfolds across the lifecycle of open LLMs, from pre-training to post-release reuse, revealing that collaboration spans datasets, benchmarks, frameworks, leaderboards, and compute partnerships beyond the models themselves. Through semi-structured interviews with 17 developers across 14 projects, it identifies three core contributions: a broad view of artifacts enabling collaboration, a diversity of social, economic, and technological motivations, and five governance models that shape participation and impact. The findings show lifecycle-specific collaboration patterns, with cathedral-like pre-training, targeted post-training, and platform-driven post-release reuse, underscoring the need for ecosystem-oriented governance and policy support. Practically, the work offers actionable recommendations for researchers, companies, policymakers, platform providers, and foundations to foster a more inclusive, efficient open AI ecosystem with stronger data provenance, standardized tooling, and public infrastructure. Overall, the study provides a nuanced framework to understand and nurture open collaboration in AI, with implications for openness definitions, governance, and community engagement across disciplines and regions.

Abstract

The proliferation of open large language models (LLMs) is fostering a vibrant ecosystem of research and innovation in artificial intelligence (AI). However, the methods of collaboration used to develop open LLMs both before and after their public release have not yet been comprehensively studied, limiting our understanding of how open LLM projects are initiated, organized, and governed, as well as what opportunities there are to foster this ecosystem even further. We address this gap through an exploratory analysis of open collaboration throughout the development and reuse lifecycle of open LLMs, drawing on semi-structured interviews with the developers of 14 open LLMs from grassroots projects, research institutes, startups, and Big Tech companies in North America, Europe, Africa, and Asia. We make three key contributions to research and practice. First, collaboration in open LLM projects extends far beyond the LLMs themselves, encompassing datasets, benchmarks, open source frameworks, leaderboards, knowledge sharing and discussion forums, and compute partnerships, among others. Second, open LLM developers have a variety of social, economic, and technological motivations, from democratizing AI access and promoting open science to building regional ecosystems and expanding language representation. Third, the sampled open LLM projects exhibit five distinct organizational models, ranging from single-company projects to non-profit-sponsored grassroots projects, which vary in their centralization of control and in the community engagement strategies used throughout the open LLM lifecycle. We conclude with practical recommendations for stakeholders seeking to support the global community building a more open future for AI.

Paper Structure

This paper contains 104 sections and 8 figures.

Figures (8)

  • Figure 1: Model Collaboration On-ramps and Challenges across the Open LLM Lifecycle
  • Figure 2: Data Collaboration On-ramps and Challenges across the Open LLM Lifecycle
  • Figure 3: Software Collaboration On-ramps and Challenges across the Open LLM Lifecycle
  • Figure 4: Evaluation Collaboration On-ramps and Challenges across the Open LLM Lifecycle
  • Figure 5: Non-technical Collaboration On-ramps and Challenges across the Open LLM Lifecycle
  • ...and 3 more figures