Risks and Opportunities of Open-Source Generative AI

Francisco Eiras; Aleksandar Petrov; Bertie Vidgen; Christian Schroeder; Fabio Pizzati; Katherine Elkins; Supratik Mukhopadhyay; Adel Bibi; Aaron Purewal; Csaba Botos; Fabro Steibel; Fazel Keshtkar; Fazl Barez; Genevieve Smith; Gianluca Guadagni; Jon Chun; Jordi Cabot; Joseph Imperial; Juan Arturo Nolazco; Lori Landay; Matthew Jackson; Phillip H. S. Torr; Trevor Darrell; Yong Lee; Jakob Foerster

TL;DR

This paper assesses the risks and opportunities of open-source generative AI through a three-stage development framework (near-, mid-, and long-term) and a detailed openness taxonomy. It argues that open sourcing offers net benefits across research, safety, equity, and social impact, while acknowledging certain risks that require system-level safeguards and policy guidance. The authors map regulatory landscapes (the EU AI Act, Biden's Executive Order, Chinese measures, and Middle East initiatives) and analyze the openness of current LLMs, highlighting that weights and data remain largely closed and that performance gaps exist between open and closed models. They then present practical recommendations for policy, best practices, and risk mitigation, underscoring that openness, coupled with responsible governance, can enhance transparency, inclusivity, and safety while enabling decentralized innovation and coordination. Overall, the work advocates permissive open-source practices complemented by voluntary, proactive safety and transparency measures to maximize the societal benefits of Gen AI.

Abstract

Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science and medicine to education. The potential for these seismic changes has triggered a lively debate about the risks of the technology and has resulted in calls for tighter regulation, in particular from some of the major tech companies that are leading AI development. Such regulation is likely to put at risk the budding field of open-source generative AI. Using a three-stage framework for Gen AI development (near-, mid-, and long-term), we analyze the risks and opportunities of open-source generative AI models with capabilities similar to those currently available (near to mid-term) and with greater capabilities (long-term). We argue that, overall, the benefits of open-source Gen AI outweigh its risks. As such, we encourage the open sourcing of models and of training and evaluation data, and we provide a set of recommendations and best practices for managing the risks associated with open-source generative AI.

Paper Structure

This paper contains 48 sections, 5 figures, and 4 tables.

Introduction
Preliminaries
Stages of Development of Generative AI Models
Training, Evaluating, and Deploying Large Language Models
Open-source Gen AI Governance
The EU AI Act
Biden's Executive Order
China's Gen AI Legislation
The Middle East
AI Regulation Efforts in Other Countries
Openness Taxonomy of Generative AI Models
Classifying Openness for Generative AI Code and Data
Openness Taxonomy of Current Large Language Models
Near to Mid-Term Impacts of Open-Source Generative AI
Contrastive Socio-Technical Analysis of Risks and Benefits
...and 33 more sections

Figures (5)

Figure 1: Three Development Stages for Generative AI Models: near-term is defined by early use and exploration of the technology in much of its current state; mid-term is a result of the widespread adoption of the technology and further scaling at the current pace; long-term is the result of technological advances that enable greater AI capabilities.
Figure 2: Model Pipeline: pipeline of model (1) training, (2) evaluation, and (3) deployment analyzed in the report. The Common Benchmarks Evaluation component (in light gray) is included in the pipeline for completeness but is not analyzed in detail, as these benchmarks are commonly available and shared across a substantial number of models.
Figure 3: Openness Scale: categorization of the levels of openness of the code and data of each model component. See the table of licenses for a reference on the restrictions imposed by each license.
Figure 4: Taxonomy Analysis: (a) shows the openness level distribution for each of the pipeline components of the 45 LLMs studied. Color legend: C1/D1, C2/D2, C3/D3, C4/D4, C5/D5, ? (unknown or not publicly available), N/A (not applicable). For conciseness, we use "FT" as a stand-in for "Fine-Tuning". (b) plots the percentage of closed components of each studied model alongside its Chatbot Arena Elo score.
Figure 6: Near to Mid-term Impacts of Open-Source Models: specific impacts of open-source Gen AI models categorized by area of impact and whether they are positive (+) or negative (-).
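
Figure 4(b) summarizes each model by its percentage of closed components. As a minimal sketch of how such a percentage could be computed from one model's row in the taxonomy, the Python below uses a hypothetical encoding: integer levels 1 to 5 for C1 to C5 and D1 to D5, the string "?" for unknown or not publicly available, and "N/A" for components that do not apply. It assumes that level 1 is the most closed level and that "?" counts as closed, because the captions do not state which levels are treated as closed; N/A components are excluded from the denominator. The names `CLOSED_MAX_LEVEL`, `is_closed`, and `percent_closed`, the component names, and the example levels are all invented for illustration and are not the paper's method or data.

```python
from typing import Dict, Union

# Hypothetical encoding of one model's row in an openness table:
#   int 1..5 -> openness level (C1..C5 for code, D1..D5 for data)
#   "?"      -> unknown or not publicly available
#   "N/A"    -> component not applicable to this model
Level = Union[int, str]

# ASSUMPTION: level 1 is the most closed level. The source captions do not
# say which levels count as "closed", so adjust this threshold as needed.
CLOSED_MAX_LEVEL = 1


def is_closed(level: Level) -> bool:
    """Treat '?' (not publicly available) and levels <= threshold as closed."""
    if level == "?":
        return True
    return isinstance(level, int) and level <= CLOSED_MAX_LEVEL


def percent_closed(components: Dict[str, Level]) -> float:
    """Percentage of applicable components that are closed (N/A excluded)."""
    applicable = {name: lvl for name, lvl in components.items() if lvl != "N/A"}
    if not applicable:
        raise ValueError("no applicable components")
    closed = sum(1 for lvl in applicable.values() if is_closed(lvl))
    return 100.0 * closed / len(applicable)


# Made-up example row with hypothetical component names (not data from the paper).
example = {
    "pretraining code": "?",
    "pretraining data": 1,
    "FT code": 3,
    "FT data": "N/A",
    "model weights": 4,
}

if __name__ == "__main__":
    print(percent_closed(example))
```

For the made-up row, four components are applicable and two of them ("?" and level 1) count as closed, so the call returns 50.0. Raising `CLOSED_MAX_LEVEL` widens what counts as closed without changing the rest of the logic.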