"They've Stolen My GPL-Licensed Model!": Toward Standardized and Transparent Model Licensing
Moming Duan, Rui Zhao, Linshan Jiang, Nigel Shadbolt, Bingsheng He
TL;DR
The paper tackles licensing in ML, where traditional OSS and free-content licenses fail to capture ML-specific workflows and assets, which risks noncompliance during model reuse and publication. It introduces two contributions. The first is MG Analyzer, an RDF/Notation3-based tool that uses an MG Vocabulary to model ML workflows and reason about license rights and constraints. The second is MG Licenses, a set of model-specific licenses with configurable options, designed to standardize model publishing. By encoding license terms as rules and reasoning over the dependencies between ML assets, the approach shows improved clarity and flexibility over existing licenses, and it highlights how prevalent non-standard licensing is in model publishing. The work aims to move ML license management toward Linked Open Model Production Data and lays a foundation for more transparent, compliant, and interoperable model ecosystems.
Abstract
As model parameter counts scale into the billions and training consumes zettaFLOPs of computation, the reuse of Machine Learning (ML) assets and collaborative development have become increasingly prevalent in the ML community. These ML assets, including models, datasets, and software, may originate from various sources and be published under different licenses, which govern the use and distribution of the licensed works and their derivatives. However, commonly chosen licenses, such as GPL and Apache, are software-specific, and their terms are not clearly defined or bounded in the context of model publishing. Meanwhile, reused assets may instead be released under free-content licenses or model licenses, and combining assets under these heterogeneous licenses poses a risk of license noncompliance and rights infringement within the model production workflow. In this paper, we address these challenges along two lines: 1) For ML workflow compliance, we propose ModelGo (MG) Analyzer, a tool that incorporates a vocabulary for ML workflow management and encoded license rules, enabling ontological reasoning to analyze rights granting and compliance issues. 2) For standardized model publishing, we introduce ModelGo Licenses, a set of model-specific licenses that provide flexible options to meet the diverse needs of the ML community. MG Analyzer is built on the Turtle language and a Notation3 reasoning engine, and is envisioned as a first step toward Linked Open Data for ML workflow management. We have also encoded our proposed model licenses as rules and, through comparisons and experiments, demonstrate the effects of GPL and other commonly used licenses in model publishing, as well as the flexibility advantages of our licenses.
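To make the compliance reasoning in point 1) concrete, the following is a minimal Python sketch of how license constraints might propagate through a workflow graph and then be checked against the license a derived model is published under. It is a toy forward-chaining loop over (subject, predicate, object) triples, standing in for the Turtle/Notation3 stack. The predicates (`licensedUnder`, `derivedFrom`, `publishedUnder`, `hasConstraint`), the constraint names, and the two rules are hypothetical simplifications, not the MG Vocabulary or the authors' encoded license rules. It also assumes that a fine-tuned model inherits the constraints of its base model and training data, which is itself one of the questions a license analysis has to settle.

```python
"""Toy forward-chaining license check over ML-workflow triples.

All predicates, license facts, and rules here are hypothetical illustrations;
they are NOT the MG Vocabulary or the authors' Notation3 rules.
"""
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical license facts. "SameLicense:<id>" models a copyleft-style duty
# to republish derivatives under the same license (an assumption for models).
LICENSE_FACTS: Set[Triple] = {
    ("CC-BY-NC-4.0", "imposes", "NonCommercial"),
    ("GPL-3.0", "imposes", "SameLicense:GPL-3.0"),
    ("Apache-2.0", "permits", "Commercial"),
}


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Apply two toy rules repeatedly until no new triple appears (a fixpoint)."""
    facts = set(triples) | LICENSE_FACTS
    while True:
        new: Set[Triple] = set()
        for (s, p, o) in facts:
            # Rule 1: an asset under a license carries that license's constraints.
            if p == "licensedUnder":
                for (lic, p2, c) in facts:
                    if lic == o and p2 == "imposes":
                        new.add((s, "hasConstraint", c))
            # Rule 2 (assumed): a derivative inherits its source's constraints.
            if p == "derivedFrom":
                for (src, p2, c) in facts:
                    if src == o and p2 == "hasConstraint":
                        new.add((s, "hasConstraint", c))
        if new <= facts:  # nothing new was derived, so stop
            return facts
        facts |= new


def conflicts(facts: Set[Triple]) -> Set[Tuple[str, str]]:
    """Return (asset, constraint) pairs its publishing license cannot honor."""
    found: Set[Tuple[str, str]] = set()
    for (asset, p, lic) in facts:
        if p != "publishedUnder":
            continue
        for (a2, p2, c) in facts:
            if a2 != asset or p2 != "hasConstraint":
                continue
            # A non-commercial term clashes with a license that permits commercial use.
            if c == "NonCommercial" and (lic, "permits", "Commercial") in facts:
                found.add((asset, c))
            # A same-license duty clashes with publishing under any other license.
            if c.startswith("SameLicense:") and c.split(":", 1)[1] != lic:
                found.add((asset, c))
    return found


# Hypothetical workflow: a model fine-tuned from a GPL-licensed base model on a
# non-commercial dataset, then released under Apache-2.0.
EXAMPLE_WORKFLOW: Set[Triple] = {
    ("dataset:D", "licensedUnder", "CC-BY-NC-4.0"),
    ("model:Base", "licensedUnder", "GPL-3.0"),
    ("model:FT", "derivedFrom", "dataset:D"),
    ("model:FT", "derivedFrom", "model:Base"),
    ("model:FT", "publishedUnder", "Apache-2.0"),
}

if __name__ == "__main__":
    print(sorted(conflicts(infer(EXAMPLE_WORKFLOW))))
    # → [('model:FT', 'NonCommercial'), ('model:FT', 'SameLicense:GPL-3.0')]
```

Under these toy rules, the Apache-2.0 release of `model:FT` is flagged twice: once for the non-commercial term inherited from the dataset and once for the same-license duty that the sketch attaches to GPL-3.0. The fixpoint loop mirrors, in a very simplified way, how a rule engine keeps deriving facts until nothing changes.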
