Catch the Butterfly: Peeking into the Terms and Conflicts among SPDX Licenses

Tao Liu, Chengwei Liu, Tianwei Liu, He Wang, Gaofei Wu, Yang Liu, Yuqing Zhang

TL;DR

This work tackles the legal risk of reusing third-party libraries by creating a high-quality, term-level dataset of mainstream SPDX licenses. It standardizes license terms into a 22-term set, labels terms and attitudes across multiple platforms, and expands conflict analysis to include infectious copyleft scenarios, yielding three conflict types (C1–C3). It then conducts two empirical studies: a broad comparison of SPDX licenses and an ecosystem-wide revisit of license usage and conflicts in the NPM registry, revealing substantial term-labeling inconsistencies and persistent conflicts driven by copyleft and obligation terms. The resulting datasets and analyses provide a foundation for developers and researchers to assess license compatibility at scale and guide automated tooling for license compliance. The work highlights the need for careful license management in dependency graphs and offers data-driven insights to reduce legal risk in software supply chains.

Abstract

The widespread adoption of third-party libraries (TPLs) has accelerated modern software development. However, this convenience comes with legal risk: developers may inadvertently violate the licenses of the TPLs they reuse. Existing studies have explored software licenses and potential incompatibilities, but they often focus on a limited set of licenses or rely on low-quality license data, which may affect their conclusions. Addressing this gap requires a high-quality license dataset that covers a broad range of mainstream licenses, both to help developers navigate the license landscape and avoid legal pitfalls and to guide solutions for managing license compliance and compatibility. To this end, we present the first study of mainstream software licenses at term granularity and obtain a high-quality dataset of 453 SPDX licenses with well-labeled terms and conflicts. Specifically, we first conduct a differential analysis of the mainstream platforms to understand how each of them labels the terms and attitudes of each license. Next, we propose a standardized set of license terms with which to capture and label existing mainstream licenses at high quality. Moreover, we include copyleft conflicts and identify three major types of license conflicts among the 453 SPDX licenses. Based on these, we carry out two empirical studies to reveal the concerns and threats from the perspectives of both licensors and licensees. One study provides an in-depth analysis of the similarities, differences, and conflicts among SPDX licenses; the other revisits the usage and conflicts of licenses in the NPM ecosystem and draws conclusions that differ from previous work. Our studies yield several findings and release the underlying analytical data, setting the stage for further research.

Paper Structure (23 sections, 7 figures, 6 tables)

Figures (7)

  • Figure 1: Examples of conflicts between licenses.
  • Figure 2: The overview of data processing on term extraction.
  • Figure 3: The distribution of rights and obligations in SPDX licenses.
  • Figure 4: Conflicted attitude by Xu et al. [xu2021lidetector]
  • Figure 4: The frequent patterns with different thresholds in license terms.
  • ...and 2 more figures