Can Copyright be Reduced to Privacy?

Niva Elkin-Koren, Uri Hacohen, Roi Livni, Shay Moran

TL;DR

The paper investigates whether algorithmic-stability notions such as differential privacy and near-access freeness can serve as reliable proxies for copyright protection in generative AI training. It highlights fundamental disanalogies between privacy aims and copyright law, including the idea–expression dichotomy, duration, and fair-use allowances, arguing that DP/NAF cannot implement a definitive infringement test. The authors show that these stability notions are typically over-inclusive or under-inclusive with respect to copyright, potentially stifling lawful creativity or failing to prevent infringement. They propose reframing the role of algorithmic methods toward measuring originality via semantic distance and using flexible safety-function frameworks to better align technical tools with copyright principles and legal decision-making.

Abstract

There is a growing concern that generative AI models will generate outputs closely resembling the copyrighted materials on which they are trained. This worry has intensified as the quality and complexity of generative models have improved immensely, and the availability of extensive datasets containing copyrighted material has expanded. Researchers are actively exploring strategies to mitigate the risk of generating infringing samples, with a recent line of work proposing techniques such as differential privacy and other forms of algorithmic stability to provide guarantees on the lack of infringing copying. In this work, we examine whether such algorithmic stability techniques are suitable to ensure the responsible use of generative models without inadvertently violating copyright laws. We argue that while these techniques aim to verify the presence of identifiable information in datasets, thus being privacy-oriented, copyright law aims to promote the use of original works for the benefit of society as a whole, provided that no unlicensed use of protected expression occurs. These fundamental differences between privacy and copyright must not be overlooked. In particular, we demonstrate that while algorithmic stability may be perceived as a practical tool to detect copying, such copying does not necessarily constitute copyright infringement. Therefore, if adopted as a standard for detecting and establishing copyright infringement, algorithmic stability may undermine the intended objectives of copyright law.

Paper Structure

This paper contains 19 sections, 4 theorems, 16 equations, 1 figure.

Key Result

Proposition 1

Let $A$ be an algorithm mapping samples $S$ to models $q^A_S$ such that $\mathop{\mathbb{E}}_{S_1,S_2}\left[\| q^A_{S_1}- q^A_{S_2}\|\right] \le \alpha,$ where $S_1,S_2\sim D^m$ are two independent samples. Then, there exists an $(\epsilon,\delta)$-DP algorithm $B$ that receives a sample $S_B\sim D^{
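
For reference, the $(\epsilon,\delta)$-DP property named in the proposition can be recalled as follows. This is a sketch of the standard textbook definition of differential privacy, not a formulation taken from this page: the symbols $S'$ and $E$ are introduced here for illustration, and "neighboring" is assumed to mean that the two samples differ in a single example. The paper's own formulation may differ in detail.

$$\Pr\left[B(S) \in E\right] \;\le\; e^{\epsilon}\,\Pr\left[B(S') \in E\right] + \delta \quad \text{for all neighboring samples } S, S' \text{ and all measurable events } E.$$

Read this way, the guarantee bounds how much any single training example can shift the distribution of the algorithm's output, which is the sense in which differential privacy is a form of algorithmic stability.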

Figures (1)

  • Figure 1

Theorems & Definitions (4)

  • Proposition 1
  • Proposition 2
  • Lemma 1: A special case of Thm. 2 in [angel2019pairwise]
  • Lemma 2: [korolova2009releasing; bun2016simultaneous]