Can Copyright be Reduced to Privacy?

Niva Elkin-Koren; Uri Hacohen; Roi Livni; Shay Moran

TL;DR

The paper investigates whether algorithmic-stability notions such as differential privacy and near-access freeness can serve as reliable proxies for copyright protection in generative AI training. It highlights fundamental disanalogies between privacy aims and copyright law, including the idea–expression dichotomy, duration, and fair-use allowances, arguing that DP/NAF cannot implement a definitive infringement test. The authors show that these stability notions are typically over-inclusive or under-inclusive with respect to copyright, potentially stifling lawful creativity or failing to prevent infringement. They propose reframing the role of algorithmic methods toward measuring originality via semantic distance and using flexible safety-function frameworks to better align technical tools with copyright principles and legal decision-making.

Abstract

There is growing concern that generative AI models will generate outputs closely resembling the copyrighted materials on which they are trained. This worry has intensified as the quality and complexity of generative models have improved immensely and the availability of extensive datasets containing copyrighted material has expanded. Researchers are actively exploring strategies to mitigate the risk of generating infringing samples, with a recent line of work suggesting the use of techniques such as differential privacy and other forms of algorithmic stability to provide guarantees against infringing copying. In this work, we examine whether such algorithmic stability techniques are suitable for ensuring the responsible use of generative models without inadvertently violating copyright laws. We argue that while these techniques aim to verify the presence of identifiable information in datasets, and are thus privacy-oriented, copyright law aims to promote the use of original works for the benefit of society as a whole, provided that no unlicensed use of protected expression occurs. These fundamental differences between privacy and copyright must not be overlooked. In particular, we demonstrate that while algorithmic stability may be perceived as a practical tool to detect copying, such copying does not necessarily constitute copyright infringement. Therefore, if adopted as a standard for detecting and establishing copyright infringement, algorithmic stability may undermine the intended objectives of copyright law.
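
For reference, the differential privacy notion mentioned above is usually stated as follows. This is the standard textbook definition, not a formula quoted from the paper; the symbols $A$, $S$, $S'$, and $E$ are notation assumed here for illustration. A randomized algorithm $A$ is $(\epsilon,\delta)$-differentially private if, for every pair of samples $S, S'$ that differ in a single element and every event $E$ over the outputs of $A$,

$$\Pr\left[A(S)\in E\right] \;\le\; e^{\epsilon}\,\Pr\left[A(S')\in E\right] + \delta .$$

Informally, changing one training example can shift the distribution of the trained model only slightly. This is the sense in which such stability guarantees are proposed as a surrogate for the absence of copying from any single work.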

Paper Structure (19 sections, 4 theorems, 16 equations, 1 figure)

Introduction
Related Work
Algorithmic stability as a surrogate for copyright
Differential Privacy
Near Access Freeness
Model safety vs. Content safety
Safety functions
The gap between algorithmic stability and copyright
Over-inclusiveness
When input content is in the public domain
When an input content incorporates unprotected aspects
When the use of the protected aspects of the input content was lawful
Under-Inclusiveness
Discussion
Stability is not safe
...and 4 more sections

Key Result

Proposition 1

Let $A$ be an algorithm mapping samples $S$ to models $q^A_S$ such that $\mathop{\mathbb{E}}_{S_1,S_2}\left[\| q^A_{S_1}- q^A_{S_2}\|\right] \le \alpha,$ where $S_1,S_2\sim D^m$ are two independent samples. Then, there exists an $(\epsilon,\delta)$-DP algorithm $B$ that receives a sample $S_B\sim D^{

Figures (1)

Figure 1:

Theorems & Definitions (4)

Proposition 1
Proposition 2
Lemma 1: A special case of Thm 2 in [angel2019pairwise]
Lemma 2: [korolova2009releasing, bun2016simultaneous]
