U Can't Gen This? A Survey of Intellectual Property Protection Methods for Data in Generative AI

Tanja Šarčević; Alicja Karlowicz; Rudolf Mayer; Ricardo Baeza-Yates; Andreas Rauber

TL;DR

This survey examines the properties of generative models that enable misuse leading to potential IP violations, and proposes a taxonomy that structures a systematic review of technical solutions for safeguarding training data against such violations in GAI.

Abstract

Large Generative AI (GAI) models have the unparalleled ability to generate text, images, audio, and other forms of media that are increasingly indistinguishable from human-generated content. As these models often train on publicly available data, including copyrighted materials, art and other creative works, they inadvertently risk violating copyright and misappropriating intellectual property (IP). Due to the rapid development of generative AI technology and pressing ethical considerations from stakeholders, protective mechanisms and techniques are emerging at a high pace but lack systematisation. In this paper, we study the concerns regarding the intellectual property rights of training data and specifically focus on the properties of generative models that enable misuse leading to potential IP violations. We then propose a taxonomy that leads to a systematic review of technical solutions for safeguarding the data from intellectual property violations in GAI.

Paper Structure (43 sections, 11 figures, 2 tables)

Introduction
Related work
Methodology
Background
Variational Autoencoders
Generative Adversarial Networks
Diffusion Models
Large Language Models
Threats to the IP of training data in GAI
Potential IP violations
Unauthorised data usage
Unauthorised training
Unauthorised editing
Plagiarism and imitation
Style mimicry in visual art
...and 28 more sections

Figures (11)

Figure 1: U can't gen this? MC Hammer failing to generate a photo of himself.
Figure 2: The literature distribution over types of protection methods and year of publication.
Figure 3: Style mimicry in Stable Diffusion. Left: original artwork by Hollie Mengert. Right: images generated in her style [baio_invasive_2022].
Figure 4: Data replication in Stable Diffusion. Top row: generated images. Bottom row: training samples [somepalli_understanding_2023].
Figure 5: Taxonomy of the IP protection methods for training data in GAI.
...and 6 more figures
