When AI Eats Itself: On the Caveats of AI Autophagy

Xiaodan Xing; Fadong Shi; Jiahao Huang; Yinzhe Wu; Yang Nan; Sheng Zhang; Yingying Fang; Mike Roberts; Carola-Bibiane Schönlieb; Javier Del Ser; Guang Yang

TL;DR

This study reviews the existing literature on AI autophagy, analyzing its consequences and associated risks and exploring strategies to mitigate its impact, to provide a comprehensive perspective on the phenomenon.

Abstract

Generative Artificial Intelligence (AI) technologies and large models are producing realistic outputs across various domains, such as images, text, speech, and music. Creating these advanced generative models requires significant resources, particularly large and high-quality datasets. To minimise training expenses, many algorithm developers use data created by the models themselves as a cost-effective training solution. However, not all synthetic data effectively improve model performance, necessitating a strategic balance in the use of real versus synthetic data to optimise outcomes. The previously well-controlled integration of real and synthetic data is now becoming uncontrollable: the widespread and unregulated dissemination of synthetic data online contaminates datasets traditionally compiled through web scraping, which are now mixed with unlabeled synthetic data. This trend, known as the AI autophagy phenomenon, suggests a future where generative AI systems may increasingly consume their own outputs without discernment, raising concerns about model performance, reliability, and ethical implications. What will happen if generative AI continuously consumes itself without discernment? What measures can we take to mitigate the potential adverse effects? To address these research questions, this study examines the existing literature, delving into the consequences of AI autophagy, analyzing the associated risks, and exploring strategies to mitigate its impact. Our aim is to provide a comprehensive perspective on this phenomenon, advocating for a balanced approach that promotes the sustainable development of generative AI technologies in the era of large models.

Paper Structure (17 sections, 6 figures)

This paper contains 17 sections, 6 figures.

Introduction
RQ1: What Happens When AI Eats Itself?
Fully Synthetic Loop: Worst Case and Theoretical Model
Fixed-Real Data Loop: Stability and Limitations
Fresh Data Loop: A Temporary Solution or a Real Fix?
The Diminished Role of Synthetic Data in Data Augmentation
RQ2: What Technical Strategies Can Be Employed to Mitigate the Negative Consequences of AI Autophagy?
The Limitations of Cherry-Picking in Mitigating AI Autophagy
Advantages and Caveats of Synthetic Content Watermarking
Advantages and Caveats of Synthetic Content Detection
RQ3: Which Regulatory Strategies Can Be Employed to Address These Negative Consequences?
Problematic Data Acquisition and Dissemination Strategies
Relevant Regulations for the Dissemination Process
Conclusions and Outlook
Ethical and Societal Considerations
...and 2 more sections

Figures (6)

Figure 1: A diagram showing (a) the scope of this work and (b) the major research questions. This study centers on the AI autophagy phenomenon, with a focus on analyzing its effects and discussing solutions to mitigate potential negative impacts.
Figure 2: The trend of mean and variance estimations from the autophagy loop. Here we chose the starting value $\mu=0$ and $\sigma=1$. For each round, $10^4$ data points were generated to perform the estimation.
Figure 3: (a) Synthetic images generated from an example autophagy loop using a DDPM model on the MNIST dataset [deng2012mnist], illustrating quality degradation. (b) Applying a selective process that retains only high-quality images for subsequent autophagy loops reduces diversity. (c) Results from a more complex task using the StyleGAN model [karras2020analyzing] on the AFHQ cat dataset [choi2020stargan], with a truncation parameter of 0.7 to balance diversity and quality. Despite these adjustments, quality degradation and diversity loss remain evident during autophagy.
Figure 4: Synthetic texts produced by an autophagy loop, taken from the study in [RN4], demonstrating a reduction in text variety. In later generations, the texts increasingly exhibit duplication.
Figure 5: Different loops in AI autophagy.
...and 1 more figure
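
The Figure 2 caption describes a Gaussian autophagy loop: starting from $\mu=0$ and $\sigma=1$, each round draws $10^4$ points from the current estimate and re-estimates the parameters from those points alone. A minimal Python sketch of that setup follows. The function name, the number of rounds, the random seed, and the choice of estimators (sample mean and unbiased sample standard deviation) are assumptions made for illustration; the caption does not specify them, and this is not necessarily the authors' exact procedure.

```python
import numpy as np


def gaussian_autophagy_loop(n_rounds=100, n_samples=10_000,
                            mu0=0.0, sigma0=1.0, seed=0):
    """Simulate a fully synthetic loop on a 1-D Gaussian.

    Each round samples from the current N(mu, sigma^2) and refits
    mu and sigma to those synthetic samples only (no real data is
    ever added back). Returns an array of shape (n_rounds + 1, 2)
    holding (mean, variance) per round, row 0 being the start values.
    """
    rng = np.random.default_rng(seed)  # seed is an illustrative assumption
    mu, sigma = float(mu0), float(sigma0)
    history = [(mu, sigma ** 2)]
    for _ in range(n_rounds):
        # Generate synthetic data from the current model.
        samples = rng.normal(loc=mu, scale=sigma, size=n_samples)
        # Refit the model on its own output (assumed estimators).
        mu = float(samples.mean())
        sigma = float(samples.std(ddof=1))
        history.append((mu, sigma ** 2))
    return np.array(history)


if __name__ == "__main__":
    hist = gaussian_autophagy_loop()
    for r in (0, 10, 50, 100):
        print(f"round {r:3d}: mean={hist[r, 0]:+.4f}  var={hist[r, 1]:.4f}")
```

Because every round inherits the estimation error of the previous one, the mean and variance follow a random walk rather than staying at their starting values. How far they drift depends on the sample size and the number of rounds, so the sketch prints the trajectory instead of asserting a particular outcome.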
