MONAL: Model Autophagy Analysis for Modeling Human-AI Interactions

Shu Yang, Muhammad Asif Ali, Lu Yu, Lijie Hu, Di Wang

TL;DR

MONAL analyzes how large models and human agents exchange and transform information through two autophagous loops, revealing a trend toward synthetic data dominating training signals. The framework pairs a theoretical formulation with three empirical protocols (cross-scoring, an exam scenario, and AI-washing) to quantify bias, data quality, and diversity loss across text and image modalities. Key findings show that models overvalue their own outputs, that human-generated data is deprioritized, and that diversity erodes as synthetic data cycles intensify, potentially trapping models in a local optimum. The work highlights social and practical implications for AI-enabled information ecosystems and calls for strategies that preserve genuine human-generated data to sustain model performance and societal trust.
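The TL;DR names cross-scoring as one protocol and reports that models overvalue their own outputs while human data is deprioritized. The sketch below assumes a simple reading of that protocol, not the paper's exact procedure: each model acts as a judge and scores content from every source (each model plus human-written text), and two gaps are summarized per judge. A positive `self_gap` signals self-preference; a negative `human_gap` signals that human content is scored below model output. The function name, the table layout, and all scores are hypothetical.

```python
from statistics import mean
from typing import Dict

# Hypothetical layout: table[judge][source] = score the judge model assigns to
# content from `source`; sources are the judge models themselves plus "human".
ScoreTable = Dict[str, Dict[str, float]]


def cross_scoring_summary(table: ScoreTable) -> Dict[str, Dict[str, float]]:
    """Per judge, report two illustrative gaps (needs at least two models):
    - self_gap: score for its own output minus mean score for other models' output
    - human_gap: score for human content minus mean score for all model output
    """
    summary = {}
    for judge, row in table.items():
        model_scores = {src: s for src, s in row.items() if src != "human"}
        others = [s for src, s in model_scores.items() if src != judge]
        summary[judge] = {
            "self_gap": row[judge] - mean(others),
            "human_gap": row["human"] - mean(list(model_scores.values())),
        }
    return summary


# Made-up scores for illustration only; these are not results from the paper.
scores = {
    "model_a": {"model_a": 8.0, "model_b": 7.0, "human": 6.5},
    "model_b": {"model_a": 7.5, "model_b": 8.5, "human": 6.0},
}
print(cross_scoring_summary(scores))
```

Averaging over the other sources, rather than comparing pairwise, keeps the summary to one number per judge regardless of how many models take part.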

Abstract

The growing role of large models and their multi-modal variants in societal information processing has ignited debates on social safety and ethics. However, comprehensive analysis is still lacking on (i) the interactions between humans and artificial intelligence systems and (ii) how to understand and address the associated limitations. To bridge this gap, we propose Model Autophagy Analysis (MONAL) to explain the self-consumption of large models. MONAL employs two distinct autophagous loops (referred to as "self-consumption loops") to elucidate how human-generated information is suppressed in the exchange between humans and AI systems. Through comprehensive experiments on diverse datasets, we evaluate the capacities of generative models as both creators and disseminators of information. Our key findings reveal (i) a progressive prevalence, over time, of model-generated synthetic information relative to human-generated information within training datasets; (ii) a discernible tendency of large models, when acting as information transmitters across multiple iterations, to selectively modify or prioritize specific content; and (iii) the potential for a reduction in the diversity of socially or human-generated information, leading to bottlenecks in the performance improvement of large models and confining them to local optima.
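Finding (i) concerns the growing share of synthetic data in training pools. As a back-of-the-envelope illustration, not a model taken from the paper, assume human data arrives at a constant rate while synthetic inflow grows geometrically as generated content is recycled through the loop. The recurrence below tracks the synthetic fraction of the pool after each round; the function `synthetic_share` and every parameter value are hypothetical.

```python
from typing import List


def synthetic_share(rounds: int, human0: float, human_per_round: float,
                    synth_per_round: float, growth: float) -> List[float]:
    """Toy recurrence for a training pool fed by a self-consumption loop.

    Assumptions (hypothetical, not from the paper): human data arrives at a
    constant rate, while synthetic inflow starts at `synth_per_round` and is
    multiplied by `growth` each round as generated content accumulates.
    Returns the synthetic fraction of the pool after each round.
    """
    human, synth, inflow = float(human0), 0.0, float(synth_per_round)
    shares = []
    for _ in range(rounds):
        human += human_per_round   # constant human contribution
        synth += inflow            # synthetic contribution this round
        inflow *= growth           # synthetic inflow grows for the next round
        shares.append(synth / (human + synth))
    return shares


print(synthetic_share(rounds=3, human0=100, human_per_round=10,
                      synth_per_round=10, growth=2.0))
```

With `growth=1.0` the share levels off at `synth_per_round / (synth_per_round + human_per_round)`; any `growth > 1` makes synthetic data grow geometrically against linear human growth, so the share tends toward 1.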

Paper Structure

This paper contains 29 sections, 4 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Self-consumption loop of large models. This figure is based on recent workflows for automated data generation and filtering wang2023selfinstruct, li2023selfalignment. We emphasize the preferential nature of large models as generators and filters of synthetic data.
  • Figure 2: Self-consumption loop that emphasizes the role of humans as filters and transmitters of information veselovsky2023artificial while interacting with large models. This role arises primarily during information dissemination in human society.
  • Figure 3: Example of AI-washing on images: repeatedly processing an image $N$ times ($N = 1, \dots, 5$) with the SDXL model podell2023sdxl may lead to serious biases.
  • Figure 4: After 20 rounds of AI-washing with the SDXL model podell2023sdxl, different images retain and discard details in different ways.
  • Figure 5: Density distributions of cosine similarity scores for text samples from Book3 processed $N$ times by ChatGPT.
  • ...and 4 more figures
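The AI-washing captions (Figures 3 to 5) describe feeding a model's output back in as its next input $N$ times and measuring drift, presumably relative to the original sample, via cosine similarity. The loop below is a minimal, model-agnostic sketch of that procedure under stated assumptions: `transform` stands in for a call to SDXL or ChatGPT, `bow_cosine` is a bag-of-words substitute for whatever embedding the authors used, and `toy_rewrite` (which just drops the last word) is purely hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, TypeVar

T = TypeVar("T")


def ai_wash(x0: T, transform: Callable[[T], T],
            similarity: Callable[[T, T], float], rounds: int) -> List[float]:
    """Apply `transform` repeatedly; entry n-1 is similarity(x0, output of round n)."""
    x, sims = x0, []
    for _ in range(rounds):
        x = transform(x)                 # round n consumes round n-1's output
        sims.append(similarity(x0, x))   # always compare against the original
    return sims


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words counts (stand-in for a real embedding)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def toy_rewrite(text: str) -> str:
    """Hypothetical stand-in for a generative model call: drops the final word."""
    return " ".join(text.split()[:-1])


print(ai_wash("the quick brown fox jumps", toy_rewrite, bow_cosine, rounds=3))
```

Comparing every round against the original, rather than against the previous round, measures cumulative drift; collecting these values over many samples for each $N$ is what a per-$N$ density plot would summarize.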