Organic or Diffused: Can We Distinguish Human Art from AI-generated Images?

Anna Yoo Jeong Ha, Josephine Passananti, Ronik Bhaskar, Shawn Shan, Reid Southen, Haitao Zheng, Ben Y. Zhao

TL;DR

This study tackles the challenge of distinguishing human art from AI-generated images as generative models rapidly improve. It curates human art across seven styles, generates matching images from five generative models, applies multiple perturbations, and evaluates five automated detectors and three human populations (crowdworkers, professional artists, and expert artists). The findings show that commercial detectors such as Hive perform very well on unperturbed images but are vulnerable to perturbations such as Glaze; expert artists can outperform machines on Glazed images, though they produce more false positives. The paper concludes that a combined team of human and automated detectors offers the best balance of accuracy and robustness, highlighting the need for perturbation-aware training and mixed detection strategies in practice.

Abstract

The advent of generative AI images has completely disrupted the art world. Distinguishing AI-generated images from human art is a challenging problem whose impact is growing over time. A failure to address this problem allows bad actors to defraud individuals paying a premium for human art and companies whose stated policies forbid AI imagery. It is also critical for content owners to establish copyright, and for model trainers interested in curating training data to avoid potential model collapse. There are several different approaches to distinguishing human art from AI images, including classifiers trained by supervised learning, research tools targeting diffusion models, and identification by professional artists using their knowledge of artistic techniques. In this paper, we seek to understand how well these approaches can perform against today's generative models in both benign and adversarial settings. We curate real human art across 7 styles, generate matching images from 5 generative models, and apply 8 detectors (5 automated detectors and 3 different human groups including 180 crowdworkers, 4000+ professional artists, and 13 expert artists experienced at detecting AI). Both Hive and expert artists do very well, but make mistakes in different ways (Hive is weaker against adversarial perturbations while expert artists produce more false positives). We believe these weaknesses will remain as models continue to evolve, and use our data to demonstrate why a combined team of human and automated detectors provides the best combination of accuracy and robustness.

Paper Structure (35 sections, 10 figures, 14 tables)

Figures (10)

  • Figure 1: Samples of human art and matching images produced by generative AI models. Copyright held by respective artists, © Kirsty (@kirue_t), © Nguyen Viet, © Liam Collod
  • Figure 2: The confidence score produced by automated detectors on images generated by 5 generators. Detecting images generated by Firefly is the hardest.
  • Figure 3: Impact of five different perturbations on the Hive confidence score, for 350 AI-generated images. In each figure, the images are indexed by the increasing Hive score of unperturbed versions.
  • Figure 4: CDF of overlay intensity required to change Hive's decision over the period of 3 months.
  • Figure 5: CDF of overlay intensity required to change Hive's decision using different overlay methods in June 2024.
  • ...and 5 more figures