Characterizing Photorealism and Artifacts in Diffusion Model-Generated Images

Negar Kamali, Karyn Nakamura, Aakriti Kumar, Angelos Chatzimparmpas, Jessica Hullman, Matthew Groh

TL;DR

The paper tackles the challenge of photorealism in diffusion-model images in two ways. It develops a five-category artifact taxonomy, and it uses a large-scale crowdsourced experiment to assess how well people distinguish AI-generated images from real photographs. It demonstrates that scene complexity, artifact type, display duration, and human curation significantly shape detection accuracy, with human curation often yielding more photorealistic, and therefore harder-to-detect, images. The study provides quantitative benchmarks (e.g., 76% accuracy on AI-generated images vs. 74% on real images) and qualitative insights from participant comments, and it releases replication data to foster reproducibility. These findings inform AI-literacy tools. They also highlight the limits of automated detectors and show that diffusion models still do not produce consistently photorealistic images in 2024.

Abstract

Diffusion-model-generated images can appear indistinguishable from authentic photographs, but these images often contain artifacts and implausibilities that reveal their AI-generated provenance. Given the challenge to public trust in media posed by photorealistic AI-generated images, we conducted a large-scale experiment measuring human detection accuracy on 450 diffusion-model-generated images and 149 real images. Based on 749,828 observations and 34,675 comments collected from 50,444 participants, we find that an image's scene complexity, the artifact types within it, its display time, and human curation of AI-generated images all play significant roles in how accurately people distinguish real from AI-generated images. Additionally, we propose a taxonomy characterizing artifacts that often appear in images generated by diffusion models. Our empirical observations and taxonomy offer nuanced insights into the capabilities and limitations of diffusion models in generating photorealistic images in 2024.
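The accuracy figures above are proportions of correct real-vs-AI judgments, grouped by some property of the image or trial (ground-truth source, display time, and so on). The following is a minimal sketch of that tabulation. It assumes a hypothetical per-observation record: the `Observation` fields and the `accuracy_by` helper are illustrative only and are not the authors' released data format or analysis code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable


@dataclass(frozen=True)
class Observation:
    """One participant's judgment of one image (hypothetical schema)."""
    image_id: str
    is_ai: bool             # ground truth: True if the image is AI-generated
    judged_ai: bool         # participant's answer: True if they said "AI-generated"
    display_seconds: float  # how long the image was shown (hypothetical field)


def accuracy_by(
    observations: Iterable[Observation],
    key: Callable[[Observation], Hashable],
) -> Dict[Hashable, float]:
    """Share of correct judgments within each group defined by `key`."""
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for obs in observations:
        group = key(obs)
        total[group] += 1
        # A judgment is correct when the answer matches the ground truth.
        correct[group] += int(obs.judged_ai == obs.is_ai)
    return {group: correct[group] / total[group] for group in total}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = [
        Observation("ai-1", True, True, 1.0),
        Observation("ai-1", True, False, 1.0),
        Observation("ai-2", True, True, 10.0),
        Observation("ai-2", True, True, 10.0),
        Observation("real-1", False, True, 1.0),
        Observation("real-1", False, False, 10.0),
    ]
    print(accuracy_by(toy, key=lambda o: "AI" if o.is_ai else "real"))
    # -> {'AI': 0.75, 'real': 0.5}
    print(accuracy_by(toy, key=lambda o: o.display_seconds))
```

Grouping by ground-truth source gives the kind of split behind the TL;DR's AI-image versus real-image accuracy figures. Passing a different `key` (for example, display time) gives per-condition accuracy. Establishing that a factor plays a significant role would require a statistical model, which this raw tabulation does not attempt.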

Paper Structure

This paper contains 33 sections, 24 figures, 1 table.

Figures (24)

  • Figure 1: Exemplar images of photorealism across a range of generative models. Examples of AI-generated images from 2014 to 2024 [goodfellow2014generativeadversarialnetworks; radford2015unsupervised; faceimageforgery; karras2018progressivegrowinggansimproved; karras2019style; karras2020analyzingimprovingimagequality; podell2024sdxl; adobe_firefly].
  • Figure 2: Overview of the taxonomy development process. In the background research stage, we reviewed existing literature on visible features of AI-generated images from a wide range of sources. This included academic literature, practitioner perspectives in AI literacy articles, and discussions on the photorealism of AI-generated images online. From these features, we developed an initial taxonomy of artifacts. In the Generation and Curation stage, we used our taxonomy of artifacts to create a dataset of 599 images. Of these images, 149 were real photographs curated from the internet, and 450 were generated in Midjourney, Firefly, and Stable Diffusion through extensive iteration with photorealistic image generation techniques. We used the dataset of images for an online crowdsourced experiment where we evaluated participant accuracy in identifying AI-generated images. We iteratively refined the taxonomy based on results from the experiment and continued monitoring new literature on AI-generated images as generative models evolved.
  • Figure 3: Images of Barack Obama generated in Midjourney V5. Images were created by progressively adding details to the prompt shown below each image: A. "Portrait of Barack Obama." B. "Portrait of Barack Obama, hyperrealistic, megapixel." C. "Portrait of Barack Obama, sitting in his Oval Office, smiling, hyperrealistic, megapixel." D. "A portrait of Barack Obama sitting in the Oval Office, smiling, wearing a suit and tie, shot on Kodak, hyperrealistic, grainy, official portrait."
  • Figure 4: Stable Diffusion pipeline and outputs of varied styles from the same pose and prompt. A. Four pipelines for generating four variations of the prompt "photo of a 25 year old man eating a slice of pizza, outside on the grass in a park, sunny, plain clothes." B. A sample of the variations that we labeled as having the style of a "3D Render", "Photoshoot", or "iPhone photo."
  • Figure 5: A screenshot of the experiment website interface.
  • ...and 19 more figures