Qualitative Failures of Image Generation Models and Their Application in Detecting Deepfakes

Ali Borji

TL;DR

The paper tackles the problem of distinguishing generated imagery from real photographs by compiling a taxonomy of qualitative indicators of deepfakes. It surveys quantitative evaluation metrics and qualitative evaluation benchmarks, arguing that single-image authenticity often eludes purely metric-based assessment and benefits from human-observable cues. The author categorizes qualitative failures into five areas (body parts, geometry, physics, semantics/logic, and text/details) and extends the discussion to supplemental cues, challenging object types, memorization, and cross-study findings. The work emphasizes a practical, multi-faceted approach to detection, highlighting both limitations and opportunities for training, public awareness, and future research in video and audio deepfake detection.

Abstract

The ability of image and video generation models to create photorealistic images has reached unprecedented heights, making it difficult to distinguish between real and fake images in many cases. However, despite this progress, a gap remains between the quality of generated images and those found in the real world. To address this, we have reviewed a vast body of literature from both academic publications and social media to identify qualitative shortcomings in image generation models, which we have classified into five categories. By understanding these failures, we can identify areas where these models need improvement, as well as develop strategies for detecting deep fakes. The prevalence of deep fakes in today's society is a serious concern, and our findings can help mitigate their negative impact.

Paper Structure (16 sections, 39 figures, 1 table)

Figures (39)

  • Figure 1: The Fishmarket, Dieppe, 1902 - Camille Pissarro. On closer inspection, the faces in the image lack clarity and numerous details are either incorrect or absent, much as in fake images. Although such images may appear authentic at first glance, scrutinizing them thoroughly is crucial to avoid overlooking errors. It is advisable to examine each object within the image in detail by zooming in and analyzing its shape, features, location, and interaction with other objects. This approach allows for a more accurate assessment of whether the image is authentic and free from errors.
  • Figure 2: Examples of poorly generated faces.
  • Figure 3: Fake images can be exposed through background cues.
  • Figure 4: Here are some instances of eyes that were generated poorly. The eye in the bottom right corner is an actual photograph of a patient who has an irregularly shaped pupil. You can refer to https://n.neurology.org/content/91/15/715 for more details. This case represents a unique manifestation of a condition known as "cat's eye Adie-like pupil," which is considered a warning sign for ICE syndrome.
  • Figure 5: Here are some examples of images where the gaze direction is problematic. In these images, one eye appears to be looking in a different direction compared to the other, similar to a medical condition called Strabismus in the real world. You can check out https://en.wikipedia.org/wiki/Strabismus for additional information on this topic.
  • ...and 34 more figures