Shadows Don't Lie and Lines Can't Bend! Generative Models don't know Projective Geometry...for now

Ayush Sarkar, Hanlin Mai, Amitabh Mahapatra, Svetlana Lazebnik, D. A. Forsyth, Anand Bhattad

TL;DR

This work shows that state-of-the-art generative models fail to reproduce projective geometry faithfully: their images contain systematic geometric errors such as vanishing-point misalignments and shadow-illumination mismatches. It introduces a data-driven evaluation framework built on three derived geometric cues (Object-Shadow Cues, Perspective Field Cues, and Line Segment Cues), each processed by a classifier that has no access to pixel data. By training on Kandinsky-generated images and evaluating on multiple unseen generators, the authors show that real and generated images can be separated on geometric coherence alone, with AUCs up to 0.97, and that this separation generalizes across generators. The findings imply that improving image realism will require architectural innovations or geometry-aware losses rather than simply more data, and that pixel-based detectors alone are not enough. The authors advocate geometry-centric evaluation as a core benchmark for future generators.

Abstract

Generative models can produce impressively realistic images. This paper demonstrates that generated images have geometric features different from those of real images. We build a set of collections of generated images, prequalified to fool simple, signal-based classifiers into believing they are real. We then show that prequalified generated images can be identified reliably by classifiers that only look at geometric properties. We use three such classifiers. All three classifiers are denied access to image pixels, and look only at derived geometric features. The first classifier looks at the perspective field of the image, the second looks at lines detected in the image, and the third looks at relations between detected objects and shadows. Our procedure detects generated images more reliably than SOTA local signal based detectors, for images from a number of distinct generators. Saliency maps suggest that the classifiers can identify geometric problems reliably. We conclude that current generators cannot reliably reproduce geometric properties of real images.
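
The prequalification step described in the abstract, and the hard test sets reported in the Figure 2 caption, can be illustrated with a small sketch. It is not the authors' code: the function `prequalify`, the `threshold` and `band` values, and the synthetic scores are all hypothetical and chosen only for illustration. The sketch computes AUC as the probability that a generated image receives a higher "generated" score than a real one, then shows why a pixel-based prequalifier necessarily has an AUC of 0.00 on the subset of images it misclassified.

```python
import random

def roc_auc(labels, scores):
    """AUC as the probability that a random generated image (label 1)
    outscores a random real image (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def prequalify(labels, scores, threshold=0.5, band=0.1):
    """Return indices of hypothetical 'misclassified' and 'unconfident' sets.
    threshold and band are illustrative assumptions, not values from the paper."""
    misclassified = [i for i, (y, s) in enumerate(zip(labels, scores))
                     if (s >= threshold) != (y == 1)]
    unconfident = [i for i, s in enumerate(scores)
                   if abs(s - threshold) <= band]
    return misclassified, unconfident

# Synthetic prequalifier scores (probability of "generated"); purely illustrative.
rng = random.Random(0)
labels = [i % 2 for i in range(2000)]
scores = [min(1.0, max(0.0, rng.gauss(0.65 if y else 0.35, 0.2)))
          for y in labels]

mis, unc = prequalify(labels, scores)
auc_all = roc_auc(labels, scores)
auc_mis = roc_auc([labels[i] for i in mis], [scores[i] for i in mis])
auc_unc = roc_auc([labels[i] for i in unc], [scores[i] for i in unc])
print(f"all: {auc_all:.2f}  unconfident: {auc_unc:.2f}  misclassified: {auc_mis:.2f}")
```

On the misclassified subset every real image scores at or above the threshold and every generated image scores below it, so the prequalifier ranks every pair the wrong way round and its AUC is exactly zero. Any classifier that scores well above zero on that subset must therefore be using information the prequalifier lacks.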

Paper Structure (18 sections, 20 figures, 2 tables)

Figures (20)

  • Figure 1: We train three classifiers to identify discrepancies in projective geometry. These classifiers are trained on derived geometry cues, namely object-shadow associations (left), perspective fields (middle), and line segments (right), without looking at image intensity. We use the ResNet architecture for Object-Shadow and Perspective Fields, and PointNet for Line Segments, since detected segments form an unordered set.
  • Figure 2: ROC curves assessing projective geometry cues in generated images, for classifiers trained on Kandinsky-v3. We trained separate models for indoor scenes (top row), outdoor scenes (middle row), and a combination of indoor and outdoor scenes (last row). All our derived geometry cue classifiers are trained without looking at image intensity information and can reliably detect projective geometry errors. The recent timestamp test set (second column) confirms that these models are robust. We find hard examples using a prequalifier trained on image pixels. Our derived geometry cues consistently show high AUC for finding projective geometry errors on the hard test sets in the last two columns: the unconfident and misclassified test sets. For the unconfident test set, where the prequalifier has an AUC of 0.51 (c), 0.51 (g), and 0.49 (h) for the indoor, outdoor, and combined partitions, our classifiers can still accurately identify the generated images with high AUCs: 0.82 from line segments in the indoor set, 0.84 from perspective fields in the outdoor set, and 0.80 from perspective field cues and object shadows in the combined set. Similarly, for the misclassified test set, where the prequalifier has an AUC of 0.00, as it should, our classifiers remain reliable with AUCs up to 0.82. We conclude that generated images contain geometric structures not seen in real images, and that derived geometry cues alone are enough to identify generated images reliably.
  • Figure 3: Grad-CAM applied to our Object-Shadow and Perspective Field classifiers reveals that the high AUCs in Figure 2 are based on real geometric errors in indoor scenes generated by Kandinsky, Stable Diffusion XL, and DALL-E 3, shown in each row respectively. The second and fifth columns highlight shadow and vanishing point errors, respectively. The third column overlays detected object-shadow pairs [wang2022instance]. Grad-CAM applied to our Object-Shadow classifier (fourth column) identifies diagnostic areas for synthetic generation, such as inconsistent shadow directions (in all three rows) and mismatched shadow lengths (second row). The sixth column shows Perspective Fields [jin2023perspective], and Grad-CAM applied to our Perspective Fields classifier (last column) reveals geometric errors in all three rows, particularly at ceilings and side walls, with noticeable errors also present in window grills in the first and second rows.
  • Figure 4: Grad-CAM results for outdoor scenes generated by Kandinsky, Stable Diffusion XL, and Adobe Firefly, shown in each row respectively. The second column highlights shadow errors, while the third column overlays detected object-shadow pairs [wang2022instance]. Grad-CAM applied to our Object-Shadow classifier (fourth column) reveals incorrect shadow shapes in the first and second rows, with shadows on the right-side pedestrians pointing in a different direction than those on the left. The fifth column shows vanishing point errors, and the sixth column presents Perspective Fields [jin2023perspective]. Grad-CAM applied to our Perspective Fields classifier (last column) confirms large perspective distortions on building facades and road markings, corroborating the vanishing point errors in the fifth column.
  • Figure 5: Our projective geometry classifiers identify distinct types of problems in generated images. The top row presents an example that was classified as real by the Object-Shadow classifier but correctly identified as generated by the Perspective Fields classifier. While the shadow cast by the person appears realistic, the Perspective Fields Grad-CAM highlights the problematic geometry of the shelf on the top left. In contrast, the bottom row shows an example that was correctly identified as generated by the Object-Shadow classifier but misclassified as real by the Perspective Fields classifier. Although the perspective effects in the image appear plausible, the Grad-CAM weights correctly reveal that the two chairs are casting shadows from different light sources, indicating an inconsistency in the scene's illumination.
  • ...and 15 more figures
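
The Figure 1 caption notes that line segments are handled by PointNet because they form an unordered set. A minimal sketch of that idea follows, assuming each segment is encoded by its two endpoints `(x1, y1, x2, y2)` and using a single random, untrained layer. The layer sizes, the endpoint encoding, and the function names are illustrative assumptions, not the paper's architecture. The sketch applies the same small network to every segment and then max-pools over the set, which makes the resulting feature vector independent of the order in which segments are listed.

```python
import random

def shared_mlp(segment, weights, biases):
    """Apply one shared linear layer + ReLU to a segment (x1, y1, x2, y2)."""
    return [max(0.0, sum(w * v for w, v in zip(row, segment)) + b)
            for row, b in zip(weights, biases)]

def pointnet_features(segments, weights, biases):
    """Encode each segment independently, then max-pool across the set.
    Max is a symmetric function, so segment order does not affect the result."""
    encoded = [shared_mlp(s, weights, biases) for s in segments]
    return [max(column) for column in zip(*encoded)]

rng = random.Random(0)
# Random, untrained parameters: 4 inputs per segment, 8 output features (assumed sizes).
weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
biases = [rng.uniform(-1, 1) for _ in range(8)]

# Fifty synthetic line segments in normalized image coordinates.
segments = [tuple(rng.random() for _ in range(4)) for _ in range(50)]
shuffled = segments[:]
rng.shuffle(shuffled)

features = pointnet_features(segments, weights, biases)
features_shuffled = pointnet_features(shuffled, weights, biases)
print(features == features_shuffled)
```

A classifier head placed on top of the pooled vector therefore sees the same input however a line detector happens to order its output, which is the property that makes this family of models a natural fit for detected line segments.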