Shadows Don't Lie and Lines Can't Bend! Generative Models don't know Projective Geometry...for now
Ayush Sarkar, Hanlin Mai, Amitabh Mahapatra, Svetlana Lazebnik, D. A. Forsyth, Anand Bhattad
TL;DR
This work shows that state-of-the-art generative models fail to faithfully reproduce projective geometry, producing images with systematic geometric inconsistencies such as vanishing-point misalignments and shadow-illumination mismatches. It introduces a robust, data-driven evaluation framework built on three derived geometric cues (Object-Shadow Cues, Perspective Field Cues, and Line Segment Cues), each processed by a classifier that never sees pixel data. Trained on Kandinsky-generated images and evaluated on multiple unseen generators, these classifiers separate real from generated images based solely on geometric coherence, with AUCs up to 0.97. The findings imply that improving image realism will require architectural innovations or geometry-aware losses rather than simply more training data, that detection should not rely on pixel-based detectors alone, and that geometry-centric evaluation should become a core benchmark for future generators.
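The AUC figures reported in the TL;DR measure how well a classifier's scores rank generated images above real ones. As a minimal sketch (the function name `roc_auc` is hypothetical, and we assume each classifier outputs a score where higher means "more likely generated"), ROC AUC can be computed directly as the Mann-Whitney probability that a randomly chosen generated image outscores a randomly chosen real image, with ties counted as half:

```python
from typing import Sequence


def roc_auc(scores_real: Sequence[float], scores_generated: Sequence[float]) -> float:
    """ROC AUC as the probability that a random generated image scores higher
    than a random real image (ties count as half).

    Assumes higher scores mean "more likely generated"; the scores can come
    from any real-vs-generated classifier.
    """
    if len(scores_real) == 0 or len(scores_generated) == 0:
        raise ValueError("need at least one score per class")
    wins = 0.0
    for g in scores_generated:
        for r in scores_real:
            if g > r:
                wins += 1.0      # pair ranked correctly
            elif g == r:
                wins += 0.5      # tie counts as half a win
    return wins / (len(scores_real) * len(scores_generated))
```

For example, `roc_auc([0.1, 0.4], [0.35, 0.8])` returns 0.75, because three of the four real/generated pairs are ranked correctly. The pairwise loop is O(n·m); for large evaluation sets a rank-based computation, such as `sklearn.metrics.roc_auc_score`, gives the same value more efficiently.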
Abstract
Generative models can produce impressively realistic images. This paper demonstrates that generated images have geometric features different from those of real images. We build collections of generated images, each prequalified to fool simple signal-based classifiers into believing it is real. We then show that these prequalified generated images can be identified reliably by classifiers that look only at geometric properties. We use three such classifiers. All three are denied access to image pixels and see only derived geometric features: the first looks at the perspective field of the image, the second at lines detected in the image, and the third at relations between detected objects and their shadows. For images from a number of distinct generators, our procedure detects generated images more reliably than state-of-the-art detectors based on local signals. Saliency maps suggest that the classifiers reliably identify geometric problems. We conclude that current generators cannot reliably reproduce the geometric properties of real images.
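The constraint behind the line-based cue is that segments lying along parallel lines in the scene must, once projected, meet at a single vanishing point. The paper's classifiers learn from derived features; what follows is instead a hand-crafted illustration of the same projective fact, not the authors' method. It assumes line segments have already been detected as `(x1, y1, x2, y2)` rows and grouped into one family of scene-parallel lines, and the helper names are hypothetical. It fits a least-squares vanishing point in homogeneous coordinates and reports, for each segment, the angle between the segment and the direction toward that point. Large residuals mean the lines cannot share a vanishing point.

```python
import numpy as np


def segment_lines(segments: np.ndarray) -> np.ndarray:
    """Convert (n, 4) segments [x1, y1, x2, y2] into (n, 3) homogeneous lines.

    The line through two points is their cross product in homogeneous
    coordinates; each line is scaled so its normal (a, b) has unit length.
    """
    ones = np.ones((segments.shape[0], 1))
    p1 = np.hstack([segments[:, 0:2], ones])
    p2 = np.hstack([segments[:, 2:4], ones])
    lines = np.cross(p1, p2)
    return lines / np.linalg.norm(lines[:, :2], axis=1, keepdims=True)


def fit_vanishing_point(segments: np.ndarray) -> np.ndarray:
    """Least-squares vanishing point v minimizing sum_i (l_i . v)^2 with |v| = 1.

    The minimizer is the right singular vector of the stacked lines with the
    smallest singular value. v is homogeneous: v[2] == 0 means a point at
    infinity (the lines are parallel in the image as well).
    """
    _, _, vt = np.linalg.svd(segment_lines(segments))
    return vt[-1]


def angular_residuals_deg(segments: np.ndarray, vp: np.ndarray) -> np.ndarray:
    """Angle in degrees, in [0, 90], between each segment and the direction
    from the segment's midpoint toward the vanishing point vp."""
    mid = (segments[:, 0:2] + segments[:, 2:4]) / 2.0
    d_seg = segments[:, 2:4] - segments[:, 0:2]
    # vp[:2] - vp[2] * mid is the midpoint-to-vp direction scaled by vp[2];
    # it stays valid when vp[2] == 0 (vp is then itself a direction).
    d_vp = vp[:2][None, :] - vp[2] * mid
    cross = d_seg[:, 0] * d_vp[:, 1] - d_seg[:, 1] * d_vp[:, 0]
    dot = np.sum(d_seg * d_vp, axis=1)
    # abs() ignores orientation and sign of scale; atan2 is stable near 0.
    return np.degrees(np.arctan2(np.abs(cross), np.abs(dot)))
```

For the three segments `[[0, 0, 50, 25], [0, 50, 50, 50], [0, 100, 50, 75]]`, which all pass through (100, 50), the fitted point is (100, 50) up to floating-point error and every residual is essentially zero. Adding the vertical segment `[0, 0, 0, 50]` pushes the largest residual well above 5 degrees, since no single point lies in nearly the same direction as both that segment and the horizontal one. With real image coordinates, normalizing points before the SVD would improve conditioning.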
