Detecting Localized Deepfakes: How Well Do Synthetic Image Detectors Handle Inpainting?
Serafino Pandolfini, Lorenzo Pellegrini, Matteo Ferrara, Davide Maltoni
TL;DR
This work benchmarks whether state-of-the-art detectors trained on fully synthetic images can generalize to localized inpainting and region-level edits, a growing threat in cybersecurity-relevant media. Using two backbone families (CNN-based ResNet-50 CLIP and self-supervised vision transformers) and AI-GenBench training, the authors evaluate transfer performance across BR-Gen, TGIF, and TGIF2, varying mask sizes, inpainting types, and compression. The results show partial transferability: detectors perform best on medium-to-large edits and full-regeneration scenarios, with significant degradation on small or subtle inpaintings, highlighting the limitations of binary real-vs-fake classification for localization. The findings emphasize the need for hybrid detection systems that couple global classifiers with localized, segmentation-aware cues to improve robustness against localized deepfakes.
Abstract
The rapid progress of generative AI has enabled highly realistic image manipulations, including inpainting and region-level editing. These approaches preserve most of the original visual context and are increasingly exploited in cybersecurity-relevant threat scenarios. While numerous detectors have been proposed for identifying fully synthetic images, their ability to generalize to localized manipulations remains insufficiently characterized. This work presents a systematic evaluation of state-of-the-art detectors, originally trained for deepfake detection on fully synthetic images, when applied to a distinct challenge: localized inpainting detection. The study leverages multiple datasets spanning diverse generators, mask sizes, and inpainting techniques. Our experiments show that models trained on a large set of generators exhibit partial transferability to inpainting-based edits and can reliably detect medium- and large-area manipulations or regeneration-style inpainting, outperforming many existing ad hoc detection approaches.
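To make the notion of evaluating a binary detector across mask sizes concrete, the following is a minimal sketch of how detection quality could be stratified by the fraction of the image that was inpainted. It is an illustration under stated assumptions, not the authors' evaluation code: the metric (AUROC), the bucket thresholds, the function names, and the toy scores are all hypothetical and do not come from the paper.

```python
from collections import defaultdict


def auroc(real_scores, fake_scores):
    """Probability that a random fake image scores higher than a random
    real image (ties count as 0.5). Scores are 'fakeness' outputs of a
    binary real-vs-fake detector."""
    if not real_scores or not fake_scores:
        raise ValueError("need at least one real and one fake score")
    wins = 0.0
    for f in fake_scores:
        for r in real_scores:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(fake_scores) * len(real_scores))


def mask_bucket(mask_ratio):
    """Map the inpainted-area fraction to a size bucket.
    The 10% and 40% thresholds are hypothetical, chosen for illustration."""
    if mask_ratio < 0.10:
        return "small"
    if mask_ratio < 0.40:
        return "medium"
    return "large"


def auroc_by_mask_size(real_scores, inpainted):
    """inpainted: iterable of (detector_score, mask_ratio) pairs.
    Returns one AUROC per bucket, each computed against the same reals."""
    buckets = defaultdict(list)
    for score, ratio in inpainted:
        buckets[mask_bucket(ratio)].append(score)
    return {name: auroc(real_scores, scores) for name, scores in buckets.items()}


# Toy scores invented for this example; they are not results from the paper.
real = [0.05, 0.10, 0.20, 0.30]
inpainted = [
    (0.15, 0.03), (0.25, 0.05),   # small masks
    (0.60, 0.20), (0.70, 0.30),   # medium masks
    (0.90, 0.60), (0.95, 0.80),   # large masks
]
results = auroc_by_mask_size(real, inpainted)
```

Reporting one figure per bucket, rather than a single pooled number, keeps weak performance on small edits from being hidden by strong performance on large ones.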
