Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach
Lvpan Cai, Haowei Wang, Jiayi Ji, Yanshu Zhoumen, Shen Chen, Taiping Yao, Xiaoshuai Sun
TL;DR
The paper tackles the difficulty of detecting localized AI-generated forgeries by introducing BR-Gen, a 150k-sample dataset that covers scene-level edits in stuff and background regions, built via an automated Perception-Creation-Evaluation pipeline. It further presents NFA-ViT, a noise-guided forgery amplification transformer that diffuses subtle forgery cues across the image through a dual-branch attention mechanism and a learnable decoder. Extensive experiments on BR-Gen show that current methods struggle with these broader edits, while NFA-ViT achieves strong detection and localization performance and generalizes across benchmarks. Together, BR-Gen and NFA-ViT offer a new, challenging platform and a robust method for advancing localized AIGC forgery detection in diverse, real-world scenes.
Abstract
The rise of AI-generated image tools has made localized forgeries increasingly realistic, posing challenges for visual content integrity. Although recent efforts have explored localized AIGC detection, existing datasets predominantly focus on object-level forgeries while overlooking broader scene edits in regions such as sky or ground. To address this limitation, we introduce \textbf{BR-Gen}, a large-scale dataset of 150,000 locally forged images with diverse scene-aware annotations, which are based on semantic calibration to ensure high-quality samples. BR-Gen is constructed through a fully automated ``Perception-Creation-Evaluation'' pipeline to ensure semantic coherence and visual realism. We further propose \textbf{NFA-ViT}, a Noise-guided Forgery Amplification Vision Transformer that enhances the detection of localized forgeries by amplifying subtle forgery-related features across the entire image. NFA-ViT uses noise fingerprints to mine heterogeneous regions in images, \emph{i.e.}, potentially edited areas. An attention mechanism then compels interaction between normal and abnormal features, propagating forgery traces throughout the image so that subtle forgeries influence a broader context, which improves overall detection robustness. Extensive experiments demonstrate that BR-Gen presents entirely new scenarios that existing methods do not cover. Furthermore, NFA-ViT outperforms existing methods on BR-Gen and generalizes well across current benchmarks.
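The noise-guided amplification idea described in the abstract can be illustrated with a toy sketch. This is not the NFA-ViT implementation: the high-pass filter, the patch scoring, the fixed (non-learned) attention weights, and every function name below are assumptions chosen only to show how a noise cue from a small region might be spread to all patch tokens.

```python
import math

def noise_residual(img):
    """High-pass residual: pixel minus the mean of its 4 neighbours (edges replicated).
    Assumption: a simple stand-in for a real noise-fingerprint extractor."""
    h, w = len(img), len(img[0])
    def px(r, c):
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]
    return [[img[r][c] - (px(r - 1, c) + px(r + 1, c) + px(r, c - 1) + px(r, c + 1)) / 4.0
             for c in range(w)] for r in range(h)]

def patch_energy(residual, patch):
    """Mean absolute residual per non-overlapping patch, in row-major order."""
    h, w = len(residual), len(residual[0])
    energies = []
    for r0 in range(0, h, patch):
        for c0 in range(0, w, patch):
            vals = [abs(residual[r][c])
                    for r in range(r0, min(r0 + patch, h))
                    for c in range(c0, min(c0 + patch, w))]
            energies.append(sum(vals) / len(vals))
    return energies

def heterogeneity(energies):
    """Score each patch by how far its noise energy deviates from the image mean."""
    mean = sum(energies) / len(energies)
    return [abs(e - mean) for e in energies]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def amplify(tokens, scores, temperature=0.1):
    """Add to every token a context vector pooled from all tokens, weighted by
    noise heterogeneity. Hypothetical, non-learned substitute for attention."""
    weights = softmax([s / temperature for s in scores])
    dim = len(tokens[0])
    context = [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]
    return [[t[d] + context[d] for d in range(dim)] for t in tokens]

# Toy 4x4 image: flat everywhere except a noisy top-left 2x2 patch.
img = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0]]
scores = heterogeneity(patch_energy(noise_residual(img), patch=2))
# One 2-d feature per patch; only the first patch carries the "forgery" feature.
tokens = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
amplified = amplify(tokens, scores)
```

In this sketch the patch with the most unusual noise dominates the pooled context, so its feature appears in every output token. A trained model would presumably learn the queries, keys, and values instead of using fixed scores.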
