DF2023: The Digital Forensics 2023 Dataset for Image Forgery Detection
David Fischinger, Martin Boyer
TL;DR
This paper tackles the problem of detecting local image forgeries by introducing a large-scale, publicly available dataset, DF2023, containing about one million forged images across four categories: splicing, copy-move, removal, and enhancement. It details a dataset-generation pipeline that sources pristine and donor content from MS-COCO, creates 256×256 patches, applies diverse preprocessing and masking strategies, and produces precise ground-truth masks. The authors emphasize that the dataset enables unbiased, reproducible benchmarking of forgery detectors and can significantly reduce data-collection overhead. Experiments referenced in the paper indicate that training a simple network on DF2023 yields state-of-the-art results, underscoring the dataset's practical value for advancing the field.
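To make the splicing case concrete, the following is a minimal sketch of how a forged 256×256 patch and its ground-truth mask could be produced. It is not the authors' pipeline: the helper names `random_rect_mask` and `splice`, the rectangular mask shape, and the side-length bounds are all assumptions chosen for illustration, and random arrays stand in for patches that would come from MS-COCO.

```python
import numpy as np

PATCH = 256  # patch size stated in the summary above


def random_rect_mask(rng, size=PATCH, min_side=32, max_side=128):
    """Return a binary mask with one random rectangle set to 1.

    Hypothetical masking strategy; the side-length bounds are assumptions.
    """
    h = int(rng.integers(min_side, max_side + 1))
    w = int(rng.integers(min_side, max_side + 1))
    top = int(rng.integers(0, size - h + 1))
    left = int(rng.integers(0, size - w + 1))
    mask = np.zeros((size, size), dtype=np.uint8)
    mask[top:top + h, left:left + w] = 1
    return mask


def splice(pristine, donor, mask):
    """Copy donor pixels into the pristine patch wherever mask == 1."""
    m = mask[..., None].astype(bool)  # add a channel axis for broadcasting
    return np.where(m, donor, pristine)


# Random arrays stand in for pristine and donor patches (assumption).
rng = np.random.default_rng(0)
pristine = rng.integers(0, 256, size=(PATCH, PATCH, 3), dtype=np.uint8)
donor = rng.integers(0, 256, size=(PATCH, PATCH, 3), dtype=np.uint8)

mask = random_rect_mask(rng)            # ground-truth forgery mask
forged = splice(pristine, donor, mask)  # spliced training image
```

In this sketch the mask doubles as the training label: a detector would be trained to predict `mask` from `forged`. Copy-move could be sketched the same way by taking the donor region from the pristine patch itself.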
Abstract
The deliberate manipulation of public opinion, especially through altered images that are frequently disseminated through online social networks, poses a significant danger to society. To fight this issue on a technical level, we support the research community by releasing the Digital Forensics 2023 (DF2023) training and validation dataset, comprising one million images from four major forgery categories: splicing, copy-move, enhancement, and removal. This dataset enables an objective comparison of network architectures and can significantly reduce the time and effort researchers spend preparing datasets.
