Benchmarking Scientific Image Forgery Detectors
João P. Cardenuto, Anderson Rocha
TL;DR
The paper addresses the lack of legally safe benchmarks for detecting tampering in scientific images. It introduces RSIIL, an extendable library that simulates common manipulations (duplication, retouching, and cleaning) and generates RSIID, a large synthetic dataset with pixel-level ground truth. It also proposes a Consistent True Positive (CTP) metric to ensure fair evaluation of copy-move forgery detection (CMFD). It evaluates existing CMFD methods on RSIID, revealing performance gaps when they are applied to scientific images and establishing baselines, and it releases the dataset and tools publicly. This framework aims to foster the development of domain-specific detectors and reliable forensic evaluation tools for scientific integrity analysis.
Abstract
The scientific image integrity area faces a challenging research bottleneck: the lack of available datasets to design and evaluate forensic techniques. The sensitivity of its data creates a legal hurdle that prevents researchers from relying on real tampered cases to build any sort of accessible forensic benchmark. To mitigate this bottleneck, we present an extendable open-source library that reproduces the most common image forgery operations reported by the research integrity community: duplication, retouching, and cleaning. Using this library and realistic scientific images, we create a large scientific image forgery benchmark (39,423 images) with enriched ground truth. In addition, motivated by the high number of papers retracted due to image duplication, this work evaluates state-of-the-art copy-move detection methods on the proposed dataset, using a new metric that checks for consistent match detection between the source and the copied region. The dataset and source code will be freely available upon acceptance of the paper.
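To make the idea of a duplication forgery with source/target ground truth concrete, the following is a minimal NumPy sketch. It is not the RSIIL API: the helpers `simulate_duplication` and `is_consistent_match`, their parameters, and the coverage threshold are all hypothetical. The consistency check is only a simplified stand-in for the intuition behind CTP (a detection should flag both the source and its copy), not the paper's actual metric definition.

```python
import numpy as np


def simulate_duplication(image, src_box, dst_xy):
    """Hypothetical copy-move helper (not RSIIL): paste a patch elsewhere.

    src_box: (row, col, height, width) of the source patch.
    dst_xy:  (row, col) of the top-left corner where the patch is pasted.
    Returns the forged image plus separate boolean source and target masks.
    """
    r, c, h, w = src_box
    dr, dc = dst_xy
    H, W = image.shape[:2]
    # Reject patches that would fall outside the image.
    if r < 0 or c < 0 or r + h > H or c + w > W:
        raise ValueError("source patch out of bounds")
    if dr < 0 or dc < 0 or dr + h > H or dc + w > W:
        raise ValueError("target location out of bounds")

    forged = image.copy()
    patch = image[r:r + h, c:c + w].copy()  # copy before pasting
    forged[dr:dr + h, dc:dc + w] = patch

    # "Enriched" ground truth: keep source and copied regions apart.
    src_mask = np.zeros((H, W), dtype=bool)
    tgt_mask = np.zeros((H, W), dtype=bool)
    src_mask[r:r + h, c:c + w] = True
    tgt_mask[dr:dr + h, dc:dc + w] = True
    return forged, src_mask, tgt_mask


def is_consistent_match(pred_mask, src_mask, tgt_mask, min_coverage=0.5):
    """Simplified stand-in (assumption, not the paper's CTP): a detection
    counts only if it covers enough of BOTH the source and the copy."""
    src_cov = (pred_mask & src_mask).sum() / src_mask.sum()
    tgt_cov = (pred_mask & tgt_mask).sum() / tgt_mask.sum()
    return bool(src_cov >= min_coverage and tgt_cov >= min_coverage)


rng = np.random.default_rng(0)
img = rng.integers(0, 255, size=(64, 64)).astype(np.uint8)
forged, src, tgt = simulate_duplication(img, (5, 5, 10, 10), (40, 40))
print(is_consistent_match(src | tgt, src, tgt))  # True: both regions flagged
print(is_consistent_match(tgt, src, tgt))        # False: only the copy flagged
```

Keeping separate source and target masks, rather than a single "forged" mask, is what makes a pairing-aware check possible at all. A detector that flags only the pasted region would score as a hit under a plain pixel-overlap metric but fails the consistency check above.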
