Benchmarking Scientific Image Forgery Detectors

João P. Cardenuto, Anderson Rocha

TL;DR

The paper addresses the lack of legally safe benchmarks for detecting tampering in scientific images. It introduces RSIIL, an extendable library that simulates common manipulations (duplication, retouching, cleaning), and uses it to generate RSIID, a large synthetic dataset with pixel-level ground truth. It also proposes a Consistent True Positive (CTP) metric to ensure fair evaluation in copy-move forgery detection (CMFD). Existing CMFD methods are evaluated on RSIID, which reveals performance gaps when they are applied to scientific images and sets baselines, with a public release of the dataset and tools. The framework aims to foster the development of domain-specific detectors and reliable forensic evaluation tools for scientific integrity analysis.

Abstract

The scientific image integrity area presents a challenging research bottleneck: the lack of available datasets to design and evaluate forensic techniques. Its data sensitivity creates a legal hurdle that prevents one from relying on real tampered cases to build any sort of accessible forensic benchmark. To mitigate this bottleneck, we present an extendable open-source library that reproduces the most common image forgery operations reported by the research integrity community: duplication, retouching, and cleaning. Using this library and realistic scientific images, we create a large scientific forgery image benchmark (39,423 images) with an enriched ground truth. In addition, concerned about the high number of papers retracted due to image duplication, this work evaluates state-of-the-art copy-move detection methods on the proposed dataset, using a new metric that asserts consistent match detection between the source and the copied region. The dataset and source code will be freely available upon acceptance of the paper.

Paper Structure

This paper contains 15 sections, 4 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Example of a retraction note extracted from Cell Death & Disease (https://www.nature.com/articles/s41419-019-1866-9, last accessed May 2021). The words highlighted in yellow, 'some lanes' and 'not the appropriate ones', illustrate the imprecise regions and ambiguous causes given for the retraction.
  • Figure 4: Example of Copy-Move Forgery implemented in the library. The object of image (e) containing an arrow is duplicated with (a) translation, (b) rotation, (c) flip, and (d) scaling and pasted within the same image.
  • Figure 5: Example of Overlap forgery included in the library. (a) represents a source image that is divided into overlapping regions A and B, which are then presented as separate images in (b) and (c). Region A (b) undergoes a post-processing brightness adjustment to make it harder to compare with region B (c).
  • Figure 6: Example of Splicing forgery function included in the library. The object highlighted with a red arrow from the donor image (a) is placed in a background region of the host image (b) resulting in (c).
  • Figure 8: Pipeline of Compound figure creation. (a) Method's Input: from left to right, a set of Compound Figure templates; the scientific source image dataset; and the input image with the chosen forgery function. (b) Method's Operations: selects a template based on the aspect ratio of the input image; then retrieves all images from the source dataset that fit the chosen template; then creates fake graphs (if indicated by the template); then applies the forgery function to the input; and, finally, places all figure elements in the Compound figure. (c) The output figure.
  • ...and 6 more figures
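
As an illustration of the copy-move operation described in the Figure 4 caption, the following minimal sketch duplicates a rectangular region within the same image by translation, optionally with a horizontal flip, and records a pixel-level ground-truth map alongside it. This is not the RSIIL API: the function name `copy_move`, its parameters, and the label convention (0 = pristine, 1 = source, 2 = pasted copy) are assumptions made for this example. Images are plain nested lists of grey values, so nothing beyond the Python standard library is needed.

```python
from typing import List, Tuple

Image = List[List[int]]


def copy_move(image: Image, top: int, left: int, height: int, width: int,
              dy: int, dx: int, flip: bool = False) -> Tuple[Image, Image]:
    """Duplicate a rectangular region inside the same image (hypothetical helper).

    Returns the forged image and a ground-truth map with the assumed labels
    0 = pristine pixel, 1 = source region, 2 = pasted copy.
    """
    rows, cols = len(image), len(image[0])
    new_top, new_left = top + dy, left + dx

    # Both the source rectangle and its translated copy must fit in the image.
    for r0, c0 in ((top, left), (new_top, new_left)):
        if r0 < 0 or c0 < 0 or r0 + height > rows or c0 + width > cols:
            raise ValueError("region falls outside the image")

    # Cut out the source patch; the flip mirrors it left-to-right.
    patch = [row[left:left + width] for row in image[top:top + height]]
    if flip:
        patch = [row[::-1] for row in patch]

    forged = [row[:] for row in image]          # leave the input untouched
    truth = [[0] * cols for _ in range(rows)]

    # Label the source first, so that where the two regions overlap the
    # pasted copy takes precedence in the ground-truth map.
    for r in range(height):
        for c in range(width):
            truth[top + r][left + c] = 1
    for r in range(height):
        for c in range(width):
            forged[new_top + r][new_left + c] = patch[r][c]
            truth[new_top + r][new_left + c] = 2
    return forged, truth


# Usage: a 4x6 test image whose pixel value encodes its position.
image = [[10 * r + c for c in range(6)] for r in range(4)]
forged, truth = copy_move(image, top=0, left=0, height=2, width=2, dy=2, dx=3)
```

Keeping separate labels for the source and the pasted copy is one way to let an evaluation check that a detector finds both halves of a duplication, which is the kind of consistency the abstract attributes to the proposed metric. The metric's actual definition is not given in this summary and is not reproduced here.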