Exploring Saliency Bias in Manipulation Detection

Joshua Krinsky, Alan Bettis, Qiuyu Tang, Daniel Moreira, Aparna Bharati

TL;DR

This work investigates how the perceptual saliency of a manipulated region biases image manipulation detection, a question motivated by misinformation risks. It combines a human saliency study, automated saliency estimation, and CLIP-based semantic analysis to assess how attention to manipulated regions affects detection and semantic interpretation across multiple datasets. The findings show a strong saliency-driven bias in both humans and detectors, and demonstrate that saliency-guided manipulations (SaGIM) can increase detectability and reduce performance variance, underscoring the need for semantic-aware forensics. The work thus provides a framework for prioritizing forensic resources and highlights the link between visual saliency, semantic change, and detection efficacy.

Abstract

The social media-fuelled explosion of fake news and misinformation supported by tampered images has led to growth in the development of models and datasets for image manipulation detection. However, existing detection methods mostly treat media objects in isolation, without considering the impact of specific manipulations on viewer perception. Forensic datasets are usually analyzed based on the manipulation operations and corresponding pixel-based masks, but not on the semantics of the manipulation, i.e., type of scene, objects, and viewers' attention to scene content. The semantics of the manipulation play an important role in spreading misinformation through manipulated images. In an attempt to encourage further development of semantic-aware forensic approaches to understand visual misinformation, we propose a framework to analyze the trends of visual and semantic saliency in popular image manipulation datasets and their impact on detection.

Paper Structure (7 sections, 9 figures, 1 table)

Figures (9)

  • Figure 1: Saliency of the manipulation is an important factor in determining if a human or machine will consider an image to be manipulated.
  • Figure 2: Example saliency and manipulation prediction maps obtained from the human study. Maps (b) and (c) are compared using Mean Recall to estimate the saliency of the manipulation. Maps (b) and (d) are compared using ROC to understand the accuracy of human manipulation prediction.
  • Figure 3: Detection performance (AuROC) from human participants for five saliency levels over the RT dataset (number of images in each group in parentheses).
  • Figure 4: Proportion of the dataset in each saliency group.
  • Figure 5: Manipulation detection and localization (average AuROC) results on the 3 datasets. The first plot additionally shows efficacy of manipulated region predictions collected from the human study.
  • ...and 4 more figures
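
The Figure 2 caption names two map comparisons: Mean Recall between the manipulation map and the saliency map, and ROC between the manipulation map and the human prediction map. The summary does not define either computation, so the following is only a minimal sketch under stated assumptions: the manipulation map is taken to be a binary mask, Mean Recall is assumed to be the recall of manipulated pixels averaged over a few saliency thresholds, and AuROC is computed pixel-wise. The function names and the threshold values are hypothetical and are not taken from the paper.

```python
import numpy as np


def mean_recall(mask, saliency, thresholds=(0.25, 0.5, 0.75)):
    """Assumed definition: fraction of manipulated pixels whose saliency
    reaches a threshold, averaged over the given thresholds."""
    mask = np.asarray(mask, dtype=bool)
    saliency = np.asarray(saliency, dtype=float)
    if not mask.any():
        raise ValueError("mask contains no manipulated pixels")
    # Recall at one threshold = share of manipulated pixels marked salient.
    recalls = [(saliency[mask] >= t).mean() for t in thresholds]
    return float(np.mean(recalls))


def pixel_auroc(mask, scores):
    """Pixel-wise AuROC: probability that a manipulated pixel receives a
    higher score than a pristine pixel, with ties counted as 0.5."""
    mask = np.asarray(mask, dtype=bool).ravel()
    scores = np.asarray(scores, dtype=float).ravel()
    pos, neg = scores[mask], scores[~mask]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need both manipulated and pristine pixels")
    # Pairwise comparison is quadratic in pixel count; fine for small maps.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


# Toy 2x2 example: the top row is the manipulated region.
toy_mask = np.array([[1, 1], [0, 0]])
toy_map = np.array([[0.9, 0.4], [0.1, 0.6]])
toy_recall = mean_recall(toy_mask, toy_map)
toy_auroc = pixel_auroc(toy_mask, toy_map)
```

For full-resolution maps, a rank-based AuROC routine such as scikit-learn's `roc_auc_score` would be a more practical choice than the pairwise comparison used here.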