VAAS: Vision-Attention Anomaly Scoring for Image Manipulation Detection in Digital Forensics

Opeyemi Bamigbade, Mark Scanlon, John Sheppard

TL;DR

The paper addresses the growing challenge of authenticating digital images amid AI-driven manipulations by proposing VAAS, a dual-module framework that fuses global attention-based anomaly estimation from Vision Transformers with local patch-level self-consistency from SegFormer. The Hybrid Anomaly Scoring mechanism combines global and local cues into a continuous, interpretable integrity score, complemented by attention-guided anomaly maps for explainability. Empirical validation on CASIA v2.0 and DF2023 shows competitive detection and localisation (F1 and IoU) with strong interpretability, and ablations justify design choices such as backbone selection, fusion strategy, and dataset-dependent tuning of α. The approach advances forensic reliability by unifying quantitative detection with human-understandable reasoning, and the open-source release facilitates reproducibility and deployment in real-world investigations.

Abstract

Recent advances in AI-driven image generation have introduced new challenges for verifying the authenticity of digital evidence in forensic investigations. Modern generative models can produce visually consistent forgeries that evade traditional detectors based on pixel or compression artefacts. Most existing approaches also lack an explicit measure of anomaly intensity, which limits their ability to quantify the severity of manipulation. This paper introduces Vision-Attention Anomaly Scoring (VAAS), a novel dual-module framework that integrates global attention-based anomaly estimation using Vision Transformers (ViT) with patch-level self-consistency scoring derived from SegFormer embeddings. The hybrid formulation provides a continuous and interpretable anomaly score that reflects both the location and degree of manipulation. Evaluations on the DF2023 and CASIA v2.0 datasets demonstrate that VAAS achieves competitive F1 and IoU performance, while enhancing visual explainability through attention-guided anomaly maps. The framework bridges quantitative detection with human-understandable reasoning, supporting transparent and reliable image integrity assessment. The source code for all experiments and corresponding materials for reproducing the results are available open source.
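
The abstract reports localisation quality as F1 and IoU. As a reminder of what those two numbers measure on a predicted manipulation mask, here is a minimal sketch that assumes binary masks given as nested lists of 0/1 values. The function name `f1_and_iou` and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not taken from the paper or its released code.

```python
from typing import Sequence, Tuple


def f1_and_iou(pred: Sequence[Sequence[int]],
               truth: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pixel-level F1 and IoU for two binary masks of equal shape.

    Hypothetical helper for illustration; 1 marks a manipulated pixel.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # manipulated pixel correctly flagged
            elif p and not t:
                fp += 1          # authentic pixel wrongly flagged
            elif t and not p:
                fn += 1          # manipulated pixel missed
    f1_den = 2 * tp + fp + fn
    iou_den = tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    f1 = 2 * tp / f1_den if f1_den else 1.0
    iou = tp / iou_den if iou_den else 1.0
    return f1, iou


pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
f1, iou = f1_and_iou(pred_mask, true_mask)
print(f"F1={f1:.3f} IoU={iou:.3f}")
```

Both metrics ignore true negatives, so a large authentic background does not inflate them; for the same masks IoU is never higher than F1.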

Paper Structure

This paper contains 32 sections, 9 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Examples of content-changing image manipulation techniques (copy–move, splicing, and removal). Extracted from Shi_2024.
  • Figure 2: Overview of the VAAS framework showing the data flow through its training (left) and inference (right) stages. The architecture integrates global attention analysis and local consistency estimation to detect spatial inconsistencies indicative of image manipulation.
  • Figure 3: Qualitative visualisation of VAAS on CASIA v2.0 (top two rows) and DF2023 (bottom two rows). Columns (1)–(6) show: input image, ground-truth mask, binary Px output, Px heatmap overlay, Fx attention overlay, and final hybrid anomaly score. High-anomaly samples (top rows) exhibit precise localisation and strong global cues, while mid-range samples show balanced but diffuse attention, illustrating the complementary interaction between Px and Fx.
  • Figure 4: Fx backbone component analysis showing F1-scores on DF2023 and CASIA v2.0. ViT-Base offers the best balance between accuracy and efficiency.
  • Figure 5: Anomaly scoring fusion component analysis showing F1-scores across $\alpha$. Harmonic fusion varies smoothly and remains stable across datasets, while weighted fusion peaks near $\alpha{=}0.6$.
  • ...and 2 more figures
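
The Figure 5 caption contrasts weighted and harmonic fusion of the global score (Fx) and the local score (Px) as $\alpha$ varies. The exact formulas are not given in this section, so the sketch below assumes two common forms: a convex combination and a weighted harmonic mean, with both inputs already normalised to [0, 1]. The function names, the choice that $\alpha$ weights Fx, and the `eps` guard are assumptions made for illustration.

```python
def weighted_fusion(f_x: float, p_x: float, alpha: float = 0.6) -> float:
    """Assumed form: alpha * Fx + (1 - alpha) * Px, with scores in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * f_x + (1.0 - alpha) * p_x


def harmonic_fusion(f_x: float, p_x: float, alpha: float = 0.5,
                    eps: float = 1e-8) -> float:
    """Assumed form: weighted harmonic mean of Fx and Px.

    eps avoids division by zero when a score is exactly 0.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return 1.0 / (alpha / (f_x + eps) + (1.0 - alpha) / (p_x + eps))


# Sweep alpha for one hypothetical image whose global and local cues disagree.
f_x, p_x = 0.9, 0.1
for alpha in (0.2, 0.4, 0.6, 0.8):
    w = weighted_fusion(f_x, p_x, alpha)
    h = harmonic_fusion(f_x, p_x, alpha)
    print(f"alpha={alpha:.1f} weighted={w:.3f} harmonic={h:.3f}")
```

Under these assumed forms, the harmonic variant stays low unless both cues are high, whereas the convex combination moves linearly with $\alpha$, so its output depends more directly on the chosen weight.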