Context-Aware Weakly Supervised Image Manipulation Localization with SAM Refinement

Xinghao Wang; Tao Gong; Qi Chu; Bin Liu; Nenghai Yu

TL;DR

This work tackles weakly supervised image manipulation localization, learning from image-level labels alone. It introduces Context-Aware Boundary Localization (CABL), which captures boundary context through a Sobel-based edge emphasis, and CAM-Guided SAM Refinement (CGSR), which converts coarse CAMs into precise masks by prompting SAM with informative prompts. Both modules sit within a dual-branch Transformer-CNN backbone. The model is trained with a simple joint loss $\text{loss} = \text{loss}_{\text{CABL}} + \text{loss}_{\text{Trans}}$ and achieves state-of-the-art performance on several datasets for both image-level detection and pixel-level localization, while remaining robust to common degradations. The approach reduces the annotation burden while still delivering high-fidelity localization, which matters for defending against manipulated imagery in real-world settings.
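To make the "Sobel-based edge emphasis" idea concrete, the following is a minimal, dependency-free sketch. It is not the CABL module itself: the function names, the `alpha` weight, and the multiplicative re-weighting rule are assumptions chosen for illustration, and the summary does not say how CABL combines edge responses with features.

```python
def sobel_magnitude(img):
    """Sobel gradient magnitude of a 2D list of floats (border pixels stay 0)."""
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal-gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical-gradient kernel
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    v = img[y + dy - 1][x + dx - 1]
                    gx += kx[dy][dx] * v
                    gy += ky[dy][dx] * v
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


def emphasize_edges(feature, alpha=1.0):
    """Hypothetical edge emphasis: scale each value by (1 + alpha * normalized edge).

    This re-weighting rule is an illustrative assumption, not the paper's formula.
    """
    edges = sobel_magnitude(feature)
    peak = max(max(row) for row in edges) or 1.0  # avoid dividing by zero on flat maps
    return [
        [f * (1.0 + alpha * e / peak) for f, e in zip(f_row, e_row)]
        for f_row, e_row in zip(feature, edges)
    ]


# A toy map with a vertical step edge between columns 2 and 3.
step = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
boosted = emphasize_edges(step)
```

On this toy input, values next to the step are amplified while flat regions are left unchanged, which is the qualitative behavior an edge-emphasis step aims for.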

Abstract

Malicious image manipulation poses societal risks, increasing the importance of effective image manipulation detection methods. Recent progress in image manipulation detection has largely been driven by fully supervised approaches, which require labor-intensive pixel-level annotations. It is therefore essential to explore weakly supervised image manipulation localization methods that require only image-level binary labels for training. However, existing weakly supervised methods overlook the importance of edge information for accurate localization, leading to suboptimal localization performance. To address this, we propose a Context-Aware Boundary Localization (CABL) module that aggregates boundary features and learns context inconsistency to localize manipulated areas. Furthermore, by leveraging Class Activation Mapping (CAM) and the Segment Anything Model (SAM), we introduce the CAM-Guided SAM Refinement (CGSR) module to generate more accurate manipulation localization maps. By integrating these two modules, we present a novel weakly supervised framework based on a dual-branch Transformer-CNN architecture. Our method achieves outstanding localization performance across multiple datasets.
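As a rough illustration of how a coarse CAM could be turned into prompts for a promptable segmenter such as SAM, the sketch below thresholds the CAM and derives one box prompt and one positive point prompt. The function name `cam_to_prompts`, the `thresh` value, and the choice of "bounding box plus peak point" are assumptions for illustration; the abstract does not specify which prompts CGSR actually uses.

```python
def cam_to_prompts(cam, thresh=0.5):
    """Derive a box prompt and a positive point prompt from a coarse CAM.

    cam: 2D list of activations, assumed normalized to [0, 1].
    Returns (box, point), where box = (x_min, y_min, x_max, y_max) and
    point = (x, y) of the strongest activation, or None if nothing
    reaches the threshold. Hypothetical helper, not the paper's CGSR.
    """
    # Collect coordinates of all pixels at or above the threshold.
    coords = [
        (x, y)
        for y, row in enumerate(cam)
        for x, v in enumerate(row)
        if v >= thresh
    ]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    box = (min(xs), min(ys), max(xs), max(ys))
    # Use the most confident pixel as a positive point prompt.
    point = max(coords, key=lambda p: cam[p[1]][p[0]])
    return box, point


toy_cam = [
    [0.1, 0.2, 0.1, 0.0],
    [0.1, 0.6, 0.9, 0.0],
    [0.0, 0.7, 0.5, 0.1],
    [0.0, 0.0, 0.0, 0.0],
]
prompts = cam_to_prompts(toy_cam)
```

The resulting box and point would then be handed to the segmenter, whose output mask replaces the coarse CAM as the localization map. Returning `None` for an empty CAM lets a caller skip refinement for images predicted as authentic.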
