Training-Free In-Context Forensic Chain for Image Manipulation Detection and Localization
Rui Chen, Bin Liu, Changtao Miao, Xinghao Wang, Yi Li, Tao Gong, Qi Chu, Nenghai Yu
TL;DR
This work tackles image manipulation localization without dense pixel-level annotations by introducing ICFC, a training-free framework that guides multi-modal large language models (MLLMs) with two components: Rule Decomposition and Filtering (RDF), which constructs Objectified Rule Sets, and Multi-step Progressive Reasoning (MPR). RDF converts vague forensic cues into interpretable rules and uses CLIP to filter them, so that only image-relevant priors reach the model. MPR mirrors expert workflows: it proposes coarse bounding boxes, refines them through iterative reasoning, and passes them to SAM for pixel-level segmentation, producing human-readable explanations along the way. The approach yields image-level judgments, fine-grained localization, and interpretable forensic rationales, achieving state-of-the-art performance among training-free methods and results competitive with weakly and fully supervised systems across six benchmarks. The findings highlight the potential of knowledge-guided, training-free paradigms for scalable, interpretable image forensics in practical security settings.
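The CLIP-based filtering step in RDF can be pictured as ranking candidate rules by how well their text embeddings match the image embedding. The sketch below assumes cosine similarity with a hypothetical `top_k` cut-off and `min_sim` threshold; the toy 3-d vectors and rule strings stand in for real CLIP embeddings and are made up for illustration, and the paper's exact selection criterion may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_rules(
    image_emb: Sequence[float],
    rule_embs: Dict[str, Sequence[float]],
    top_k: int = 2,
    min_sim: float = 0.0,
) -> List[Tuple[str, float]]:
    """Keep at most top_k rules whose similarity to the image is >= min_sim.

    top_k and min_sim are hypothetical knobs, not parameters from the paper.
    """
    scored = [(rule, cosine(image_emb, emb)) for rule, emb in rule_embs.items()]
    kept = [pair for pair in scored if pair[1] >= min_sim]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # most relevant first
    return kept[:top_k]


# Toy 3-d vectors standing in for CLIP text/image embeddings (made-up values).
rules = {
    "inconsistent lighting direction": [0.9, 0.1, 0.0],
    "blurred splicing boundary": [0.7, 0.7, 0.0],
    "duplicated texture patches": [0.0, 0.2, 0.98],
}
image = [1.0, 0.3, 0.0]
selected = filter_rules(image, rules, top_k=2, min_sim=0.5)
for rule, score in selected:
    print(f"{score:.3f}  {rule}")
```

With these toy vectors the texture-duplication rule falls below the threshold, so only the two rules aligned with the image embedding would be passed on as priors.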
Abstract
Advances in image tampering techniques pose serious security threats, underscoring the need for effective image manipulation localization (IML). While supervised IML achieves strong performance, it depends on costly pixel-level annotations, and existing weakly supervised or training-free alternatives often underperform and lack interpretability. We propose the In-Context Forensic Chain (ICFC), a training-free framework that leverages multi-modal large language models (MLLMs) for interpretable IML. ICFC combines two components: objectified rule construction with adaptive filtering, which builds a reliable knowledge base, and a multi-step progressive reasoning pipeline that mirrors expert forensic workflows, moving from coarse proposals to fine-grained forensic results. This design systematically exploits MLLM reasoning for image-level classification, pixel-level localization, and text-level interpretability. Across multiple benchmarks, ICFC not only surpasses state-of-the-art training-free methods but also achieves performance competitive with or superior to weakly and fully supervised approaches.
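The coarse-to-fine progression can be sketched as plain control flow: an image-level judgment, a coarse box proposal, iterative box refinement, and box-prompted segmentation, with a textual note collected at each step. In this sketch the MLLM and SAM calls are replaced by hypothetical callables (`classify`, `propose_box`, `refine_box`, `segment`), and the stop-when-the-box-stops-changing rule and `max_steps` limit are assumptions, not the authors' prompts or stopping criterion.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel bounds (assumed convention)
Mask = List[List[int]]


@dataclass
class ForensicResult:
    is_manipulated: bool
    box: Optional[Box] = None
    mask: Optional[Mask] = None
    rationale: List[str] = field(default_factory=list)  # text-level explanation trail


def progressive_forensics(
    classify: Callable[[List[str]], Tuple[bool, str]],
    propose_box: Callable[[List[str]], Tuple[Box, str]],
    refine_box: Callable[[Box, List[str]], Tuple[Box, str]],
    segment: Callable[[Box], Mask],
    rules: List[str],
    max_steps: int = 3,
) -> ForensicResult:
    # Step 1: image-level judgment conditioned on the filtered rules.
    manipulated, note = classify(rules)
    result = ForensicResult(is_manipulated=manipulated, rationale=[note])
    if not manipulated:
        return result  # judged authentic: skip localization
    # Step 2: coarse region proposal.
    box, note = propose_box(rules)
    result.rationale.append(note)
    # Step 3: iterative refinement; stop early once the box no longer changes.
    for _ in range(max_steps):
        new_box, note = refine_box(box, rules)
        result.rationale.append(note)
        if new_box == box:
            break
        box = new_box
    # Step 4: box-prompted segmentation (SAM in the paper, a stub here).
    result.box = box
    result.mask = segment(box)
    return result


# Hypothetical stubs standing in for MLLM and SAM calls on an 8x8 image.
def toy_classify(rules: List[str]) -> Tuple[bool, str]:
    return True, "flagged: " + rules[0]


def toy_propose(rules: List[str]) -> Tuple[Box, str]:
    return (0, 0, 6, 6), "coarse box (0, 0, 6, 6)"


def toy_refine(box: Box, rules: List[str]) -> Tuple[Box, str]:
    x0, y0, x1, y1 = box
    new_box = (x0 + 1, y0 + 1, x1, y1) if x0 < 2 else box  # shrink until x0 reaches 2
    return new_box, f"refined to {new_box}"


def toy_segment(box: Box, size: int = 8) -> Mask:
    x0, y0, x1, y1 = box
    return [[1 if (x0 <= x < x1 and y0 <= y < y1) else 0 for x in range(size)] for y in range(size)]


res = progressive_forensics(toy_classify, toy_propose, toy_refine, toy_segment,
                            ["inconsistent lighting direction"])
print(res.box, sum(map(sum, res.mask)))
for line in res.rationale:
    print(line)
```

Keeping the model calls behind callables makes the sketch independent of any particular MLLM or segmentation API; the returned `rationale` list mirrors the idea that each reasoning step leaves a human-readable trace alongside the image-level, box-level, and pixel-level outputs.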
