Code-in-the-Loop Forensics: Agentic Tool Use for Image Forgery Detection

Fanrui Zhang, Qiang Zhang, Sizhuo Zhou, Jianwen Sun, Chuanhao Li, Jiaxin Ai, Yukang Feng, Yujie Zhang, Wenjie Li, Zizhen Li, Yifan Chang, Jiawei Liu, Kaipeng Zhang

TL;DR

ForenAgent is an interactive, tool-augmented approach to image forgery detection in which a multimodal LLM autonomously generates and executes a Python-based low-level toolchain. The framework combines a Cold Start phase with Reinforcement Fine-Tuning (using GRPO-based rewards) to cultivate dynamic reasoning over a global-to-local investigative loop comprising global perception, local focusing, iterative probing, and holistic adjudication. The authors introduce FABench, a heterogeneous dataset with 100k images and approximately 200k agent-interaction QA pairs, to train and evaluate the system, and report better performance than strong baselines on FABench and SIDA-Test, together with interpretable reasoning. The work highlights emergent tool-use capabilities and reflective reasoning as a route toward explainable, tool-augmented image forensics.
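
The summary names GRPO-based rewards but does not describe the reward design. As a rough illustration of the group-relative idea usually associated with GRPO, and not the authors' actual reward, the sketch below normalizes the rewards of several trajectories sampled for the same image against the group mean and standard deviation. The helper name, the epsilon term, and the example rewards are all assumptions.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (hypothetical helper)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every trajectory earns the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four trajectories sampled on the same image.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Trajectories that score above the group average receive a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline.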

Abstract

Existing image forgery detection (IFD) methods either exploit low-level, semantics-agnostic artifacts or rely on multimodal large language models (MLLMs) with high-level semantic knowledge. Although naturally complementary, these two information streams are highly heterogeneous in both paradigm and reasoning, making it difficult for existing methods to unify them or effectively model their cross-level interactions. To address this gap, we propose ForenAgent, a multi-round interactive IFD framework that enables MLLMs to autonomously generate, execute, and iteratively refine Python-based low-level tools around the detection objective, thereby achieving more flexible and interpretable forgery analysis. ForenAgent follows a two-stage training pipeline combining Cold Start and Reinforcement Fine-Tuning to enhance its tool interaction capability and reasoning adaptability progressively. Inspired by human reasoning, we design a dynamic reasoning loop comprising global perception, local focusing, iterative probing, and holistic adjudication, and instantiate it as both a data-sampling strategy and a task-aligned process reward. For systematic training and evaluation, we construct FABench, a heterogeneous, high-quality agent-forensics dataset comprising 100k images and approximately 200k agent-interaction question-answer pairs. Experiments show that ForenAgent exhibits emergent tool-use competence and reflective reasoning on challenging IFD tasks when assisted by low-level tools, charting a promising route toward general-purpose IFD. The code will be released after the review process is completed.
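
The four-stage loop described in the abstract can be pictured with a toy sketch. This is not ForenAgent: in the paper an MLLM writes and refines the tools, whereas here fixed heuristics stand in for the model, block variance stands in for a low-level forensic cue, and the function names, block size, threshold, and verdict labels are all assumptions made for illustration.

```python
from statistics import median, pvariance

def block_variance(image, top, left, size):
    """Low-level cue: pixel variance inside a square block (stand-in tool)."""
    pixels = [image[r][c]
              for r in range(top, top + size)
              for c in range(left, left + size)]
    return pvariance(pixels)

def investigate(image, block=4, threshold=50.0):
    """Toy global-to-local loop; fixed heuristics replace the MLLM's decisions."""
    height, width = len(image), len(image[0])

    # 1. Global perception: score every block on a coarse grid.
    scores = {(t, l): block_variance(image, t, l, block)
              for t in range(0, height - block + 1, block)
              for l in range(0, width - block + 1, block)}
    typical = median(scores.values())

    def deviation(value):
        return abs(value - typical)

    # 2. Local focusing: pick the block that deviates most from the typical score.
    (top, left), score = max(scores.items(), key=lambda kv: deviation(kv[1]))

    # 3. Iterative probing: re-run the tool on quadrants to narrow the region.
    size = block
    while size >= 4:
        half = size // 2
        quads = [(top + dt, left + dl) for dt in (0, half) for dl in (0, half)]
        top, left = max(
            quads,
            key=lambda q: deviation(block_variance(image, q[0], q[1], half)),
        )
        size = half

    # 4. Holistic adjudication: turn the collected evidence into a verdict.
    verdict = "tampered" if deviation(score) > threshold else "authentic"
    return {"verdict": verdict, "region": (top, left, size)}

# Hypothetical 8x8 grayscale image: flat background with a noisy lower-right patch.
image = [[100] * 8 for _ in range(8)]
for r in range(4, 8):
    for c in range(4, 8):
        image[r][c] = 60 if (r + c) % 2 == 0 else 140

result = investigate(image)
flat_result = investigate([[100] * 8 for _ in range(8)])
```

The sketch keeps each stage as a separate step so that the evidence chain stays inspectable, which mirrors the interpretability goal stated in the abstract.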

Paper Structure

This paper contains 18 sections, 7 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: ForenAgent autonomously composes a global-to-local Python toolchain, accurately delivers a tampered verdict with precise localization of the forged region, and further demonstrates reflective self-correction by carefully revising an initially mislocalized crop to the appropriate region of interest.
  • Figure 2: Overall architecture of ForenAgent. The upper part shows the FABench construction process, and the lower part shows the ForenAgent training pipeline.
  • Figure 3: Examples of tampered and synthetic images from diverse FABench generators.
  • Figure 4: The complete evidence chain by which ForenAgent correctly identifies a synthetic image.
  • Figure 5: Distribution of tool usage frequencies.
  • ...and 1 more figure