Copy-Move Forgery Detection and Question Answering for Remote Sensing Image

Ze Zhang, Enyuan Zhao, Di Niu, Jie Nie, Xinyue Liang, Lei Huang

TL;DR

This work defines the Remote Sensing Copy-Move Question Answering (RSCMQA) task, which jointly detects copy-move forgeries and reasons about tampered remote sensing (RS) images through question answering. It introduces five large, region-rich datasets (RS-CMQA, RS-CMQA-B, Real-RSCM, RS-TQA, RS-TQA-B) and presents the Copy-Move Forgery Perception Framework (CMFPF), which uses region-discrimination-guided prompts to inject tampering cues into both the visual and textual modalities. The approach yields state-of-the-art results across multiple RS-CMQA datasets, is robust to blurred tampering, and transfers to related tasks, outperforming general VQA and RSVQA baselines. By providing rich datasets and a targeted multimodal framework, the work advances practical tampering perception for land-resource monitoring and national defense applications.

Abstract

Driven by practical demands in land resource monitoring and national defense security, this paper introduces the Remote Sensing Copy-Move Question Answering (RSCMQA) task. Unlike traditional Remote Sensing Visual Question Answering (RSVQA), RSCMQA focuses on interpreting complex tampering scenarios and inferring relationships between objects. We present a suite of global RSCMQA datasets, comprising images from 29 different regions across 14 countries. Specifically, we propose five distinct datasets: the basic dataset RS-CMQA, the category-balanced dataset RS-CMQA-B, the high-authenticity dataset Real-RSCM, the extended dataset RS-TQA, and the extended category-balanced dataset RS-TQA-B. These datasets fill a critical gap in the field while remaining comprehensive, balanced, and challenging. Furthermore, we introduce a region-discrimination-guided multimodal copy-move forgery perception framework (CMFPF), which improves the accuracy of answering questions about tampered images by leveraging prompts about the differences and connections between the source and tampered domains. Extensive experiments demonstrate that our method provides a stronger benchmark for RSCMQA than general VQA and RSVQA models. Our datasets and code are publicly available at https://github.com/shenyedepisa/RSCMQA.

Paper Structure

This paper contains 21 sections, 13 equations, 11 figures, 7 tables, 1 algorithm.

Figures (11)

  • Figure 1: Example of using a question-answering method to obtain key information about remote sensing image tampering.
  • Figure 2: Distribution of raw images in the RS-CMQA dataset.
  • Figure 3: Examples of tampered images, original images, segmentation masks, source region masks, and tampering region masks in the dataset.
  • Figure 4: (a) Distribution of basic, independent, and relational questions across the five datasets. (b) Detailed distribution of questions and answers in the five datasets. (c) Examples of question and answer types in the datasets.
  • Figure 5: An illustration of the proposed framework CMFPF and the STMA module providing a tampering prompt for the textual modality.
  • ...and 6 more figures