M^3: Manipulation Mask Manufacturer for Arbitrary-Scale Super-Resolution Mask

Xinyu Yang, Xiaochen Ma, Xuekang Zhu, Bo Du, Lei Su, Bingkui Tong, Zeyu Lei, Jizhe Zhou

TL;DR

The paper tackles the limited realism and scale of image manipulation localization (IML) data by introducing the Manipulation Mask Manufacturer (MMM) framework, which treats the comparison of an original and a tampered image as a change-detection problem enhanced by arbitrary-scale super-resolution. It combines feature-embedding concatenation with distribution alignment via Maximum Mean Discrepancy (MMD), and employs Cross-scale Local Attention Blocks (CSLAB) and Local Frequency Encoding Blocks (LFEB) to produce high-quality masks. To address data scarcity, the authors assemble the Manipulation Mask Manufacturer Dataset (MMMD), which contains 11,069 triplets (original, tampered, mask) spanning diverse manipulation techniques and supports better generalization of IML models. Experiments show that pre-training on MMMD improves performance across multiple datasets and models, outperforms CASIA-based pre-training, and yields more realistic tampering representations for robust forensic applications. Overall, MMM and MMMD offer a practical path toward more realistic, scalable, and transferable manipulation-detection models for real-world scenarios, with code and data to be released.
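
The TL;DR mentions aligning feature distributions with Maximum Mean Discrepancy. As a minimal sketch, assuming a Gaussian (RBF) kernel with a hand-picked bandwidth `sigma`, the biased estimate of squared MMD between two sample sets can be computed as below. Which embeddings MMM aligns, and with which kernel and bandwidth, is not stated in this summary; `gaussian_kernel` and `mmd2_biased` are hypothetical names used only for illustration.

```python
import numpy as np


def gaussian_kernel(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """RBF kernel matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 * sigma^2))."""
    # Pairwise squared Euclidean distances via broadcasting: shape (n, m).
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma**2))


def mmd2_biased(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between x (n, d) and y (m, d)."""
    k_xx = gaussian_kernel(x, x, sigma).mean()
    k_yy = gaussian_kernel(y, y, sigma).mean()
    k_xy = gaussian_kernel(x, y, sigma).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(200, 4))
b = rng.normal(0.0, 1.0, size=(200, 4))  # same distribution as a
c = rng.normal(1.0, 1.0, size=(200, 4))  # every coordinate shifted by +1
print(f"MMD^2(a, b) = {mmd2_biased(a, b):.4f}, MMD^2(a, c) = {mmd2_biased(a, c):.4f}")
```

The mean-shifted set `c` gives a clearly larger value than `b`, which is drawn from the same distribution as `a`. If used as an alignment loss, the same quantity would be computed on mini-batches of embeddings and minimized.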

Abstract

In the field of image manipulation localization (IML), the small quantity and poor quality of existing datasets have long been major issues. A dataset containing many types of manipulation would greatly help improve the accuracy of IML models. Images on the internet (such as those on Baidu Tieba's PS Bar) are manipulated using a wide variety of techniques, so building a dataset from them would significantly enrich the types of manipulation in the data. However, such images suffer from resolution and clarity issues, and the masks obtained by simply subtracting the manipulated image from the original contain various kinds of noise. This noise is difficult to remove and renders the masks unusable for IML models. Inspired by the field of change detection, we treat the original and manipulated images as the same image changing over time and view data generation as a change detection task. However, because of clarity differences between the two images, conventional change detection models perform poorly. We therefore introduce a super-resolution module and propose the Manipulation Mask Manufacturer (MMM) framework. It enhances the resolution of both the original and tampered images, improving image details for better comparison. At the same time, the framework converts the original and tampered images into feature embeddings and concatenates them, effectively modeling the context. Additionally, we created the Manipulation Mask Manufacturer Dataset (MMMD), which covers a wide range of manipulation techniques. We aim to contribute to image forensics and manipulation detection by providing more realistic manipulation data through MMM and MMMD. Detailed information about MMMD and the download link will be provided once the code and datasets are made available.
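
The baseline that the abstract calls unusable, and that Figure 2 visualizes, is direct subtraction followed by binarization at a threshold of 30. The NumPy sketch below reproduces that baseline on synthetic data to show why it is noisy. Reducing the RGB difference with a per-pixel maximum, and modeling resolution and compression artifacts as additive Gaussian noise, are assumptions made for illustration; `naive_difference_mask` is a hypothetical helper, not part of MMM.

```python
import numpy as np


def naive_difference_mask(original: np.ndarray, tampered: np.ndarray, threshold: int = 30) -> np.ndarray:
    """Flag a pixel as tampered if any channel of |tampered - original| exceeds threshold."""
    if original.shape != tampered.shape:
        raise ValueError("images must have identical shapes")
    # Cast to a signed type first so uint8 subtraction does not wrap around.
    diff = np.abs(tampered.astype(np.int16) - original.astype(np.int16))
    # Channel reduction by maximum over RGB (an assumption for this sketch).
    return diff.max(axis=-1) > threshold


rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

# Simulated edit: brighten a 16x16 patch by 80 intensity levels.
tampered = original.astype(np.int16)
tampered[20:36, 20:36] += 80
# Simulated resolution / re-compression noise over the whole image (assumption).
tampered += rng.normal(0, 12, size=tampered.shape).round().astype(np.int16)
tampered = np.clip(tampered, 0, 255).astype(np.uint8)

mask = naive_difference_mask(original, tampered)
inside = mask[20:36, 20:36].mean()
outside = mask.sum() - mask[20:36, 20:36].sum()
print(f"patch coverage: {inside:.2f}, spurious pixels outside patch: {outside}")
```

Nearly all pixels inside the edited patch are flagged, but the simulated noise also flips scattered pixels outside it. Isolated false positives of this kind are what the abstract describes as hard to remove from subtraction-based masks.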

Paper Structure

This paper contains 13 sections, 6 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Tampered images and their corresponding masks for the image manipulation localization task.
  • Figure 2: Results generated by the MMM framework. From top to bottom: original image, tampered image, the result of directly subtracting the two images and binarizing with a threshold of 30, and the MMM prediction.
  • Figure 3: The proposed MMM framework. The local sampling operation samples input embeddings based on a grid of coordinates.
  • Figure 4: The framework of CSLAB and LFEB.
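
Figure 3's caption states that the local sampling operation samples input embeddings based on a grid of coordinates. The paper's exact operator is not described here; the sketch below assumes a LIIF-style scheme common in arbitrary-scale super-resolution, in which each output pixel centre is mapped to continuous coordinates in [-1, 1], the nearest low-resolution embedding is looked up, and the query's offset from that embedding's cell centre is returned alongside it. `make_coord_grid` and `local_sample` are hypothetical names.

```python
import numpy as np


def make_coord_grid(h: int, w: int) -> np.ndarray:
    """Pixel-centre coordinates in [-1, 1], shape (h, w, 2), ordered (y, x)."""
    ys = -1 + (2 * np.arange(h) + 1) / h
    xs = -1 + (2 * np.arange(w) + 1) / w
    return np.stack(np.meshgrid(ys, xs, indexing="ij"), axis=-1)


def local_sample(feat: np.ndarray, out_h: int, out_w: int):
    """Nearest-neighbour sampling of a (C, H, W) embedding on an (out_h, out_w) query grid.

    Returns sampled features of shape (out_h, out_w, C) and each query's offset
    from the centre of the latent cell it falls in, in latent-cell units.
    """
    _, h, w = feat.shape
    coords = make_coord_grid(out_h, out_w)
    # Map [-1, 1] coordinates back to continuous latent pixel indices.
    iy = (coords[..., 0] + 1) * h / 2 - 0.5
    ix = (coords[..., 1] + 1) * w / 2 - 0.5
    ny = np.clip(np.round(iy), 0, h - 1).astype(int)
    nx = np.clip(np.round(ix), 0, w - 1).astype(int)
    # Advanced indexing gives (C, out_h, out_w); move channels last.
    sampled = feat[:, ny, nx].transpose(1, 2, 0)
    rel = np.stack([iy - ny, ix - nx], axis=-1)
    return sampled, rel


feat = np.arange(18, dtype=float).reshape(2, 3, 3)  # toy 2-channel 3x3 embedding
sampled, rel = local_sample(feat, 6, 6)             # 2x upsampling query grid
```

Because the query grid is built from `out_h` and `out_w` rather than an integer factor, the same function also handles non-integer scales such as 3 to 7 rows. A decoder would then consume the sampled embeddings together with the relative offsets; that step is omitted here.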