Disharmony: Forensics using Reverse Lighting Harmonization

Philip Wootaek Shin, Jack Sampson, Vijaykrishnan Narayanan, Andres Marquez, Mahantesh Halappanavar

TL;DR

The paper addresses the challenge of detecting AI-generated or edited images, focusing on edits where objects are inserted and harmonized into scenes. It introduces Disharmony, a harmonization-detection network built by aggregating three lighting adjustment paradigms and training a segmentation model (MaskFormer) on a composite dataset (DISK25k, RealHM, IH) to detect edited regions. Results show Disharmony outperforms established forensic networks (MantraNet, CAT-Net, HiFi-Net, IML-ViT) across multiple harmonization methods, with higher ROC AUC and mIoU. A sensitivity study and discussions outline limitations and future directions, including diffusion-based edits and virtual try-on, highlighting a scalable path for robust image authenticity verification in the era of generative AI.
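The two metrics named above, ROC AUC and mIoU, can be illustrated on a toy example. The following is a minimal sketch, not the paper's evaluation code: it assumes masks are flat lists of 0/1 labels (1 = edited pixel), that mIoU averages the IoU of the edited and background classes, and that predictions are thresholded at 0.5. All function names, the threshold, and the toy data are illustrative assumptions.

```python
def iou(pred_mask, true_mask):
    """Intersection over union of two flat binary masks (sequences of 0/1)."""
    inter = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    union = sum(1 for p, t in zip(pred_mask, true_mask) if p or t)
    # Two empty masks agree perfectly, so treat that case as IoU = 1.
    return inter / union if union else 1.0


def mean_iou(pred_mask, true_mask):
    """Average the IoU of the edited class (1) and the background class (0).

    Assumption: a two-class average; the paper may define mIoU differently.
    """
    fg = iou(pred_mask, true_mask)
    bg = iou([1 - p for p in pred_mask], [1 - t for t in true_mask])
    return (fg + bg) / 2


def roc_auc(scores, labels):
    """ROC AUC as the probability that a random edited pixel scores higher
    than a random background pixel (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one pixel of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: six pixels, three of them edited (hypothetical values).
true_mask = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.6, 0.8, 0.4, 0.9, 0.2]          # per-pixel "edited" scores
pred_mask = [int(s >= 0.5) for s in scores]      # assumed threshold of 0.5

print(mean_iou(pred_mask, true_mask))  # 0.5
print(roc_auc(scores, true_mask))      # 8/9, about 0.889
```

ROC AUC is computed from the raw scores and is threshold-free, whereas mIoU depends on the chosen threshold, which is one reason to report the two side by side.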

Abstract

Content generation and manipulation approaches based on deep learning methods have seen significant advancements, leading to an increased need for techniques to detect whether an image has been generated or edited. Another area of research focuses on the insertion and harmonization of objects within images. In this study, we explore the potential of using harmonization data in conjunction with a segmentation model to enhance the detection of edited image regions. These edits can be either manually crafted or generated using deep learning methods. Our findings demonstrate that this approach can effectively identify such edits. Existing forensic models often overlook the detection of harmonized objects in relation to the background, but our proposed Disharmony Network addresses this gap. By utilizing an aggregated dataset of harmonization techniques, our model outperforms existing forensic networks in identifying harmonized objects integrated into their backgrounds, and shows potential for detecting various forms of edits, including virtual try-on tasks.

Paper Structure

This paper contains 18 sections, 10 figures, and 3 tables.

Figures (10)

  • Figure 1: Task overview of Disharmony.
  • Figure 2: Overall training pipeline of Disharmony. Training proceeds in two stages: MaskFormer is first pretrained on the DISK25k dataset (Step A), then fine-tuned on the IH dataset (Step B) and the RealHM dataset (Step C), yielding Disharmony.
  • Figure 3: Samples from the test set. We randomly selected 100 images from the iHarmony4 dataset, each consisting of a ground-truth image, a composite image (one chosen at random from the available composites), and a corresponding mask. The composite images and masks were then processed with various harmonization methods (DoveNet, Harmonizer, HT, Hi-Net, and PCT-Net) to generate the test images.
  • Figure 4: Qualitative results of Disharmony across different harmonization methods (DoveNet, Harmonizer, HT, Hi-Net, and PCT-Net) and composite images. For clarity, the background is shown in green so that the segmentation masks are easier to see; the segmented region retains the original colors of the image.
  • Figure 5: Qualitative comparison of various forensic methods (MantraNet, CAT-Net, HiFi-Net, IML-ViT) and Disharmony across different harmonization methods (DoveNet, Harmonizer, HT, Hi-Net, and PCT-Net), using the given image and mask.
  • ...and 5 more figures
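
The visualization convention described in the Figure 4 caption (green background, original colors inside the predicted mask) can be sketched as follows. This is an illustrative sketch only: it assumes an image stored as rows of RGB tuples and a binary mask of the same shape, and the function name and pure-green color value are hypothetical choices rather than details taken from the paper.

```python
GREEN = (0, 255, 0)  # assumed background color; the caption only says "green"


def show_mask_on_green(image, mask, background=GREEN):
    """Keep the original pixel where mask is 1; paint every other pixel green.

    image: list of rows, each row a list of (R, G, B) tuples.
    mask:  list of rows of 0/1 values with the same height and width.
    """
    return [
        [pixel if keep else background for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Toy 2x2 image with a mask selecting the top-left and bottom-right pixels.
image = [
    [(10, 20, 30), (40, 50, 60)],
    [(70, 80, 90), (100, 110, 120)],
]
mask = [
    [1, 0],
    [0, 1],
]

for row in show_mask_on_green(image, mask):
    print(row)
```

The input image is left unmodified, so the same image can be rendered against several predicted masks for a side-by-side comparison.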