DLOVE: A new Security Evaluation Tool for Deep Learning Based Watermarking Techniques

Sudev Kumar Padhi, Sk. Subidh Ali

TL;DR

The paper introduces DLOVE, a targeted adversarial overwriting attack on DNN-based watermarking that replaces the embedded watermark with one chosen by the attacker. It formalizes a threat model with white-box and black-box settings and develops a surrogate-model-based attack that optimizes a perturbation $\delta$ within a budget $\epsilon$ to steer the decoder $D$ toward the target watermark $\beta$ while suppressing the original watermark $\alpha$. The objective is $l(D(W+\delta),\beta) - l(D(W+\delta),\alpha)$, where $W$ is the watermarked image and $l$ is a loss on the decoded watermark. Empirically, DLOVE is demonstrated on HiDDeN, ReDMark, PIMoG, and Hiding Images in an Image, achieving high attack success rates and exposing security vulnerabilities in modern watermarking techniques under both direct and transferable attacks. The work offers a practical benchmark for evaluating and strengthening future deep learning–based watermarking schemes against overwriting threats.
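
The objective above can be illustrated with a small projected-gradient loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the decoder is a hypothetical two-layer NumPy network with random weights standing in for $D$, the loss $l$ is assumed to be binary cross-entropy over watermark bits, the budget is assumed to be an $L_\infty$ bound, and all names (`decoder_logits`, `overwrite_attack`, `EPS`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PIXELS, N_HIDDEN, N_BITS = 64, 32, 8
EPS = 0.03  # assumed L-infinity perturbation budget (epsilon)

# Hypothetical stand-in for the decoder D: a fixed random two-layer network
# mapping a flattened image in [0, 1] to one logit per watermark bit.
W1 = rng.normal(0.0, 0.5, (N_HIDDEN, N_PIXELS))
W2 = rng.normal(0.0, 0.5, (N_BITS, N_HIDDEN))

def decoder_logits(x):
    h = np.tanh(W1 @ (x - 0.5))
    return W2 @ h, h

def bce_from_logits(z, bits):
    # Numerically stable mean binary cross-entropy between sigmoid(z) and bits.
    return np.mean(np.maximum(z, 0) - z * bits + np.log1p(np.exp(-np.abs(z))))

def objective(x, alpha, beta):
    # l(D(x), beta) - l(D(x), alpha): pull toward beta, push away from alpha.
    z, _ = decoder_logits(x)
    return bce_from_logits(z, beta) - bce_from_logits(z, alpha)

def objective_grad(x, alpha, beta):
    # Manual backpropagation of the objective to the input pixels.
    z, h = decoder_logits(x)
    dz = (alpha - beta) / z.size  # (p - beta)/k - (p - alpha)/k
    da = (W2.T @ dz) * (1.0 - h ** 2)
    return W1.T @ da

def overwrite_attack(w, alpha, beta, eps=EPS, step=0.005, iters=100):
    # Signed-gradient descent on delta, projected onto the eps-ball and the
    # valid pixel range; the best perturbation seen so far is returned.
    delta = np.zeros_like(w)
    best_delta, best_obj = delta.copy(), objective(w, alpha, beta)
    for _ in range(iters):
        g = objective_grad(w + delta, alpha, beta)
        delta = delta - step * np.sign(g)
        delta = np.clip(w + delta, 0.0, 1.0) - w  # keep pixels in [0, 1]
        delta = np.clip(delta, -eps, eps)         # enforce the budget
        obj = objective(w + delta, alpha, beta)
        if obj < best_obj:
            best_delta, best_obj = delta.copy(), obj
    return best_delta

# Toy "watermarked image" W; alpha is whatever the stand-in decoder reads from it.
w = rng.uniform(0.2, 0.8, N_PIXELS)
alpha = (decoder_logits(w)[0] > 0).astype(float)
beta = 1.0 - alpha  # target watermark: every bit flipped
delta = overwrite_attack(w, alpha, beta)

decoded = (decoder_logits(w + delta)[0] > 0).astype(float)
print("objective before:", objective(w, alpha, beta))
print("objective after: ", objective(w + delta, alpha, beta))
print("bits matching target:", int(np.sum(decoded == beta)), "of", N_BITS)
```

In a black-box setting, the same loop would be run against a surrogate decoder and the resulting perturbation applied to the target decoder; how well it transfers depends on how closely the surrogate matches the target, which this toy example does not model.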

Abstract

Recent developments in Deep Neural Network (DNN) based watermarking techniques have shown remarkable performance. The state-of-the-art DNN-based techniques not only surpass the robustness of classical watermarking techniques but are also robust against many image manipulation techniques. In this paper, we perform a detailed security analysis of different DNN-based watermarking techniques. We propose a new class of attack called the Deep Learning-based OVErwriting (DLOVE) attack, which leverages adversarial machine learning to overwrite the original embedded watermark with a targeted watermark in a watermarked image. To the best of our knowledge, this attack is the first of its kind. We consider scenarios where watermarks are used, and we devise and formulate an adversarial attack in white-box and black-box settings. To show adaptability and efficiency, we launch our DLOVE attack analysis on seven different watermarking techniques: HiDDeN, ReDMark, PIMoG, StegaStamp, Aparecium, Distortion Agnostic Deep Watermarking, and Hiding Images in an Image. All these techniques use different approaches to create imperceptible watermarked images. Our attack analysis on these watermarking techniques under various constraints highlights the vulnerabilities of DNN-based watermarking. Extensive experimental results validate the capabilities of DLOVE. We propose DLOVE as a benchmark security analysis tool to test the robustness of future deep learning-based watermarking techniques.

Paper Structure

This paper contains 22 sections, 5 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: Overview of the proposed DLOVE attack, which leverages adversarial machine learning to create a well-crafted perturbation that overwrites the original watermark with the target watermark.
  • Figure 2: Overview of the surrogate model attack: a) training the surrogate model on a surrogate dataset; b) fine-tuning the surrogate decoder with watermarked images from the target DNN-based watermarking technique; c) attacking the decoder of the target DNN-based watermarking technique with the well-crafted perturbation generated from the surrogate decoder.
  • Figure 3: The well-crafted, imperceptible perturbation is added to the original watermarked image without degrading its image quality.
  • Figure 4: Result of attacking a watermarked image created by the Hiding Images in an Image technique using the DLOVE attack.
  • Figure 5: Artifacts appear when attacking some watermarked images with DLOVE when the cosine similarity is less than $0.1$.