DeepEclipse: How to Break White-Box DNN-Watermarking Schemes

Alessandro Pegoraro; Carlotta Segna; Kavita Kumari; Ahmad-Reza Sadeghi

TL;DR

This work proposes DeepEclipse, a novel and unified framework designed to remove white-box watermarks, and showcases a promising solution to address the ongoing DNN watermark protection and removal challenges.

Abstract

Deep Learning (DL) models have become crucial to digital transformation, raising concerns about their intellectual property (IP) rights. Different watermarking techniques have been developed to protect Deep Neural Networks (DNNs) from IP infringement, creating a competitive field of DNN watermarking and watermark-removal methods. The predominant watermarking schemes use white-box techniques, which modify the weights of specific DNN layers to embed a unique signature. Existing attacks on white-box watermarking, however, usually require knowledge of the specific deployed watermarking scheme or access to the underlying data for further training and fine-tuning. We propose DeepEclipse, a novel and unified framework designed to remove white-box watermarks. We present obfuscation techniques that differ significantly from existing white-box watermark-removal schemes. DeepEclipse can evade watermark detection without prior knowledge of the underlying watermarking scheme, additional data, or training and fine-tuning. Our evaluation reveals that DeepEclipse excels at breaking multiple white-box watermarking schemes, reducing watermark detection to random guessing while maintaining accuracy similar to that of the original model. Our framework showcases a promising solution to the ongoing DNN watermark protection and removal challenges.

Paper Structure

This paper contains 17 sections, 2 theorems, 15 equations, 7 figures, 4 tables, and 5 algorithms.

Introduction
Background
Threat Model
Design
High-Level Idea
Detailed Design
Evaluation
Experimental Setup
Evaluation Results
Frequency Detection
DeepEclipse Attack on Linear Layers
DeepEclipse Attack on Convolutional Layers
Runtime Evaluation
Related Works
Security Considerations
(2 further sections not listed)

Key Result

Theorem 1

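If $H \in \mathbb{R}^{n \times h}$ with $h > n$ and $\operatorname{rank}(H) = n$, then $H \times H^{-1} = I_{n \times n}$ and $A_{i}$ is dimensionally compatible with $H$. Because $H$ is not square, $H^{-1} \in \mathbb{R}^{h \times n}$ here denotes a right inverse of $H$, for example the Moore-Penrose pseudo-inverse $H^{+} = H^{\top}(H H^{\top})^{-1}$, which exists because $H$ has full row rank.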
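Theorem 1 can be checked numerically. The sketch below is our own illustration, not code from the paper: it draws a random Gaussian $H$ with $h > n$ (full row rank with probability 1), uses NumPy's `np.linalg.pinv` as one possible right inverse, and confirms that $H H^{-1} = I_n$ while $H^{-1} H \neq I_h$. The sizes `m`, `n`, `h` and the stand-in weight `A_i` are hypothetical, and the row-vector convention $y = x A_i$ is an assumption chosen so that $A_i \in \mathbb{R}^{m \times n}$ composes with $H$.

```python
import numpy as np

# Hypothetical sizes: the watermarked weight A_i maps m inputs to n outputs
# (row-vector convention y = x @ A_i), and H widens n to h > n.
m, n, h = 6, 4, 10
rng = np.random.default_rng(0)

A_i = rng.standard_normal((m, n))  # stand-in for a watermarked weight matrix
H = rng.standard_normal((n, h))    # random Gaussian matrix, full row rank almost surely
assert np.linalg.matrix_rank(H) == n

# With rank(H) = n a right inverse exists; the Moore-Penrose pseudo-inverse
# H^+ = H^T (H H^T)^{-1} is one choice, with shape (h, n).
H_inv = np.linalg.pinv(H)

right = H @ H_inv   # (n, n): the identity
left = H_inv @ H    # (h, h): only a projection of rank n, not the identity

print(np.allclose(right, np.eye(n)))      # True
print(np.allclose(left, np.eye(h)))       # False: no left inverse when h > n
print(np.allclose(A_i @ H @ H_inv, A_i))  # True: A_i (m x n) composes with H (n x h)
```

Since only a right inverse exists, the product must apply $H$ before $H^{-1}$; reversing the order does not recover the original weights.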

Figures (7)

Figure 1: DeepEclipse overview. Top: the expected behavior of the model owner, i.e., watermark insertion. Bottom: the analyzed detection and proposed obfuscation pipeline, followed by the verification process. We assume an adversary who has acquired an unauthorized copy of the watermarked model (Stolen Model) and tries to hinder its verification, performed by a passive or active third-party verifier, by applying the obfuscation techniques (base or advanced).
Figure 2: Basic Linear Layers obfuscation. The original layer is split into two layers: the first is the product of the original layer's weights and a random matrix, and the second is the inverse of that random matrix.
Figure 3: Advanced Linear Layers obfuscation. The watermarked layer is multiplied by a random matrix, and the subsequent layer is multiplied by the inverse of that matrix. The bias of the subsequent layer is updated using the original watermarked bias.
Figure 4: Basic Convolutional Layers obfuscation. Each feature map of the kernel is expanded with zero padding.
Figure 5: Advanced Convolutional Layers obfuscation. Each feature map of the kernel is expanded by padding with an $\epsilon$ value; the whole layer is then multiplied by a random constant $\lambda$, and the subsequent layer is multiplied by $\frac{1}{\lambda}$.
(Figures 6 and 7 not reproduced here)
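The two linear-layer obfuscations in Figures 2 and 3 can be sketched in NumPy as below. This is a minimal illustration under our own assumptions, not the authors' implementation: weights use the row-vector convention `x @ W + b`, `np.linalg.pinv` supplies the right inverse of a random widening matrix `H`, and all layer sizes are hypothetical. For the advanced variant we assume the watermarked layer and its successor are composed without an activation in between; a general random $H$ does not commute with a non-linearity such as ReLU, and the caption does not state how the paper handles that case.

```python
import numpy as np

rng = np.random.default_rng(1)

def linear(x, W, b):
    """Dense layer in row-vector convention: x (batch, in) @ W (in, out) + b."""
    return x @ W + b

# Hypothetical watermarked layer i (m -> n) followed by layer i+1 (n -> p).
m, n, p, h = 8, 5, 3, 12
W_i, b_i = rng.standard_normal((m, n)), rng.standard_normal(n)
W_next, b_next = rng.standard_normal((n, p)), rng.standard_normal(p)
x = rng.standard_normal((4, m))

# Random widening matrix H (n x h, rank n) and its right inverse (h x n).
H = rng.standard_normal((n, h))
H_inv = np.linalg.pinv(H)

# Basic obfuscation (Figure 2): replace layer i by two layers,
# W_i @ H (m x h, no bias) followed by H_inv (h x n) carrying the original bias.
basic_out = linear(linear(x, W_i @ H, np.zeros(h)), H_inv, b_i)
print(np.allclose(basic_out, linear(x, W_i, b_i)))  # True

# Advanced obfuscation (Figure 3): fold H into layer i and H_inv into layer i+1,
# moving the watermarked bias b_i into the subsequent bias.
# ASSUMPTION: no activation between the two layers in this sketch.
W_i_obf = W_i @ H                   # (m, h): W_i no longer appears verbatim
W_next_obf = H_inv @ W_next         # (h, p)
b_next_obf = b_i @ W_next + b_next  # absorbs the original bias of layer i
advanced_out = linear(linear(x, W_i_obf, np.zeros(h)), W_next_obf, b_next_obf)
reference = linear(linear(x, W_i, b_i), W_next, b_next)
print(np.allclose(advanced_out, reference))  # True
```

In both variants the stored weights differ in values and shape from the watermarked `W_i`, yet the network computes the same function up to floating-point error.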
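The convolutional variants in Figures 4 and 5 can be sketched in the same way. This NumPy sketch is our own illustration, not code from the paper: `conv2d` is a naive stride-1 cross-correlation, the channel counts and kernel sizes are hypothetical, and two details are assumptions not stated in the captions. First, the input padding of the obfuscated layer grows by one so that the padded $(k+2) \times (k+2)$ kernel yields the same output shape and, for zero padding, the same values. Second, $\lambda$ is positive, so that $\mathrm{ReLU}(\lambda z) = \lambda\,\mathrm{ReLU}(z)$ and the $\frac{1}{\lambda}$ factor in the next layer cancels the scaling exactly; the layer's bias is scaled along with its kernel. The $\epsilon$ border then leaves only a small residual error.

```python
import numpy as np

rng = np.random.default_rng(2)

def conv2d(x, w, b, pad):
    """Stride-1 cross-correlation. x: (C_in, H, W); w: (C_out, C_in, k, k); b: (C_out,)."""
    x = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero-pad the input spatially
    c_out, _, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.empty((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]  # (C_in, k, k)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out + b[:, None, None]

def relu(z):
    return np.maximum(z, 0.0)

def pad_kernel(w, value):
    """Grow every k x k kernel slice to (k+2) x (k+2) with a one-element border of `value`."""
    return np.pad(w, ((0, 0), (0, 0), (1, 1), (1, 1)), constant_values=value)

# Hypothetical watermarked conv layer (3 -> 4 channels) followed by a second conv (4 -> 2).
x = rng.standard_normal((3, 9, 9))
w1, b1 = rng.standard_normal((4, 3, 3, 3)), rng.standard_normal(4)
w2, b2 = rng.standard_normal((2, 4, 3, 3)), rng.standard_normal(2)

def model(w1, b1, pad1, w2, b2):
    """conv -> ReLU -> conv; only the first layer's padding is configurable."""
    return conv2d(relu(conv2d(x, w1, b1, pad1)), w2, b2, pad=1)

reference = model(w1, b1, 1, w2, b2)

# Basic (Figure 4): zero-padded kernels; ASSUMPTION: input padding grows from 1 to 2.
basic = model(pad_kernel(w1, 0.0), b1, 2, w2, b2)
print(np.allclose(basic, reference))  # True

# Advanced (Figure 5): eps-padded kernels, layer scaled by lam, next layer by 1/lam.
# ASSUMPTION: lam > 0 so that relu(lam * z) == lam * relu(z); the bias is scaled too.
eps, lam = 1e-6, 7.3
advanced = model(lam * pad_kernel(w1, eps), lam * b1, 2, w2 / lam, b2)
print(np.abs(advanced - reference).max())  # residual introduced only by the eps border
```

Setting `eps = 0` in the advanced variant recovers the reference output up to floating-point error, which isolates the $\epsilon$ border as the only source of deviation.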

Theorems & Definitions (4)

Theorem 1
proof
Theorem 2
proof
