Probabilistically Robust Watermarking of Neural Networks

Mikhail Pautov, Nikita Bogdanov, Stanislav Pyatkin, Oleg Rogov, Ivan Oseledets

TL;DR

This research introduces a novel trigger set-based watermarking approach that demonstrates resilience against functionality stealing attacks, particularly those involving extraction and distillation.

Abstract

As deep learning (DL) models are widely and effectively used in Machine Learning as a Service (MLaaS) platforms, there is rapidly growing interest in DL watermarking techniques that can be used to confirm the ownership of a particular model. Unfortunately, these methods usually produce watermarks that are susceptible to model stealing attacks. In our research, we introduce a novel trigger set-based watermarking approach that demonstrates resilience against functionality stealing attacks, particularly those involving extraction and distillation. Our approach does not require additional model training and can be applied to any model architecture. The key idea of our method is to compute a trigger set that is transferable between the source model and the set of proxy models with high probability. In our experimental study, we show that if the probability of the set being transferable is reasonably high, it can be effectively used for ownership verification of the stolen model. We evaluate our method on multiple benchmarks and show that our approach outperforms current state-of-the-art watermarking techniques in all considered experimental setups.

Paper Structure

This paper contains 24 sections, 1 theorem, 11 equations, 1 figure, 7 tables, 2 algorithms.

Key Result

Lemma 1

Given the sampling procedure for proxy models from Section \ref{eq:sample_proxy} and the confidence level $\alpha$ from Eq. \ref{eq:cp_int}, with probability at least $\phi = (1-\alpha)^n$, the expectation of accuracy of the proxy model $f_i \sim \mathcal{B}_{\delta, \tau}(f)$ on the verified trigger set $\mathcal{D}^*_t$ …
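To get a feel for the probability $\phi = (1-\alpha)^n$ in the lemma, the short sketch below evaluates it for a few illustrative values. The values of $\alpha$ and $n$ are assumptions chosen only for this example (they are not taken from the paper), and $n$ is read here as the size of the verified trigger set, as in Figure 1.

```python
def joint_confidence(alpha: float, n: int) -> float:
    """Evaluate phi = (1 - alpha)^n from the lemma statement."""
    return (1.0 - alpha) ** n


# Illustrative values only: a trigger set of n = 100 samples.
n = 100
for alpha in (0.05, 0.01, 0.001):
    phi = joint_confidence(alpha, n)
    print(f"alpha={alpha}: phi={phi:.4f}")
```

For $n = 100$, $\phi$ is roughly 0.006 at $\alpha = 0.05$, 0.37 at $\alpha = 0.01$, and 0.90 at $\alpha = 0.001$. Because $\phi$ decays geometrically in $n$, a larger trigger set appears to call for a correspondingly smaller $\alpha$ to keep the guarantee meaningful.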

Figures (1)

  • Figure 1: Illustration of the proposed pipeline for trigger set generation and verification. Given the source model $f$ and the hold-out data $\mathcal{D}_h$, we initialize the parametric set of proxy models $\mathcal{B}_{\delta, \tau}(f)$ introduced in Equation \ref{eq:proxy_set} and sample $m$ proxy models $f_1,\dots,f_m$ from this set. Then, given the trigger set generation procedure $T = T(f, \mathcal{D}_h)$, we compute the trigger set candidates $\mathcal{D}_t$. The samples from the candidate set $\mathcal{D}_t$ that are verified by the proxy models $f_1,\dots, f_m$ are included in the verified trigger set $\mathcal{D}^*_t$. The procedure is repeated until a verified trigger set of size $n$ is collected.
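The loop described in the Figure 1 caption can be sketched on a toy problem. This is a minimal illustration under several assumptions, not the authors' implementation: the source model is a linear classifier, a proxy is the source weights plus uniform noise in $[-\delta, \delta]$ (the threshold $\tau$ is omitted), the candidate generator `generate_candidates` is a hypothetical stand-in for $T(f, \mathcal{D}_h)$ that mixes pairs of hold-out points, and a candidate counts as "verified" when every proxy agrees with the source model's prediction on it.

```python
import numpy as np


def predict(weights, x):
    """Class predictions of a linear classifier with the given weight matrix."""
    return np.argmax(x @ weights, axis=-1)


def sample_proxies(weights, delta, m, rng):
    # Assumption: a proxy is the source weights plus uniform noise in [-delta, delta].
    return [weights + rng.uniform(-delta, delta, size=weights.shape) for _ in range(m)]


def generate_candidates(holdout, batch, rng):
    # Hypothetical stand-in for T(f, D_h): random convex combinations of hold-out pairs.
    i = rng.integers(0, len(holdout), size=batch)
    j = rng.integers(0, len(holdout), size=batch)
    lam = rng.uniform(0.0, 1.0, size=(batch, 1))
    return lam * holdout[i] + (1.0 - lam) * holdout[j]


def collect_trigger_set(weights, holdout, n, m, delta, rng, batch=64, max_rounds=1000):
    """Repeat generate-then-verify until n verified samples are collected."""
    proxies = sample_proxies(weights, delta, m, rng)
    verified = []
    for _ in range(max_rounds):
        candidates = generate_candidates(holdout, batch, rng)
        source_labels = predict(weights, candidates)
        # Assumption: keep a candidate only if all m proxies match the source label.
        agree = np.ones(batch, dtype=bool)
        for proxy in proxies:
            agree &= predict(proxy, candidates) == source_labels
        verified.extend(candidates[agree])
        if len(verified) >= n:
            return np.stack(verified[:n]), proxies
    raise RuntimeError("could not collect n verified samples")


rng = np.random.default_rng(0)
source_weights = rng.normal(size=(8, 3))   # toy source model f
holdout = rng.normal(size=(200, 8))        # toy hold-out data D_h
trigger_set, proxies = collect_trigger_set(
    source_weights, holdout, n=20, m=5, delta=0.05, rng=rng
)
```

By construction, every sample in `trigger_set` receives the same label from the source model and from all sampled proxies. The `max_rounds` cap is a practical guard added for this sketch, so the loop fails loudly if verified samples are too rare.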

Theorems & Definitions (5)

  • Remark
  • Remark
  • Lemma 1
  • proof
  • Remark