Towards Understanding and Enhancing Security of Proof-of-Training for DNN Model Ownership Verification

Yijia Chang, Hanrui Jiang, Chao Lin, Xinyi Huang, Jian Weng

TL;DR

Experimental results demonstrate that the proposed generic PoT construction can resist attacks that have compromised existing PoT schemes, which corroborates its superiority in security.

Abstract

The great economic value of deep neural networks (DNNs) urges AI enterprises to protect the intellectual property (IP) of these models. Recently, proof-of-training (PoT) has been proposed as a promising solution to DNN IP protection, through which AI enterprises can use the record of the DNN training process as their ownership proof. To prevent attackers from forging ownership proofs, a secure PoT scheme should be able to distinguish honest training records from those forged by attackers. Although existing PoT schemes provide various distinction criteria, these criteria are based on intuitions or observations. Their effectiveness lacks clear and comprehensive analysis, and as a result, existing schemes initially deemed secure have been swiftly compromised by simple ideas. In this paper, we make the first move to identify distinction criteria in the style of formal methods, so that their effectiveness can be explicitly demonstrated. Specifically, we conduct systematic modeling to cover a wide range of attacks and then theoretically analyze the distinctions between honest and forged training records. The analysis results not only yield a universal distinction criterion, but also provide detailed reasoning to demonstrate its effectiveness in defending against attacks covered by our model. Guided by the criterion, we propose a generic PoT construction that can be instantiated into concrete schemes. This construction reveals that trajectory matching algorithms, previously employed in data distillation, possess significant advantages in PoT construction. Experimental results demonstrate that our scheme can resist attacks that have compromised existing PoT schemes, which corroborates its superiority in security.

Paper Structure

This paper contains 25 sections, 3 theorems, 23 equations, 5 figures, 3 tables, and 1 algorithm.

Key Result

Theorem 1

Given two datasets $D$ and $D^{(M)}$ sampled from the same distribution, suppose that $\mathbb{T}_T$ is the trajectory output by honest training algorithms and $\mathbb{T}_T^{(M)}$ is the trajectory output by forged training algorithms, then we have

Figures (5)

  • Figure 1: PoT-based model ownership verification.
  • Figure 2: Trajectory matching algorithm in the scenario of PoT construction. The solid blue lines represent the training trajectory from the prover's training record, which should have been produced by training over the training data. After sampling a fragment from this trajectory, say from $M_i$ to $M_{i+k}$, the verifier trains an analogous trajectory over synthetic data starting from $M_i$, depicted with dashed yellow lines. By minimizing the distance between the destinations of these two trajectories, the synthetic data come to behave like the training data in DNN training, i.e., training over the synthetic data produces a trajectory similar to the prover's trajectory.
  • Figure 3: An illustration of four ways of producing training records (one honest, three forged), where the initialization algorithm $\mathbb{I}_A$ is omitted since the initial models are assumed to be honest in all cases. (a) Honest DNN training. The training trajectory is generated by running the training algorithms over the training data. (b) Forward-direction attacks. As in honest DNN training, the training trajectory is forged after the training algorithms and data are determined. Along this direction, we depict the algorithm manipulation attacks by the red dashed line marked with ①, where $f_T$ is a manipulated algorithm that uses the final model $M$ to output the trajectory with less effort. (c) Reverse-direction attacks. After generating the trajectory through $f_T$, instead of putting $f_T$ in the training record as the training algorithms, the attacker searches for training algorithms and training data such that retraining from them successfully leads to the trajectory. Depending on whether or not the training data are honestly sampled from the target data distribution, we depict the algorithm manipulation attacks and the data fabrication attacks by the red dashed lines marked with ② and ③, respectively.
  • Figure 4: Analogy between communication and training.
  • Figure 5: Experimental results of average accuracy across various model architectures, datasets, and synthetic data sizes (SDS).
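
The trajectory matching step described in the Figure 2 caption can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the paper's implementation: the "model" is a single scalar weight fitted by least squares, the gradient with respect to the synthetic data is approximated by finite differences with a simple backtracking rule, and the matching loss is normalized by the length of the prover's fragment. All names (`train`, `matching_loss`, `match_fragment`, `real_data`) and all hyperparameters are hypothetical.

```python
def grad(w, data):
    # Mean gradient of 0.5 * (w*x - y)^2 over the dataset.
    return sum((w * x - y) * x for x, y in data) / len(data)

def train(w, data, steps, lr):
    # Plain gradient descent; returns all checkpoints, starting with w.
    trajectory = [w]
    for _ in range(steps):
        w = w - lr * grad(w, data)
        trajectory.append(w)
    return trajectory

def as_pairs(flat):
    # Interpret a flat list [x1, y1, x2, y2, ...] as (x, y) samples.
    return [(flat[j], flat[j + 1]) for j in range(0, len(flat), 2)]

def matching_loss(flat, start, target, k, lr):
    # Squared distance between the destinations of the two trajectories,
    # normalized by how far the prover's fragment moved (an assumption).
    end = train(start, as_pairs(flat), k, lr)[-1]
    return (end - target) ** 2 / ((start - target) ** 2 + 1e-12)

def match_fragment(flat, start, target, k, lr, iters=100, step=0.1, eps=1e-6):
    # Update the synthetic data by finite-difference gradient descent.
    # A step is kept only if it lowers the loss; otherwise the step shrinks.
    current = matching_loss(flat, start, target, k, lr)
    for _ in range(iters):
        g = []
        for j in range(len(flat)):
            bumped = flat[:j] + [flat[j] + eps] + flat[j + 1:]
            g.append((matching_loss(bumped, start, target, k, lr) - current) / eps)
        candidate = [p - step * gj for p, gj in zip(flat, g)]
        new = matching_loss(candidate, start, target, k, lr)
        if new < current:
            flat, current = candidate, new
        else:
            step *= 0.5
    return flat, current

LR = 0.1
# Prover side: a trajectory of checkpoints M_0, ..., M_20 over "real" data.
real_data = [(x / 10, 2 * x / 10) for x in range(-10, 11)]
prover_trajectory = train(0.0, real_data, steps=20, lr=LR)

# Verifier side: pick a fragment M_i -> M_{i+k} and fit synthetic data to it.
i, k = 5, 5
start, target = prover_trajectory[i], prover_trajectory[i + k]
synthetic_init = [1.0, 0.0, -1.0, 0.0]  # two synthetic (x, y) samples
initial_loss = matching_loss(synthetic_init, start, target, k, LR)
synthetic, final_loss = match_fragment(synthetic_init, start, target, k, LR)

print("matching loss before:", initial_loss)
print("matching loss after: ", final_loss)
```

In a realistic setting the scalar weight would be a DNN's parameter vector and the finite-difference loop would be replaced by automatic differentiation through the unrolled training steps; the sketch only shows the control flow of sampling a fragment, retraining from $M_i$ over synthetic data, and shrinking the distance to $M_{i+k}$.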

Theorems & Definitions (7)

  • Theorem 1: Informal
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Theorem \ref{theo:dis}
  • Theorem \ref{theo:dis}: Restated