Hard-Label Cryptanalytic Extraction of Neural Network Models

Yi Chen, Xiaoyang Dong, Jian Guo, Yantian Shen, Anyu Wang, Xiaoyun Wang

TL;DR

This paper proposes the first attack on ReLU neural networks that theoretically achieves functionally equivalent extraction in the hard-label setting, and validates it through practical experiments on a wide range of ReLU neural networks.

Abstract

The machine learning problem of extracting neural network parameters has been proposed for nearly three decades. Functionally equivalent extraction is a crucial goal for research on this problem. When the adversary has access to the raw output of neural networks, various attacks, including those presented at CRYPTO 2020 and EUROCRYPT 2024, have successfully achieved this goal. However, this goal is not achieved when neural networks operate under a hard-label setting where the raw output is inaccessible. In this paper, we propose the first attack that theoretically achieves functionally equivalent extraction under the hard-label setting, which applies to ReLU neural networks. The effectiveness of our attack is validated through practical experiments on a wide range of ReLU neural networks, including neural networks trained on two real benchmarking datasets (MNIST, CIFAR10) widely used in computer vision. For a neural network consisting of $10^5$ parameters, our attack only requires several hours on a single core.

Paper Structure (58 sections, 2 theorems, 49 equations, 4 figures, 2 tables)

Key Result

Lemma

Based on the system of linear equations presented in Eq. (eq:soe-of-k-nn), for $i \in \{2, \cdots, k+1\}$ and $j \in \{1, \cdots, d_i\}$, the extracted weight vector $\widehat{A}_j^{(i)} = \left[ \widehat{w}_{j,1}^{(i)}, \cdots, \widehat{w}_{j, d_{i-1}}^{(i)}\right]$ is […], where $C_{v, 1}^{(q)} = A_{v}^{(q)} A^{(q-1)} \cdots A^{(2)} \left[ A_{1, 1}^{(1)}, \cdots, A_{d_1, 1}^{(1)} \right]^\top$.

Figures (4)

  • Figure 1: Left: the critical point $x = [x_1, x_2, x_3]^{\top}$ makes the output of one neuron (e.g., the solid black circle) $0$. Right: the decision boundary point $x^{\prime} = [x_1^{\prime}, x_2^{\prime}, x_3^{\prime}]^{\top}$ makes the output of the neural network $0$.
  • Figure 2: The core idea of recovering the weight vector of the $j$-th neuron in layer $i$. Let $x = [x_1, \cdots, x_{d_0}]^\top$ be a decision boundary point with $\mathcal{P}^{(i-1)} = 2^{d_{i-1}} - 1$, $\mathcal{P}^{(i)} = 2^{j-1}$, i.e., in layer $i$, only the $j$-th neuron (the red hollow circle) is active, and in layer $i-1$, all the neurons are active. The first $i-1$ layers have been extracted, and collapse into one layer. All the layers starting from layer $i+1$ collapse into a direct connection between the $j$-th neuron in layer $i$ and the final output.
  • Figure 3: A schematic diagram of finding decision boundary points. The blue solid line stands for the decision hyperplane composed of decision boundary points. The red dashed line stands for a direction vector $\varDelta \in \mathbb{R}^{d_0}$. The starting point $x \in \mathbb{R}^{d_0}$ (i.e., the solid black circle) moves along the direction $\varDelta$, and arrives at $x + s \times \varDelta$ (i.e., the hollow black circle) where $s \in \mathbb{R}$ is the moving stride.
  • Figure 4: Left: the victim model $f_{\theta}$. Right: the extracted model $f_{\widehat{\theta}}$.
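
The search described in the caption of Figure 3 can be illustrated with a minimal sketch. It assumes that the hard label is 1 when the scalar network output is positive and 0 otherwise, which matches Figure 1's description of a decision boundary point as one where the output is $0$. The toy network `relu_net`, its weights, and the bisection on the stride $s$ are illustrative assumptions, not the authors' procedure; the search itself only queries `hard_label`.

```python
from typing import Callable, List, Optional, Sequence


def relu_net(x: Sequence[float]) -> float:
    """Toy victim (hypothetical weights): 2 inputs -> 2 ReLU neurons -> scalar."""
    h1 = max(0.0, 1.0 * x[0] - 1.0 * x[1] + 0.5)
    h2 = max(0.0, -0.5 * x[0] + 2.0 * x[1] - 1.0)
    return 1.5 * h1 - 2.0 * h2 + 0.25


def hard_label(x: Sequence[float]) -> int:
    """Hard-label oracle (assumed form): only the sign of the output is revealed."""
    return 1 if relu_net(x) > 0 else 0


def find_boundary_point(
    oracle: Callable[[Sequence[float]], int],
    x: Sequence[float],
    delta: Sequence[float],
    s_max: float = 10.0,
    tol: float = 1e-9,
) -> Optional[List[float]]:
    """Bisect the stride s in [0, s_max] until the label flip is localized.

    Returns x + s * delta near the decision boundary, or None when the
    labels at both ends of the segment agree (no flip detected).
    """

    def point(s: float) -> List[float]:
        return [xi + s * di for xi, di in zip(x, delta)]

    lo, hi = 0.0, s_max
    label_lo = oracle(point(lo))
    if oracle(point(hi)) == label_lo:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if oracle(point(mid)) == label_lo:
            lo = mid  # the flip lies beyond mid
        else:
            hi = mid  # the flip lies at or before mid
    return point((lo + hi) / 2.0)


if __name__ == "__main__":
    # Start at the origin and move along the second coordinate axis.
    print(find_boundary_point(hard_label, [0.0, 0.0], [0.0, 1.0]))
```

For these toy weights the label flips at stride $s = 0.5625$ along the chosen direction. The number of oracle queries grows logarithmically in `s_max / tol`.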

Theorems & Definitions (21)

  • Definition 1: Extended Functionally Equivalent Extraction
  • Definition 2: Extended $(\varepsilon, \delta)$-Functional Equivalence
  • Definition 3: $k$-Deep Neural Network [DBLP:conf/crypto/CarliniJM20]
  • Definition 4: Fully Connected Layer [DBLP:conf/crypto/CarliniJM20]
  • Definition 5: Neuron [DBLP:conf/crypto/CarliniJM20]
  • Definition 6: Neuron State [DBLP:journals/iacr/CanalesMartinezCHRSS23]
  • Definition 7: Neural Network Architecture [DBLP:conf/crypto/CarliniJM20]
  • Definition 8: Neural Network Parameters [DBLP:conf/crypto/CarliniJM20]
  • Definition 9: Hard-Label
  • Definition 10: Model Parameter Extraction Attack
  • ...and 11 more
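
The neuron-state notation of Definition 6, as it is used in the caption of Figure 2, can be read as a bitmask: $\mathcal{P}^{(i)} = 2^{j-1}$ when only the $j$-th neuron of layer $i$ is active, and $\mathcal{P}^{(i-1)} = 2^{d_{i-1}} - 1$ when every neuron of layer $i-1$ is active. The sketch below encodes that reading. The helper names `layer_state` and `active_neurons` are hypothetical, and treating a neuron as active exactly when its pre-activation is positive is an assumption inferred from the caption rather than a statement of the paper's definition.

```python
from typing import List, Sequence


def layer_state(pre_activations: Sequence[float]) -> int:
    """Encode a layer's active neurons as an integer bitmask.

    Bit j-1 is set exactly when neuron j (1-indexed) has a positive
    pre-activation, i.e. its ReLU passes the input through.
    """
    state = 0
    for j, value in enumerate(pre_activations, start=1):
        if value > 0:
            state |= 1 << (j - 1)
    return state


def active_neurons(state: int, width: int) -> List[int]:
    """Decode a bitmask back into the 1-indexed list of active neurons."""
    return [j for j in range(1, width + 1) if (state >> (j - 1)) & 1]


if __name__ == "__main__":
    # All three neurons active: the state equals 2**3 - 1.
    print(layer_state([1.0, 2.0, 0.5]))
    # Only the second neuron active: the state equals 2**(2 - 1).
    print(layer_state([-1.0, 0.3, -2.0]))
```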