Navigating the Deep: End-to-End Extraction on Deep Neural Networks

Haolin Liu, Adrien Siproudhis, Samuel Experton, Peter Lorenz, Christina Boura, Thomas Peyrin

TL;DR

This work advances end-to-end model extraction for deep ReLU networks, i.e., recovering a network's full set of parameters from black-box queries alone. It identifies the main bottlenecks in prior signature-extraction and sign-extraction methods, in particular rank deficiency and interference from deeper layers, and addresses them with subspace-intersection strategies, targeted filtering, and numerical-precision refinements, yielding polynomial-time extraction of deeper architectures. A new sign-extraction strategy (SOE+Wiggle) combines a system-of-equations method with high-confidence inactive neurons to recover signs without exhaustive search, while precision normalization further stabilizes reconstruction in wide layers. Empirically, the attack consistently extracts at least eight layers of networks trained on MNIST and CIFAR-10, with high layer coverage and low relative error on a large fraction of the input space, whereas prior methods could barely extract the first three layers of networks of similar width. These results have direct security implications for deployed models and motivate further work on defenses against end-to-end, polynomial-time extraction in realistic attack settings.

Abstract

Neural network model extraction has recently emerged as an important security concern, as adversaries attempt to recover a network's parameters via black-box queries. Carlini et al. proposed in CRYPTO'20 a model extraction approach consisting of two steps: signature extraction and sign extraction. However, in practice this signature-extraction method is limited to very shallow networks, and the proposed sign-extraction method runs in exponential time. Recently, Canales-Martinez et al. (Eurocrypt'24) proposed a polynomial-time sign-extraction method, but it assumes the corresponding signatures have already been successfully extracted and can fail on so-called low-confidence neurons. In this work, we first revisit and refine the signature extraction process by systematically identifying and addressing, for the first time, critical limitations of Carlini et al.'s signature-extraction method. These limitations include rank deficiency and noise propagation from deeper layers. To overcome these challenges, we propose efficient algorithmic solutions for each of the identified issues. Our approach permits the extraction of much deeper networks than previously possible. In addition, we propose new methods to improve numerical precision in signature extraction, and enhance the sign extraction part by combining two polynomial methods to avoid exponential exhaustive search in the case of low-confidence neurons. This leads to the first end-to-end model extraction method that runs in polynomial time. We validate our attack through extensive experiments on ReLU-based neural networks, demonstrating significant improvements in extraction depth. For instance, our attack consistently extracts at least eight layers of neural networks trained on either the MNIST or CIFAR-10 datasets, while previous works could barely extract the first three layers of networks of similar width.

Paper Structure

This paper contains 28 sections, 2 theorems, 72 equations, 13 figures, 6 tables.

Key Result

Lemma (local affine network)

First, for $x \in \mathbb{R}^{d_0}$, for all $1 \leq i \leq r$, the network $F^{(i)}$ is affine on $\mathcal{P}^{(i)}_x$, meaning that there exist $\Gamma_x^{(i)} \in \mathbb{R}^{d_{i}\times d_0}$ and $\gamma^{(i)}_x\in \mathbb{R}^{d_{i}}$ such that $\forall x' \in \mathcal{P}^{(i)}_x$, we have $F^{(i)}(x') = \Gamma_x^{(i)} x' + \gamma^{(i)}_x$. Second, it follows that if $x$ is not a critical point, there exists $\epsilon > 0$ such that for a ...
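
The local affinity stated in the lemma can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: it assumes that $F^{(i)}$ denotes the post-ReLU output of the first $i$ layers, and the names `local_affine_map`, `forward`, the layer sizes, and the random weights are all hypothetical choices made for illustration.

```python
import numpy as np

def forward(weights, biases, x):
    """Post-ReLU output of the stacked layers (assumed meaning of F^(i))."""
    h = x
    for A, b in zip(weights, biases):
        h = np.maximum(A @ h + b, 0.0)
    return h

def local_affine_map(weights, biases, x):
    """Return (Gamma, gamma) with forward(x') == Gamma @ x' + gamma
    for every x' that shares the activation pattern of x."""
    Gamma = np.eye(x.shape[0])
    gamma = np.zeros(x.shape[0])
    for A, b in zip(weights, biases):
        pre = A @ (Gamma @ x + gamma) + b      # pre-activations at x
        mask = (pre > 0).astype(float)         # activation pattern of this layer
        Gamma = mask[:, None] * (A @ Gamma)    # zero out rows of inactive neurons
        gamma = mask * (A @ gamma + b)
    return Gamma, gamma

# Hypothetical toy network: 4 inputs, hidden widths 6 and 5, 3 outputs.
rng = np.random.default_rng(0)
dims = [4, 6, 5, 3]
weights = [rng.standard_normal((dims[k + 1], dims[k])) for k in range(3)]
biases = [rng.standard_normal(dims[k + 1]) for k in range(3)]

x = rng.standard_normal(4)
Gamma, gamma = local_affine_map(weights, biases, x)

# The affine map reproduces the network at x itself.
print(np.allclose(Gamma @ x + gamma, forward(weights, biases, x)))

# For a nearby point, the deviation stays at floating-point level
# as long as the point has the same activation pattern as x.
x_near = x + 1e-6 * rng.standard_normal(4)
print(np.max(np.abs(Gamma @ x_near + gamma - forward(weights, biases, x_near))))
```

Here `Gamma` and `gamma` play the roles of $\Gamma_x^{(i)}$ and $\gamma_x^{(i)}$ for the last layer of the stack; truncating `weights` and `biases` gives the maps for earlier layers.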

Figures (13)

  • Figure 1: Input space of a network where neurons on the previous layer are labelled in grey on their active side. In red is the neuron we aim to find. The critical points on the left and right yield respectively $(\Lambda a, \Lambda b, 0, 0, \Lambda e)$ and $(\lambda a, \lambda b, 0, \lambda d, 0)$. We can infer $(a,b,0,d,e)$ up to a constant, even though no single polytope activates $\eta_1,\eta_2,\eta_4,\eta_5$ simultaneously.
  • Figure 2: Left: Original signature extraction from Carlini et al. (CRYPTO'20). Right: Proposed improvements. Two error-inducing steps in the original attack are coloured on the left. Improvements match the colour of the step they address. Further precision improvements for signature extraction are marked on the right.
  • Figure 3: Gradual improvement of signature extraction results on layer 4 of Model II with $3{,}000$ critical points (due to space constraints, only the largest 16 components are displayed). Each component on the target layer is labelled with its associated neuron on top. (a) Original signature extraction from Carlini et al. (CRYPTO'20). (b) After discarding deeper critical points (see Section \ref{subsec:filteralgo}), deeper components either disappear or have a smaller size. (c) After intersecting critical points with insufficient ranks (see Section \ref{rankissue}), components with rank deficiency disappear. (d) After discarding deeper components (see Section \ref{subsec:filteralgo}), all remaining components are in the target layer. The only unrecovered component corresponds to an always-off neuron $\eta^{(4)}_5$.
  • Figure 4: Solving for a neuron’s weights from certain critical points can yield an underdetermined system. The target neuron is shown in black; active neurons are in blue. At input $x$, $\operatorname{rank}(\Gamma^{(1)}_x)=2$, so $\operatorname{rank}(\Gamma^{(2)}_x)\le 2$ even though three neurons are active in layer $2$; consequently, the layer-2 system has no unique solution.
  • Figure 5: Identifying whether $x$ is on the target layer, with $0\leq j<k$. By finding that the intersection point $a_2$ is not on the extracted hyperplane, we infer that the hyperplane we extracted from $x$ (- - -) broke on a layer $i+j$ we did not extract (---). Thus, $x$ cannot be on layer $i$. However, finding that $a_1$ is on the extracted hyperplane does not give any information about $x$'s layer.
  • ...and 8 more figures
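
The rank-deficiency situation described in the Figure 4 caption can be reproduced on a toy example. This is a hedged sketch under stated assumptions: the $3 \times 3$ weights, the input `x`, and the helper `layer_jacobians` are hypothetical and chosen only so that two neurons are active in layer 1 and three in layer 2; they are not taken from the paper.

```python
import numpy as np

def layer_jacobians(weights, biases, x):
    """For each layer, return (Jacobian of its post-ReLU output w.r.t. the
    input at x, number of active neurons in that layer)."""
    jac, h, out = np.eye(x.shape[0]), x, []
    for A, b in zip(weights, biases):
        pre = A @ h + b
        mask = (pre > 0).astype(float)      # activation pattern at x
        jac = mask[:, None] * (A @ jac)     # rows of inactive neurons become zero
        h = mask * pre
        out.append((jac, int(mask.sum())))
    return out

# Layer 1: identity weights, so the third neuron is off at x (pre-activation -1).
A1, b1 = np.eye(3), np.zeros(3)
# Layer 2: all three neurons are active at x.
A2 = np.array([[1.0, 0.0, 1.0],
               [0.0, 1.0, 1.0],
               [1.0, 1.0, 0.0]])
b2 = np.ones(3)
x = np.array([1.0, 2.0, -1.0])

result = layer_jacobians([A1, A2], [b1, b2], x)
ranks = [int(np.linalg.matrix_rank(jac)) for jac, _ in result]
active = [n for _, n in result]
print(ranks)   # [2, 2]
print(active)  # [2, 3]
```

Three neurons are active in layer 2, yet the layer-2 Jacobian has rank 2 because it is a product with the rank-2 layer-1 Jacobian. A linear system built from this single point therefore cannot pin down all three coordinates, which is the underdetermined case the caption describes.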

Theorems & Definitions (14)

  • Definition: $r$-deep neural network
  • Definition: ReLU neural network
  • Definition: critical point
  • Definition: activation pattern
  • Definition: polytope
  • Lemma: local affine network
  • Definition: functionally equivalent extraction
  • Lemma: Carlini et al. (CRYPTO'20)
  • Proof
  • Definition: effective architecture
  • ...and 4 more