Beyond Slow Signs in High-fidelity Model Extraction

Hanna Foerster, Robert Mullins, Ilia Shumailov, Jamie Hayes

TL;DR

We introduce a unified codebase that integrates previous methods and reveals that computational tools can significantly influence performance, and we propose new ways of robust benchmarking for future model extraction attacks.

Abstract

Deep neural networks, costly to train and rich in intellectual property value, are increasingly threatened by model extraction attacks that compromise their confidentiality. Previous attacks have used cryptanalytical techniques to reverse-engineer model parameters up to float64 precision for models trained on random data with at most three hidden layers. However, the process was found to be very time-consuming and infeasible for larger and deeper models trained on standard benchmarks. Our study evaluates the feasibility of the parameter extraction methods of Carlini et al. [1], further enhanced by Canales-Martínez et al. [2], for models trained on standard benchmarks. We introduce a unified codebase that integrates previous methods and reveal that computational tools can significantly influence performance. We develop further optimisations to the end-to-end attack and, by identifying easier- and harder-to-extract neurons, improve the efficiency of extracting weight signs by up to 14.8 times compared to former methods. Contrary to prior assumptions, we identify extraction of weights, not extraction of weight signs, as the critical bottleneck. With our improvements, a 16,721-parameter model with 2 hidden layers trained on MNIST is extracted within only 98 minutes, compared to at least 150 minutes previously. Finally, addressing methodological deficiencies observed in previous studies, we propose new ways of robust benchmarking for future model extraction attacks.

Paper Structure (24 sections, 5 figures, 3 tables)

This paper contains 24 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: (a) Running times of Carlini's signature extraction compared with Carlini's sign extraction, Canales-Martínez (CM)'s sign extraction with $s=200$ in both the original and the unified implementation, and our sign extraction with $s=15$. The tests span ten models with layer sizes increasing from $10-5-5-1$ to $100-50-50-1$ and report the time to extract a single layer in a non-parallelised setting. (b) Average percentage of correctly recovered neurons in a layer as the number of sign extractions $s$ changes. Raising $s$ above 15 does not significantly increase the number of correctly recovered neurons. (c) Confidence of sign recovery when a hard neuron's Euclidean distance to its neighbours is manipulated. Results are for the hard-to-sign-extract neurons 25 and 26 of an MNIST-trained 784-32x8-1 model extracted with seed $42$. The confidence axis is folded: it first runs from $1$ down to $0.5$ in confidence of false sign recovery (equivalent to $0$ to $0.5$ in confidence of true sign recovery) and then from $0.5$ up to $1$ in confidence of true sign recovery, so the scale reads $1-0.5-1$.
  • Figure 2: (a) Neuron and critical points before precision improvement. (b) Neuron and critical points after precision improvement. (c) An example of when precision improvement fails: neuron $\eta_l$ is closer to the critical point than $\eta_k$, so this critical point is converted to a critical point for $\eta_l$ instead of for $\eta_k$.
  • Figure 3: (a) Change in accuracy relative to the original model's predictions as the signs of hard-to-sign-extract neurons are flipped in layer 3 of a CIFAR model with 128 neurons. All orderings of the 5 neurons were iterated over to produce the error bounds. (b) Percentage of correctly recovered neurons in MNIST and CIFAR models with layer sizes ranging from 4 to 256. (c) The number of incorrectly recovered neurons rises as the accuracy gain from a larger layer size diminishes.
  • Figure 4: Query numbers of Carlini's signature extraction compared with Canales-Martínez (CM)'s sign extraction with $s=200$ and our sign extraction with $s=15$, across ten models with layer sizes increasing from $10-5-5-1$ to $100-50-50-1$. Query numbers are for a single layer's extraction.
  • Figure 5: (a) Total signature recovery time and total sign recovery time for layer 2 of MNIST models with 2 hidden layers and layer sizes 4, 8, 16, 32 and 64. The signature extraction was run with seeds 0, 10, 40 and 42. (b) Average number of queries per neuron for signature and sign recovery in layer 2 of MNIST models. These graphs do not include the time or queries spent on precision improvement.
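
The folded confidence scale described for Figure 1(c) can be read as a simple mapping from the confidence in true sign recovery to a position on the $1-0.5-1$ axis. The following is a minimal sketch of that reading only, not code from the paper's codebase; the function name `folded_confidence` and the choice to place exactly $0.5$ on the "true" side are assumptions made for illustration.

```python
def folded_confidence(p_true: float) -> tuple[str, float]:
    """Map confidence in true sign recovery onto the folded 1-0.5-1 axis.

    Hypothetical helper: returns which half of the axis the value falls on
    ("false" or "true") and the confidence shown on that half.
    """
    if not 0.0 <= p_true <= 1.0:
        raise ValueError("p_true must lie in [0, 1]")
    if p_true < 0.5:
        # A true-sign confidence of 0..0.5 is shown as a false-sign
        # confidence of 1..0.5 on the left half of the axis.
        return ("false", 1.0 - p_true)
    # A true-sign confidence of 0.5..1 is shown unchanged on the right half.
    return ("true", p_true)


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        side, shown = folded_confidence(p)
        print(f"p_true={p:.2f} -> {side} sign, confidence {shown:.2f}")
```

Under this reading, a point near either end of the axis is a confident recovery (of the wrong or the right sign respectively), while a point near the centre means the sign is essentially undetermined.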