Navigating the Deep: End-to-End Extraction on Deep Neural Networks
Haolin Liu, Adrien Siproudhis, Samuel Experton, Peter Lorenz, Christina Boura, Thomas Peyrin
TL;DR
This work addresses end-to-end model extraction for deep ReLU networks, that is, recovering a network's full parameters from black-box queries alone. It identifies the core bottlenecks in prior signature-extraction and sign-extraction methods, particularly rank deficiency and interference from deeper layers, and introduces subspace-intersection strategies, targeted filtering, and numerical-precision refinements that enable polynomial-time extraction of deeper architectures. A novel sign-extraction strategy (SOE+Wiggle) combines a system-of-equations approach with high-confidence inactive neurons to recover signs robustly without exhaustive search, while precision normalization further stabilizes reconstruction in wide layers. Empirically, the attack extracts at least eight layers of networks trained on MNIST and CIFAR-10, with high layer coverage and low relative error on a large fraction of the input space, whereas previous methods could barely extract the first three layers of networks of similar width. These results have important security implications for deployed models and motivate further work on defenses against end-to-end, polynomial-time extraction in realistic attack settings.
Abstract
Neural network model extraction has recently emerged as an important security concern, as adversaries attempt to recover a network's parameters via black-box queries. At CRYPTO'20, Carlini et al. proposed a model extraction approach consisting of two steps: signature extraction and sign extraction. In practice, however, their signature-extraction method is limited to very shallow networks, and their sign-extraction method takes exponential time. Recently, Canales-Martinez et al. (Eurocrypt'24) proposed a polynomial-time sign-extraction method, but it assumes that the corresponding signatures have already been successfully extracted, and it can fail on so-called low-confidence neurons. In this work, we first revisit and refine the signature-extraction process by systematically identifying and addressing, for the first time, critical limitations of Carlini et al.'s signature-extraction method. These limitations include rank deficiency and noise propagation from deeper layers. To overcome these challenges, we propose efficient algorithmic solutions for each of the identified issues. Our approach permits the extraction of much deeper networks than previously possible. In addition, we propose new methods to improve numerical precision in signature extraction, and we enhance the sign-extraction step by combining two polynomial-time methods to avoid exponential exhaustive search in the case of low-confidence neurons. This leads to the very first end-to-end model extraction method that runs in polynomial time. We validate our attack through extensive experiments on ReLU-based neural networks, demonstrating significant improvements in extraction depth. For instance, our attack consistently extracts at least eight layers of neural networks trained on either the MNIST or CIFAR-10 datasets, while previous works could barely extract the first three layers of networks of similar width.
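To make the term "signature" concrete, here is a minimal sketch of the basic idea behind signature extraction on a toy network with a single hidden ReLU layer. It is an illustration under stated assumptions, not the paper's algorithm: the weights `W`, `b`, `a`, the oracle `f`, and all helper names are hypothetical, the search segment is hand-picked, and none of the deep-network issues named in the abstract (rank deficiency, noise from deeper layers) arise here. The sketch locates a critical point, where one neuron's pre-activation crosses zero, by bisecting on the slope of the output along a line. It then takes the jump in the finite-difference gradient across that point. The jump is proportional to the neuron's weight row, so the row is recovered up to an unknown scalar. Determining the sign of that scalar is the job of the separate sign-extraction step.

```python
import numpy as np

# Hypothetical toy "victim": one hidden ReLU layer. The attacker only calls f.
W = np.array([[1.0, 2.0, -1.0],
              [0.5, -1.0, 2.0]])
b = np.array([0.5, -1.0])
a = np.array([1.5, -2.0])

def f(x):
    """Black-box oracle: scalar network output for input x."""
    return float(a @ np.maximum(W @ x + b, 0.0))

def slope(x, v, eps=1e-6):
    """Forward finite-difference slope of f at x along direction v."""
    return (f(x + eps * v) - f(x)) / eps

def find_critical_point(x0, v, lo, hi, width=1e-5, tol=1e-4):
    """Bisect on t until [lo, hi] brackets a kink of t -> f(x0 + t * v)."""
    m_lo = slope(x0 + lo * v, v)
    # Assumption: the slopes at the two ends differ, so a kink lies in between.
    assert abs(slope(x0 + hi * v, v) - m_lo) > tol, "no slope change on segment"
    while hi - lo > width:
        mid = 0.5 * (lo + hi)
        if abs(slope(x0 + mid * v, v) - m_lo) > tol:
            hi = mid  # slope differs from the one at lo: kink at or before mid
        else:
            lo = mid  # same slope as at lo: keep searching in [mid, hi]
    return x0 + 0.5 * (lo + hi) * v

def gradient(x, h=1e-5):
    """Central finite-difference gradient of f (exact on a linear piece)."""
    return np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                     for e in np.eye(len(x))])

def signature(x_star, v, delta=1e-3):
    """Gradient jump across the critical hyperplane, scaled so entry 0 is 1.

    Assumes no other neuron flips within delta of x_star along v, and that
    the first entry of the jump is nonzero.
    """
    jump = gradient(x_star + delta * v) - gradient(x_star - delta * v)
    return jump / jump[0]

x0 = np.zeros(3)
v = np.array([1.0, 0.0, 0.0])
x_star = find_critical_point(x0, v, lo=-4.0, hi=4.0)
sig = signature(x_star, v)
print("critical point:", x_star)
print("recovered signature:", sig)
print("true row / first entry:", W[0] / W[0, 0])
```

In this toy setting the recovered vector matches the first weight row divided by its first entry. Normalizing by one coordinate is only one convenient way to fix the scale, and it discards the sign, which is why sign extraction is needed as a second step.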
