PEEL the Layers and Find Yourself: Revisiting Inference-time Data Leakage for Residual Neural Networks
Huzaifa Arif, Keerthiram Murugesan, Payel Das, Alex Gittens, Pin-Yu Chen
TL;DR
PEEL reveals a pronounced inference-time data leakage risk in residual networks: it performs layer-wise backward feature inversion, reconstructing private inputs from intermediate residual outputs without relying on training data priors. The method solves a non-convex optimization problem for each residual block and inverts the network progressively, block by block. It achieves high-fidelity recoveries on facial, chest X-ray, and ImageNet data, often outperforming GAN-based baselines in exactness (lower MSE and KNN distance). The findings raise privacy concerns for honest-but-curious (HbC) service providers, split learning, and cached intermediate representations, and suggest the need for robust privacy-preserving mechanisms when using residual architectures. The work also demonstrates PEEL's applicability across various architectures and datasets, and outlines factors that modulate leakage, such as pooling, activation choices, and weight initialization.
Abstract
This paper explores inference-time data leakage risks of deep neural networks (NNs), where an honest-but-curious model service provider is interested in retrieving users' private data inputs solely based on the model inference results. In particular, we revisit residual NNs because of their popularity in computer vision and our hypothesis that residual blocks are a primary cause of data leakage owing to their skip connections. By formulating inference-time data leakage as a constrained optimization problem, we propose a novel backward feature inversion method, PEEL, which can effectively recover block-wise input features from the intermediate output of residual NNs. The surprisingly high quality of the recovered inputs can be explained by the intuition that the output of a residual block can be viewed as a noisy version of its input, and thus retains sufficient information for input recovery. We demonstrate the effectiveness of our layer-by-layer feature inversion method on facial image datasets and pre-trained classifiers. Our results show that PEEL outperforms state-of-the-art recovery methods by an order of magnitude when evaluated by mean squared error (MSE). The code is available at https://github.com/Huzaifa-Arif/PEEL.
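To make the block-wise inversion idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy fully connected residual block y = x + W2 relu(W1 x), with no convolutions, normalization, or output activation, and it uses plain gradient descent on the squared reconstruction error in place of the constrained optimization formulated in the paper. The function names (`residual_block`, `invert_block`, `peel`), the small weight scale, and the step size are all hypothetical choices made for illustration.

```python
import numpy as np


def residual_block(x, W1, W2):
    """Toy residual block (an assumption): y = x + W2 @ relu(W1 @ x)."""
    return x + W2 @ np.maximum(W1 @ x, 0.0)


def invert_block(y, W1, W2, steps=500, lr=0.5):
    """Estimate the block input by minimizing 0.5 * ||block(x) - y||^2."""
    # Because of the skip connection, y is already close to x,
    # so it is used as the starting guess.
    x = y.copy()
    for _ in range(steps):
        h = W1 @ x
        r = x + W2 @ np.maximum(h, 0.0) - y  # reconstruction residual
        # Gradient J^T r, where J = I + W2 @ diag(h > 0) @ W1.
        grad = r + W1.T @ ((h > 0) * (W2.T @ r))
        x -= lr * grad
    return x


def peel(y, blocks, **kwargs):
    """Invert a stack of blocks from the last one back to the first."""
    for W1, W2 in reversed(blocks):
        y = invert_block(y, W1, W2, **kwargs)
    return y


# Small demonstration with random weights (illustrative values only).
rng = np.random.default_rng(0)
d = 16
blocks = [
    (0.05 * rng.standard_normal((d, d)), 0.05 * rng.standard_normal((d, d)))
    for _ in range(3)
]
x_true = rng.standard_normal(d)

y = x_true
for W1, W2 in blocks:
    y = residual_block(y, W1, W2)  # forward pass seen by the provider

x_rec = peel(y, blocks)
print("MSE between recovered and true input:", np.mean((x_rec - x_true) ** 2))
```

The small weight scale keeps the residual branch a mild perturbation of the identity, which is the "noisy version of the input" intuition in its simplest form and keeps this toy problem well conditioned. Real residual networks do not come with such a guarantee, which is one reason the per-block problem is non-convex in practice.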
