What Your Features Reveal: Data-Efficient Black-Box Feature Inversion Attack for Split DNNs

Zhihan Ren, Lijun He, Jiaxi Liang, Xinzhu Fu, Haixia Bi, Fan Li

TL;DR

Split DNNs expose an actionable privacy risk through their intermediate features $f = M(x)$. FIA-Flow is a data-efficient, black-box FIA that decouples inversion into alignment and refinement: a Latent Feature Space Alignment Module maps $f$ to a latent code $z_s$ aligned with the VAE latent $z_x = \mathrm{Enc}(x)$, and Deterministic Inversion Flow Matching then learns a vector field that steers $z_s$ toward $z_x$, yielding the refined latent $\hat{z}_x$ and the reconstruction $x' = \mathrm{Dec}(\hat{z}_x)$. The method trains in two stages, with losses that enforce feature-latent alignment and distributional refinement, and enables one-step reconstruction with fewer than $4{,}096$ image-feature pairs. Experiments show that FIA-Flow achieves state-of-the-art results in both reconstruction fidelity and semantic leakage across diverse architectures (AlexNet, ResNet, Swin Transformer, DINO, YOLO11n) and layers, and that it generalizes to COCO via a dataset-agnostic alignment, underscoring a practical and severe privacy threat in Split DNNs.

Abstract

Split DNNs enable edge devices by offloading intensive computation to a cloud server, but this paradigm exposes privacy vulnerabilities, as the intermediate features can be exploited to reconstruct the private inputs via Feature Inversion Attack (FIA). Existing FIA methods often produce limited reconstruction quality, making it difficult to assess the true extent of privacy leakage. To reveal the privacy risk of the leaked features, we introduce FIA-Flow, a black-box FIA framework that achieves high-fidelity image reconstruction from intermediate features. To exploit the semantic information within intermediate features, we design a Latent Feature Space Alignment Module (LFSAM) to bridge the semantic gap between the intermediate feature space and the latent space. Furthermore, to rectify distributional mismatch, we develop Deterministic Inversion Flow Matching (DIFM), which projects off-manifold features onto the target manifold with one-step inference. This decoupled design simplifies learning and enables effective training with few image-feature pairs. To quantify privacy leakage from a human perspective, we also propose two metrics based on a large vision-language model. Experiments show that FIA-Flow achieves more faithful and semantically aligned feature inversion across various models (AlexNet, ResNet, Swin Transformer, DINO, and YOLO11) and layers, revealing a more severe privacy threat in Split DNNs than previously recognized.
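
The data flow described above (feature, aligned latent, refined latent, reconstruction) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the dimensions, the random linear maps standing in for LFSAM and for the DIFM vector field, the `tanh` placeholder for the VAE decoder, and the reading of "one-step inference" as a single Euler step of size 1. It shows only the order of operations, not the authors' networks or training procedure.

```python
import math
import random

random.seed(0)

FEATURE_DIM = 8  # hypothetical size of a flattened intermediate feature f
LATENT_DIM = 4   # hypothetical size of a flattened VAE latent


def random_matrix(rows, cols, scale=0.1):
    """Random weights that stand in for trained parameters."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on Python lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


W_ALIGN = random_matrix(LATENT_DIM, FEATURE_DIM)  # stand-in for LFSAM
W_FLOW = random_matrix(LATENT_DIM, LATENT_DIM)    # stand-in for the DIFM field


def align(f):
    """Alignment stage: map the feature f to a coarse latent code z_s."""
    return matvec(W_ALIGN, f)


def velocity(z, t):
    """Toy vector field v(z, t); this placeholder ignores t."""
    return matvec(W_FLOW, z)


def one_step_refine(z_s):
    """Refinement stage as one Euler step: z_hat = z_s + 1.0 * v(z_s, 0)."""
    v = velocity(z_s, 0.0)
    return [z + dv for z, dv in zip(z_s, v)]


def decode(z_hat):
    """Placeholder for the pre-trained VAE decoder Dec."""
    return [math.tanh(v) for v in z_hat]


def invert(f):
    """Full pipeline: f -> z_s -> z_hat -> x'."""
    return decode(one_step_refine(align(f)))


f = [random.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]
x_prime = invert(f)
print(len(x_prime))
```

Because every stage is a deterministic function, calling `invert` twice on the same feature returns the same output, which mirrors the "deterministic" and "one-step" properties the abstract attributes to DIFM.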

Paper Structure

This paper contains 23 sections, 8 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: (a) The pipeline of Split DNNs, which exposes intermediate features and creates an attack surface. (b) Existing FIA methods achieve inversion via white-box, sample-specific iterative feature matching for each input. (c) In contrast, FIA-Flow is trained once on a proxy dataset, learning to perform fast one-step inversion for any unseen input.
  • Figure 2: The pipeline of FIA-Flow. The method reconstructs a private image $x$ from the corresponding intermediate features $f$. It first maps $f$ to a latent code $z_s$ by the Latent Feature Space Alignment Module, then uses the Deterministic Inversion Flow Matching module to refine it into $\hat{z}_x$. Finally, the attack image $x'$ is obtained by a pre-trained VAE decoder from $\hat{z}_x$.
  • Figure 3: An illustration of LVLM-C and LVLM-PL evaluation. ① The LVLM is prompted to describe the original image. ② The LVLM is then prompted to describe the inversion image. ③ The LVLM compares these two descriptions to ascertain if the same object is identified. A consistent result yields the LVLM-C value of 1. ④ LVLM-PL is obtained by computing the BERTScore [zhang2019bertscore] between the two descriptions.
  • Figure 4: Visual comparison of different FIA methods on various models.
  • Figure 5: Visual comparison under different defense mechanisms. Top row: visualizations under the Noise+NoPeek defense [titcombe2021practical]. Bottom row: visualizations under the DISCO defense [singh2021disco].
  • ...and 1 more figure
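
The four-step evaluation in the Figure 3 caption can be sketched as a small function. This is a hedged illustration, not the paper's implementation: `describe`, `same_object`, and `similarity` are hypothetical callables that a real setup would back with an LVLM and with BERTScore. The toy stand-ins below (captions stored in a dict, a last-word comparison, and a token-overlap F1 that is not BERTScore) exist only to make the example runnable. Returning 0 for an inconsistent result is also an assumption, since the caption only states that a consistent result yields 1.

```python
def lvlm_metrics(original, inversion, describe, same_object, similarity):
    """Return (LVLM-C, LVLM-PL) for one original/inversion image pair."""
    desc_orig = describe(original)   # step 1: describe the original image
    desc_inv = describe(inversion)   # step 2: describe the inversion image
    # Step 3: 1 if both descriptions identify the same object, else 0 (assumed).
    lvlm_c = 1 if same_object(desc_orig, desc_inv) else 0
    # Step 4: text similarity between the two descriptions.
    lvlm_pl = similarity(desc_orig, desc_inv)
    return lvlm_c, lvlm_pl


def toy_describe(image):
    """Toy stand-in for the LVLM: the 'image' is a dict holding its caption."""
    return image["caption"]


def toy_same_object(desc_a, desc_b):
    """Toy stand-in for the LVLM comparison: compare the final word."""
    return desc_a.split()[-1] == desc_b.split()[-1]


def token_f1(desc_a, desc_b):
    """Token-overlap F1, used here only as a placeholder for BERTScore."""
    tokens_a = set(desc_a.lower().split())
    tokens_b = set(desc_b.lower().split())
    common = len(tokens_a & tokens_b)
    if common == 0:
        return 0.0
    precision = common / len(tokens_b)
    recall = common / len(tokens_a)
    return 2 * precision * recall / (precision + recall)


original = {"caption": "a brown dog"}
inversion = {"caption": "a blurry dog"}
lvlm_c, lvlm_pl = lvlm_metrics(
    original, inversion, toy_describe, toy_same_object, token_f1
)
print(lvlm_c, round(lvlm_pl, 3))
```

Passing the three components in as arguments keeps the metric logic separate from the choice of model, so the toy stand-ins can be swapped for real LVLM calls and a real BERTScore implementation without changing `lvlm_metrics`.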