TensorShield: Safeguarding On-Device Inference by Shielding Critical DNN Tensors with TEE

Tong Sun, Bowen Jiang, Hailong Lin, Borui Li, Yixiao Teng, Yi Gao, Wei Dong

TL;DR

TensorShield addresses two threats to on-device DNN inference, model stealing (MS) and membership inference attacks (MIA), by shielding a carefully selected subset of tensors inside a TEE rather than the entire model, which keeps the added latency low. It introduces an XAI-driven method that identifies critical tensors using an attention-transition metric, and a critical-feature method based on Jensen-Shannon (JS) divergence that protects membership privacy. A latency-aware placement framework jointly optimizes execution location (TEE or rich execution environment (REE), across CPU and GPU) and selective masking to reduce overhead while preserving security. Evaluations on four models and four datasets across two devices show speedups of up to 25.35× with security comparable to full-shield baselines, along with substantial energy savings, making secure on-device inference practical.
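As a rough illustration of the JS-divergence measure mentioned above, the following minimal sketch computes the divergence between two discrete distributions. It is not TensorShield's implementation: this summary does not say how the paper builds the distributions or turns the score into a masking decision, so the `member_hist` and `non_member_hist` histograms below are hypothetical stand-ins for how one feature might respond to training members versus non-members.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two non-negative histograms.

    Inputs are normalized to sum to 1. With log base 2 the result lies
    in [0, 1]: 0 for identical distributions, 1 for disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    sum_p, sum_q = sum(p), sum(q)
    if sum_p <= 0 or sum_q <= 0:
        raise ValueError("histograms must have positive total mass")
    p = [x / sum_p for x in p]
    q = [x / sum_q for x in q]
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical activation histograms for one intermediate feature
# (assumed data, for illustration only).
member_hist = [8, 5, 2, 1]
non_member_hist = [2, 3, 5, 6]
score = js_divergence(member_hist, non_member_hist)
```

Under this reading, a feature with a higher `score` separates members from non-members more sharply and would be a stronger candidate for masking; the actual selection rule is defined in the paper, not here.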

Abstract

To safeguard user data privacy, on-device inference has emerged as a prominent paradigm on mobile and Internet of Things (IoT) devices. In this paradigm, a model provided by a third party is deployed on local devices to perform inference tasks. However, this exposes the private model to two primary security threats: model stealing (MS) and membership inference attacks (MIA). To mitigate these risks, existing approaches deploy models within Trusted Execution Environments (TEEs), which are secure, isolated execution spaces. Nonetheless, the constrained secure memory capacity of TEEs makes it challenging to achieve full model security with low inference latency. This paper fills the gap with TensorShield, the first efficient on-device inference work that shields only part of the model's tensors while still fully defending against MS and MIA. The key enabling techniques in TensorShield include: (i) a novel eXplainable AI (XAI) technique that exploits the model's attention transition to assess critical tensors and shields them in the TEE to achieve secure inference, and (ii) two meticulous designs, critical feature identification and latency-aware placement, that accelerate inference while maintaining security. Extensive evaluations show that TensorShield delivers almost the same security protection as shielding the entire model inside the TEE, while being up to 25.35$\times$ (avg. 5.85$\times$) faster than the state-of-the-art work, without accuracy loss.

Paper Structure

This paper contains 21 sections, 8 equations, 15 figures, 5 tables.

Figures (15)

  • Figure 1: Paradigms of interaction between users and private models. (a) Remote inference: users send data to the model provider. (b) Direct on-device inference: model providers deploy the private model on devices. (c) TEE-based secure on-device inference: model providers generate a secure deployment strategy (❶) and deploy the model with TEE (❷).
  • Figure 2: An illustration of previous work on model protection. The model has three layers with eight tensors. ① shields all (three) layers [lee2019occlumency]. ② shields one shallow layer (Layer1) [elgamal2020serdab] and ③ shields one deep layer (Layer3) [mo2020darknetz]. ④ shields one random intermediate layer (Layer2) [shen2022soter]. ⑤ shields large-magnitude weights of each layer [hou2021model]. ⑥ shields non-linear layers and obfuscates all linear tensors (indicated by rectangles with dashed lines) and all intermediate features [sun2023shadownet, zhanggroupcover]. TensorShield (Ours) shields critical tensors (indicated by blue rectangles) and only masks privacy-related intermediate features.
  • Figure 3: A three-stage attack pipeline.
  • Figure 4: Workflow of TensorShield.
  • Figure 5: An instance of attention transition. Heat maps represent the importance of classifying model decisions.
  • ...and 10 more figures