TensorShield: Safeguarding On-Device Inference by Shielding Critical DNN Tensors with TEE
Tong Sun, Bowen Jiang, Hailong Lin, Borui Li, Yixiao Teng, Yi Gao, Wei Dong
TL;DR
TensorShield addresses the security vulnerabilities of on-device DNN inference by shielding a carefully selected subset of tensors inside a TEE rather than the entire model, defending against model stealing (MS) and membership inference attacks (MIA) while keeping latency low. It introduces an XAI-driven method that identifies critical tensors using an attention-transition metric, and a critical-feature identification approach based on Jensen-Shannon (JS) divergence to protect membership privacy. A latency-aware placement framework jointly optimizes execution location (TEE/REE across CPU/GPU) and selective masking to reduce overhead while preserving security. Evaluations on four models and four datasets across two devices show speedups of up to 25.35x over the state-of-the-art work, with security comparable to full-shield baselines and substantial energy savings, making secure on-device inference practical.
Abstract
To safeguard user data privacy, on-device inference has emerged as a prominent paradigm on mobile and Internet of Things (IoT) devices. This paradigm involves deploying a model provided by a third party on local devices to perform inference tasks. However, it exposes the private model to two primary security threats: model stealing (MS) and membership inference attacks (MIA). To mitigate these risks, existing approaches deploy models within Trusted Execution Environments (TEEs), which are secure, isolated execution spaces. Nonetheless, the constrained secure memory capacity of TEEs makes it challenging to achieve full model security with low inference latency. This paper fills the gap with TensorShield, the first efficient on-device inference work that shields only part of the model's tensors while still fully defending against MS and MIA. The key enabling techniques in TensorShield include: (i) a novel eXplainable AI (XAI) technique that exploits the model's attention transition to assess critical tensors and shields them in the TEE to achieve secure inference, and (ii) two meticulous designs, critical feature identification and latency-aware placement, that accelerate inference while maintaining security. Extensive evaluations show that TensorShield delivers almost the same security protection as shielding the entire model inside the TEE, while being up to 25.35$\times$ (avg. 5.85$\times$) faster than the state-of-the-art work, without accuracy loss.
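For readers unfamiliar with the JS-divergence measure named in the TL;DR, the following is a minimal sketch of the standard Jensen-Shannon divergence between two discrete probability distributions. It illustrates the general measure only. How TensorShield applies it to model features is not specified in this summary, and the function names and toy distributions below are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits for discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute 0 by convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    # Mixture distribution; m_i > 0 whenever p_i > 0 or q_i > 0,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy distributions (illustrative only, not data from the paper).
identical = js_divergence([0.3, 0.7], [0.3, 0.7])   # no divergence
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])    # maximal divergence
```

A value near 0 means the two distributions are nearly indistinguishable, while a value near 1 (in bits) means they barely overlap.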
