Investigating White-Box Attacks for On-Device Models

Mingyi Zhou; Xiang Gao; Jing Wu; Kui Liu; Hailong Sun; Li Li

TL;DR

This paper examines the security risk of on-device DL models by showing that compiled TFLite models can be reverse engineered into debuggable equivalents, which enables direct white-box attacks. The proposed REOM framework automates the transformation in four steps (Extractor, tf2onnx, Modifier, onnx2pytorch). It includes three modifiers (Pruning, Translation, Auto-matching) that handle structure, operator, and customization mismatches, along with a key weight-dequantization transformation. In the evaluation, REOM converts over 90% of a real-world set of 244 TFLite models into debuggable PyTorch models with minimal output deviation, and these debuggable models enable much stronger white-box attacks than prior surrogate-based approaches. The work highlights a significant risk in on-device deployments and motivates new defenses against reverse-engineering-based attacks on mobile AI systems.
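To make the weight-dequantization step concrete, here is a minimal sketch of affine dequantization, the mapping `real = scale * (q - zero_point)` that TFLite's quantization scheme defines. The function name, the sample weights, and the scale and zero-point values are illustrative assumptions. How REOM itself implements this transformation is not described in this summary.

```python
from typing import List


def dequantize(q_values: List[int], scale: float, zero_point: int) -> List[float]:
    """Map quantized integers back to floats: real = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


# Hypothetical uint8 weights with made-up quantization parameters.
q_weights = [0, 128, 255]
weights = dequantize(q_weights, scale=0.5, zero_point=128)
print(weights)
```

Float weights of this kind are what a framework with automatic differentiation needs, which is presumably why dequantization matters when the goal is a debuggable model.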

Abstract

Numerous mobile apps have leveraged deep learning capabilities. However, on-device models are vulnerable to attacks because they can be easily extracted from their corresponding mobile apps. Existing attacks on on-device models are black-box only, and black-box attacks are far less effective and efficient than white-box strategies. The reason is that mobile deep learning frameworks like TFLite do not support gradient computation, which white-box attack algorithms require. We therefore argue that existing findings may underestimate the harm of on-device attacks. To this end, we conduct a study to answer the research question: can on-device models be directly attacked via white-box strategies? We first systematically analyze the difficulties of transforming an on-device model into its debuggable version, and propose a Reverse Engineering framework for On-device Models (REOM), which automatically converts a compiled on-device TFLite model into a debuggable model. Specifically, REOM first transforms compiled on-device models into the Open Neural Network Exchange (ONNX) format, then removes the non-debuggable parts, and finally converts them to a debuggable DL model format that attackers can exploit in a white-box setting. Our experimental results show that our approach achieves automated transformation effectively on a set of 244 TFLite models. Compared with previous attacks using surrogate models, REOM enables attackers to achieve higher attack success rates with perturbations a hundred times smaller. In addition, because the ONNX platform offers many tools for model format conversion, the proposed ONNX-based method can be adapted to other model formats. Our findings emphasize the need for developers to carefully consider their model deployment strategies and to use white-box methods to evaluate the vulnerability of on-device models.
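The dependence of white-box attacks on gradients can be illustrated with a small sketch. It uses the fast gradient sign method (FGSM) as a stand-in white-box attack. The abstract does not name a specific attack, so FGSM, the toy logistic model, and all numeric values below are assumptions chosen only for illustration. The point is that the attack needs the gradient of the loss with respect to the input, which an inference-only runtime does not expose.

```python
import math
from typing import List


def predict(w: List[float], b: float, x: List[float]) -> float:
    """Toy logistic model: probability of class 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(w: List[float], b: float, x: List[float], y: int) -> List[float]:
    """Gradient of binary cross-entropy w.r.t. the input: (p - y) * w."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]


def fgsm(w: List[float], b: float, x: List[float], y: int, eps: float) -> List[float]:
    """One FGSM step: move each input feature by eps along the gradient sign."""
    grad = input_gradient(w, b, x, y)
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


# Hypothetical model parameters and input with true label 1.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1

x_adv = fgsm(w, b, x, y, eps=0.3)
p_clean = predict(w, b, x)
p_adv = predict(w, b, x_adv)
print(p_clean, p_adv)
```

In this toy case the gradient is available in closed form. For a deep model it has to come from automatic differentiation, which is what a debuggable model in a framework such as PyTorch provides.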
