Stealthy Backdoor Attack to Real-world Models in Android Apps

Jiali Wei, Ming Fan, Xicheng Zhang, Wenjing Jiao, Haijun Wang, Ting Liu

TL;DR

This work addresses the security threat of backdoor attacks on real-world on-device DL models embedded in Android apps. It introduces BARWM, a steganography-based backdoor method that preserves the original model structure while generating sample-specific, imperceptible triggers via a trigger generator trained with StegaStamp. The authors reconstruct trainable equivalents of extracted on-device models, enabling backdoor retraining without modifying architecture, and demonstrate superior attack effectiveness and stealthiness compared with baselines like DeepPayload, BadNets, and Invisible Attack across multiple real-world models and datasets. The study analyzes extensive real-world data from 38,387 apps (89 models) to show BARWM’s robustness, achieving higher ASR with manageable or improved benign accuracy and substantially better stealth metrics (PSNR/MS-SSIM). Overall, BARWM reveals a heightened real-world threat to on-device DL, underscoring the need for stronger protections such as encryption, authentication, and code/weight integrity checks.

Abstract

Powered by their superior performance, deep neural networks (DNNs) have found widespread applications across various domains. Many deep learning (DL) models are now embedded in mobile apps, making them more accessible to end users through on-device DL. However, deploying on-device DL to users' smartphones simultaneously introduces several security threats. One primary threat is backdoor attacks. Extensive research has explored backdoor attacks for several years and has proposed numerous attack approaches. However, few studies have investigated backdoor attacks on DL models deployed in the real world, or they have shown obvious deficiencies in effectiveness and stealthiness. In this work, we explore more effective and stealthy backdoor attacks on real-world DL models extracted from mobile apps. Our main justification is that imperceptible and sample-specific backdoor triggers generated by DNN-based steganography can enhance the efficacy of backdoor attacks on real-world models. We first confirm the effectiveness of steganography-based backdoor attacks on four state-of-the-art DNN models. Subsequently, we systematically evaluate and analyze the stealthiness of the attacks to ensure they are difficult to perceive. Finally, we implement the backdoor attacks on real-world models and compare our approach with three baseline methods. We collect 38,387 mobile apps, extract 89 DL models from them, and analyze these models to obtain the prerequisite model information for the attacks. After identifying the target models, our approach achieves an average of 12.50% higher attack success rate than DeepPayload while better maintaining the normal performance of the models. Extensive experimental results demonstrate that our method enables more effective, robust, and stealthy backdoor attacks on real-world models.

Paper Structure

This paper contains 36 sections, 5 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: The normal TFLite model and the TFLite model after being attacked by DeepPayload. The additional modules significantly alter the model structure and undermine the stealthiness of the attack.
  • Figure 2: Overview of the BARWM architecture, which comprises three procedures: on-device model extraction and analysis (Section "Extracting On-Device Models"), on-device model conversion (Section "Converting DL Models for Backdoor Attack"), and steganography-based backdoor attack (Section "Sample-specific Backdoor Attack").
  • Figure 3: The training process of the backdoor trigger generator (an encoder-decoder network). The perceptual loss measures the perceptual difference between input images and encoded images. The cross-entropy loss measures the difference between the original message and the decoded message. The generator is trained by minimizing both losses.
  • Figure 4: Backdoor samples and triggers generated by BadNets, Invisible Attack, DeepPayload, and our method. BadNets uses a white square in the lower-right corner as the trigger. Invisible Attack uses randomly generated subtle noise as the trigger, shown in the fourth column, "Noise-Trigger". DeepPayload uses a handwritten "T" as the trigger. BARWM uses a backdoor trigger generator $\mathbb{G}$ to generate triggers that are both sample-specific and invisible. Note that in the last column, we increased the pixel values of the triggers to visualize them. The correct labels from top to bottom are "chain saw", "holster", "rifle", and "projectile". After the attacks, these backdoor samples are classified as "cellular telephone" by the corresponding backdoor model.
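
To make the contrast in Figure 4 concrete, the following minimal Python sketch stamps a BadNets-style white square in the lower-right corner of an image and compares its PSNR with that of a nearly invisible perturbation. It is an illustration under stated assumptions, not the authors' implementation: the helper names `stamp_white_square` and `psnr`, the 32x32 random image, the 4-pixel square, and the ±1 intensity noise (a hypothetical stand-in for an imperceptible trigger, not the output of the generator $\mathbb{G}$) are all chosen here for demonstration.

```python
import numpy as np


def stamp_white_square(image: np.ndarray, size: int = 4) -> np.ndarray:
    """Return a copy of an HxWxC uint8 image with a white square
    stamped in the lower-right corner (a BadNets-style visible trigger)."""
    poisoned = image.copy()
    poisoned[-size:, -size:, :] = 255
    return poisoned


def psnr(clean: np.ndarray, modified: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means a less visible change."""
    diff = clean.astype(np.float64) - modified.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Assumed toy data: a random 32x32 RGB image with intensities below 200.
rng = np.random.default_rng(0)
clean = rng.integers(0, 200, size=(32, 32, 3), dtype=np.uint8)

# Visible patch trigger, as described for BadNets in the caption.
patched = stamp_white_square(clean, size=4)

# Hypothetical stand-in for an imperceptible trigger: noise in {-1, 0, 1}.
noise = rng.integers(-1, 2, size=clean.shape)
subtle = np.clip(clean.astype(np.int16) + noise, 0, 255).astype(np.uint8)

print("PSNR, white-square trigger:", psnr(clean, patched))
print("PSNR, subtle perturbation: ", psnr(clean, subtle))
```

The subtle perturbation yields a higher PSNR than the white square, which is the sense in which PSNR serves as a stealthiness metric: the closer the backdoor sample stays to the clean image, the higher the score.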