Defending against Stegomalware in Deep Neural Networks with Permutation Symmetry
Birk Torpmann-Hagen, Michael A. Riegler, Pål Halvorsen, Dag Johansen
TL;DR
This paper addresses the security risk of stegomalware embedded in neural network checkpoints, which can deliver payloads with negligible impact on model accuracy. It introduces weight permutation, a defense that exploits permutation symmetry to shuffle weight and channel orders, corrupting embedded payloads while maintaining functional equivalence through forward hooks. Empirical results across ResNet architectures on CIFAR-10 show weight permutation outperforms pruning and retraining in erasing payloads, albeit with notable runtime overhead that can be mitigated by selective permutation. The work highlights the need for defense-in-depth in ML systems, discusses potential bypasses, and outlines future directions for permutation-invariant embeddings and broader steganalysis, thereby motivating continued security research for machine learning platforms.
Abstract
Deep neural networks are used in a growing number of applications, both in production systems and for personal use. As a consequence, network checkpoints are often shared and distributed on various platforms to ease the development process. This work considers the threat of neural network stegomalware, where malware is embedded in neural network checkpoints at a negligible cost to network accuracy. This constitutes a significant security concern, but is nevertheless largely neglected by deep learning practitioners and security specialists alike. We propose the first effective countermeasure to these attacks. In particular, we show that state-of-the-art neural network stegomalware can be efficiently and effectively neutralized by shuffling the column order of the weight and bias matrices or, equivalently, the channel order of convolutional layers. We show that this corrupts payloads embedded by state-of-the-art neural network steganography methods at no cost to network accuracy, outperforming competing methods by a significant margin. We then discuss possible ways to bypass this defense, consider additional defense methods, and advocate for continued research into the security of machine learning systems.
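The permutation symmetry that the abstract relies on can be illustrated with a small sketch. This is not the authors' implementation, and it does not reproduce the forward-hook mechanism mentioned in the TL;DR. It assumes a toy two-layer ReLU network stored as plain Python lists, with each row of a weight matrix holding one output unit. All names (`permute_hidden`, `forward`, the layer sizes, the fixed permutation `perm`) are hypothetical and chosen only for illustration. Reordering the hidden units of the first layer, and reordering the matching input columns of the second layer in the same way, leaves the network output unchanged up to floating-point rounding, while the stored order of the weights changes.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, x):
    # Each row of W holds the incoming weights of one output unit.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def forward(W1, b1, W2, b2, x):
    h = relu([z + b for z, b in zip(matvec(W1, x), b1)])
    return [z + b for z, b in zip(matvec(W2, h), b2)]


def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden units: rows of W1 and entries of b1,
    plus the matching input columns of W2."""
    W1p = [W1[i] for i in perm]
    b1p = [b1[i] for i in perm]
    W2p = [[row[i] for i in perm] for row in W2]
    return W1p, b1p, W2p


# Toy network with random weights (sizes are arbitrary assumptions).
rng = random.Random(0)
n_in, n_hidden, n_out = 3, 4, 2
W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
W2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [rng.uniform(-1, 1) for _ in range(n_out)]
x = [rng.uniform(-1, 1) for _ in range(n_in)]

# A fixed, non-identity permutation of the hidden units.
perm = [2, 0, 3, 1]
W1p, b1p, W2p = permute_hidden(W1, b1, W2, perm)

y = forward(W1, b1, W2, b2, x)
yp = forward(W1p, b1p, W2p, b2, x)

# Outputs agree up to rounding (the summation order in layer 2 differs).
same_output = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in zip(y, yp)
)

# The flattened storage order of the first-layer weights has changed, so
# anything read back from the weights in their original order is scrambled.
flat = [w for row in W1 for w in row]
flat_p = [w for row in W1p for w in row]
storage_changed = flat != flat_p

print(same_output, storage_changed)
```

In this sketch the permutation is applied to the stored weights directly. A payload that depends on the original ordering of the parameters would no longer be read back in sequence, which is the intuition behind the defense described above. How payloads are embedded and how the permutation is undone at inference time are left out here.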
