Steganalysis of AI Models LSB Attacks
Daniel Gilkarov, Ran Dubin
TL;DR
This work addresses the security risks of sharing pre-trained neural networks by introducing the first steganalysis framework for detecting malicious LSB embeddings in neural network weights. It develops three detection approaches that combine supervised and unsupervised learning, using two main feature families: reconstruction loss and backpropagation-derived signals. The study systematically constructs zoos of attacked neural networks, evaluates detection across multiple datasets, and shows that detection is highly effective when attackers modify the more significant of the low-order bits but substantially harder when only the least significant bits are changed, highlighting practical gaps in defense. By releasing open-source steganography and steganalysis tools, the paper provides a practical pathway to protect openly shared neural resources and informs future defense strategies against stealthy weight-based injections.
Abstract
Artificial intelligence has made significant progress in the last decade, leading to a rise in the popularity of model sharing. The model zoo ecosystem, a repository of pre-trained AI models, has advanced the AI open-source community but has also opened new avenues for cyber risks. Malicious attackers can exploit shared models to launch cyber-attacks. This work focuses on the steganalysis of malicious Least Significant Bit (LSB) steganography injected into AI models, and it is the first steganalysis work focusing on attacks on AI models. In response to this threat, this paper presents steganalysis methods specifically tailored to detect and mitigate malicious LSB steganography attacks, based on supervised and unsupervised AI detection techniques. Our proposed technique aims to preserve the integrity of shared models, protect user trust, and maintain the momentum of open collaboration within the AI community. In this work, we propose three steganalysis methods and open-source our code. We found that the success of the steganalysis depends on the LSB attack location. If the attacker exploits only the least significant of the low-order bits, the ability to detect the attack is low. However, if the attack reaches the most significant of the LSB bits, it can be detected with almost perfect accuracy.
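To make the notion of "LSB attack location" concrete, the following is a minimal sketch, assuming weights stored as IEEE 754 float32 values and synthetic random data in place of a real model. The helpers `embed_lsb` and `extract_lsb` are hypothetical names introduced here for illustration; they are not the released tools and do not implement any of the three detection methods. The sketch only shows that overwriting more of the low-order mantissa bits perturbs the weights more strongly, which is consistent with such attacks being easier to detect.

```python
import numpy as np


def embed_lsb(weights: np.ndarray, payload: np.ndarray, n_bits: int) -> np.ndarray:
    """Overwrite the n_bits lowest bits of each float32 weight (hypothetical helper)."""
    if weights.dtype != np.float32 or not 1 <= n_bits <= 23:
        raise ValueError("expects float32 weights and 1 <= n_bits <= 23 (mantissa only)")
    mask = np.uint32((1 << n_bits) - 1)
    raw = weights.view(np.uint32)  # reinterpret the same bytes as integers
    stego = (raw & ~mask) | (payload.astype(np.uint32) & mask)
    return stego.view(np.float32)


def extract_lsb(weights: np.ndarray, n_bits: int) -> np.ndarray:
    """Read back the n_bits lowest bits of each float32 weight."""
    mask = np.uint32((1 << n_bits) - 1)
    return weights.view(np.uint32) & mask


rng = np.random.default_rng(0)
# Synthetic stand-in for a layer's weights (an assumption, not data from the paper).
weights = rng.normal(0.0, 0.05, size=10_000).astype(np.float32)

for n_bits in (4, 16, 22):
    payload = rng.integers(0, 1 << n_bits, size=weights.size, dtype=np.uint32)
    stego = embed_lsb(weights, payload, n_bits)
    assert np.array_equal(extract_lsb(stego, n_bits), payload)  # payload survives
    rel_change = np.max(np.abs(stego - weights) / np.abs(weights))
    print(f"{n_bits:2d} bits: max relative change {rel_change:.2e}")
```

For normal (non-subnormal) float32 values, overwriting the lowest `n_bits` mantissa bits changes each weight by a relative amount below 2^(n_bits - 23), so a payload confined to the very lowest bits leaves the weights nearly unchanged.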
