Steganalysis of AI Models LSB Attacks

Daniel Gilkarov, Ran Dubin

TL;DR

This work addresses the security risks of sharing pre-trained neural networks by introducing the first steganalysis framework for detecting malicious LSB embeddings in neural-network weights. It develops three detection approaches that combine supervised and unsupervised learning and rely on two main feature families: Reconstruction Loss and Backpropagation-derived signals. The study systematically constructs zoos of attacked neural networks, evaluates detection across multiple datasets, and shows that detection is highly effective when the attack reaches the more significant bits of the LSB region but substantially harder when only the lowest bits are modified, which highlights a practical gap in defenses. By releasing open-source steganography and steganalysis tools, the paper provides a practical pathway to protecting openly shared neural resources and informs future defense strategies against stealthy weight-based injections.

Abstract

Artificial intelligence has made significant progress in the last decade, leading to a rise in the popularity of model sharing. The model zoo ecosystem, a repository of pre-trained AI models, has advanced the AI open-source community and has also opened new avenues for cyber risks. Malicious attackers can exploit shared models to launch cyber-attacks. This work focuses on the steganalysis of malicious Least Significant Bit (LSB) steganography injected into AI models, and it is the first work to focus on steganalysis of such attacks on AI models. In response to this threat, this paper presents a steganalysis approach specifically tailored to detect and mitigate malicious LSB steganography attacks, based on supervised and unsupervised AI detection methods. Our proposed technique aims to preserve the integrity of shared models, protect user trust, and maintain the momentum of open collaboration within the AI community. In this work, we propose three steganalysis methods and open-source our code. We found that the success of the steganalysis depends on the location of the LSB attack. If the attacker exploits only the least significant bits of the LSB region, the ability to detect the attack is low. However, if the attack reaches the more significant bits of the LSB region, it can be detected with almost perfect accuracy.
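
The dependence on attack location follows from how much each bit of a float32 contributes to the value. The following is a minimal sketch, not the authors' code, that flips single bits of one weight and prints the resulting change. The helper names and the weight value 0.5 are hypothetical, and bit position 0 is assumed to denote the least significant bit.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_to_float(b: int) -> float:
    """Interpret an unsigned 32-bit integer as a float32 value."""
    return struct.unpack(">f", struct.pack(">I", b))[0]


def flip_bit(x: float, position: int) -> float:
    """Flip one bit of x's float32 encoding (position 0 = least significant bit)."""
    return bits_to_float(float_to_bits(x) ^ (1 << position))


weight = 0.5  # hypothetical weight, exactly representable in float32

# Positions 0..22 are mantissa bits, 23..30 exponent bits, 31 the sign bit.
for position in (0, 11, 22, 30):
    changed = flip_bit(weight, position)
    delta = abs(changed - weight)
    print(f"bit {position:2d}: {weight} -> {changed!r} (|delta| = {delta:.3e})")
```

In this sketch the change grows with the bit position: edits to the lowest mantissa bits barely move the weight, which is consistent with the observation that attacks confined to those bits are the hardest to detect from the float values.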

Paper Structure (27 sections, 13 equations, 8 figures, 3 tables, 1 algorithm)

Figures (8)

  • Figure 1: Illustration of the model LSB steganalysis architecture. A model zoo (MZ.1) and an attacked copy of it (MZ.2), in which every model from (MZ.1) has been attacked with X-LSB-Attack, go through feature extraction. For every feature (F.1), (F.2), and so on, a dataset is created that contains that feature extracted from every model in (MZ.1) and (MZ.2), with one feature per model. Section \ref{sec:lsb_attack} details X-LSB-Attack, section \ref{sec:features} details the features and how they are extracted, and section \ref{sec:dataset} describes the dataset creation process.
  • Figure 2: Illustration of the float32 structure and the effect of changing an LSB vs. an MSB. Taking (II) as the base, in (I) the 2nd MSB is changed and colored yellow, and in (III) the 1st LSB is changed and colored yellow. The change between (III) and (II) is very small, about $2\times10^{-7}$. The change between (I) and (II) is very large, about $3.6893488\times10^{19}$. This is why the most significant bits are called that: they affect the float value the most, and the least significant bits affect it the least. See section \ref{sec:float32} for an explanation of the composition of float32.
  • Figure 3: Illustration of the X-LSB attack. The weights (II) of a cover model (I) are extracted and stacked to form a column vector, and every float is transformed into its binary representation (III). The binary weights (III), an integer $1 \le X \le 23$ (IV), and a binary string (V) are the inputs to X-LSB-Attack. The attack embeds (V) into the X LSBs of the weights in (III), starting from the first weight. The resulting output is (VI), a binary matrix like (III) in which (V) is embedded in the X-LSB regions. (VII) is a float column vector constructed from the rows of (VI); a model with the same architecture as (I) is then initialized with the weights in (VII), and this is the resulting "attacked" model. See section \ref{sec:lsb_attack} for an in-depth explanation of the X-LSB-Attack depicted in this figure.
  • Figure 4: Illustration of how the model weights change after embedding a binary malware string using X-LSB-Attack-Fill. In this example, the cover model has six weights, $w_1, ..., w_6$. The 1-LSB and 2-LSB regions of the weights are colored blue and green, respectively, and the malware to embed is colored red. With X-LSB-Attack-Fill, the malware string is embedded repeatedly and the excess (gray) is discarded. In the resulting 1-LSB and 2-LSB regions, bits that changed after the embedding are shown in red and bits that remained the same are shown in green. In the 1-LSB case, 3/6 bits remained unchanged after embedding; in the 2-LSB case, 6/12 bits remained unchanged. This illustrates that embedding malware into the model parameters does not necessarily change the float values, which is the main challenge for steganalysis, since the aim is usually to find outliers by looking at the float values.
  • ...and 3 more figures
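
The embedding described in the captions for Figures 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released implementation: the function name `x_lsb_attack_fill`, the helper names, the six cover weights, and the payload bits are all hypothetical, and payload bits are assumed to be written into each X-LSB region from its highest bit to its lowest.

```python
import struct


def to_bits(w: float) -> int:
    """Return the float32 bit pattern of w as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", w))[0]


def from_bits(b: int) -> float:
    """Interpret an unsigned 32-bit integer as a float32 value."""
    return struct.unpack(">f", struct.pack(">I", b))[0]


def x_lsb_attack_fill(weights, payload, x):
    """Overwrite the x lowest bits of every weight with payload bits.

    The payload is repeated until every weight is filled; any excess of the
    final repetition is discarded (the "Fill" behaviour, as assumed here).
    """
    if not 1 <= x <= 23:
        raise ValueError("x must lie in 1..23 (the float32 mantissa)")
    if not payload:
        raise ValueError("payload must be non-empty")
    mask = (1 << x) - 1
    attacked = []
    for i, w in enumerate(weights):
        chunk = 0
        for j in range(x):
            # Take the next payload bit, wrapping around to repeat the payload.
            chunk = (chunk << 1) | payload[(i * x + j) % len(payload)]
        new_bits = (to_bits(w) & ~mask & 0xFFFFFFFF) | chunk
        attacked.append(from_bits(new_bits))
    return attacked


cover = [0.12, -0.5, 0.033, 1.25, -0.75, 0.9]  # hypothetical cover weights
payload = [1, 0, 1, 1]                          # hypothetical payload bits
x = 2

attacked = x_lsb_attack_fill(cover, payload, x)

# Count how many of the x-LSB bits happen to keep their original value.
mask = (1 << x) - 1
unchanged = sum(
    x - bin((to_bits(c) ^ to_bits(a)) & mask).count("1")
    for c, a in zip(cover, attacked)
)
# Compare against the float32-rounded cover weights, not the Python doubles.
max_delta = max(abs(from_bits(to_bits(c)) - a) for c, a in zip(cover, attacked))

print(f"unchanged LSB bits: {unchanged}/{x * len(cover)}")
print(f"largest change in any weight: {max_delta:.3e}")
```

Some bits already match the payload and stay unchanged, and the bits that do change move each weight only slightly, which is the difficulty the Figure 4 caption points to for detectors that look at the float values.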