Exploring Adversarial Robustness of Deep State Space Models

Biqing Qi; Yang Luo; Junqi Gao; Pengfei Li; Kai Tian; Zhiyuan Ma; Bowen Zhou

TL;DR

This work evaluates existing structural variants of deep State Space Models (SSMs) under Adversarial Training (AT) to assess their Adversarial Robustness (AR), and proposes a simple and effective Adaptive Scaling (AdS) mechanism that brings AT performance close to that of Attention-integrated SSMs without introducing Robust Overfitting (RO).

Abstract

Deep State Space Models (SSMs) have proven effective in numerous task scenarios but face significant security challenges due to Adversarial Perturbations (APs) in real-world deployments. Adversarial Training (AT) is a mainstream approach to enhancing Adversarial Robustness (AR) and has been validated on various traditional DNN architectures. However, its effectiveness in improving the AR of SSMs remains unclear. While many enhancements in SSM components, such as integrating Attention mechanisms and expanding to data-dependent SSM parameterizations, have brought significant gains in Standard Training (ST) settings, their potential benefits in AT remain unexplored. To investigate this, we evaluate existing structural variants of SSMs with AT to assess their AR performance. We observe that pure SSM structures struggle to benefit from AT, whereas incorporating Attention yields a markedly better trade-off between robustness and generalization for SSMs in AT compared to other components. Nonetheless, the integration of Attention also leads to Robust Overfitting (RO) issues. To understand these phenomena, we empirically and theoretically analyze the output error of SSMs under AP. We find that fixed-parameterized SSMs have output error bounds strictly related to their parameters, limiting their AT benefits, while input-dependent SSMs may face the problem of error explosion. Furthermore, we show that the Attention component effectively scales the output error of SSMs during training, enabling them to benefit more from AT, but at the cost of introducing RO due to its high model complexity. Inspired by this, we propose a simple and effective Adaptive Scaling (AdS) mechanism that brings AT performance close to Attention-integrated SSMs without introducing the issue of RO. Our code is available at https://github.com/Biqing-Qi/Exploring-Adversarial-Robustness-of-Deep-State-Space-Models.git.
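The claim that fixed-parameterized SSMs have output error tied to their parameters can be illustrated with a toy example. The sketch below is not the paper's eq. (10) or its theorem: it assumes a scalar discretized SSM with hypothetical parameters `a_bar`, `b_bar`, `c`, and an i.i.d. Gaussian input perturbation of standard deviation `sigma`. Because the recurrence is linear and its parameters do not depend on the input, the expected squared output error depends only on the parameters, the sequence length, and the perturbation scale, not on the clean input.

```python
import random


def ssm_scan(a_bar, b_bar, c, u):
    """Scalar discretized SSM: x_k = a_bar * x_{k-1} + b_bar * u_k, y_k = c * x_k."""
    x, ys = 0.0, []
    for u_k in u:
        x = a_bar * x + b_bar * u_k
        ys.append(c * x)
    return ys


def expected_output_error(a_bar, b_bar, c, sigma, length):
    """Closed-form E||y' - y||^2 under i.i.d. N(0, sigma^2) input perturbations.

    Assumption for this sketch: the perturbation is Gaussian and independent
    across time steps, so Var(y'_k - y_k) = (sigma*c*b_bar)^2 * sum_{j<k} a_bar^(2j).
    """
    total = 0.0
    for k in range(1, length + 1):
        total += sum(a_bar ** (2 * j) for j in range(k))
    return (sigma * c * b_bar) ** 2 * total


def empirical_output_error(a_bar, b_bar, c, sigma, u, trials=2000, seed=0):
    """Monte Carlo estimate of E||y' - y||^2 for a given clean input u."""
    rng = random.Random(seed)
    y = ssm_scan(a_bar, b_bar, c, u)
    acc = 0.0
    for _ in range(trials):
        u_pert = [u_k + rng.gauss(0.0, sigma) for u_k in u]
        y_pert = ssm_scan(a_bar, b_bar, c, u_pert)
        acc += sum((yp - yk) ** 2 for yp, yk in zip(y_pert, y))
    return acc / trials


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    clean_input = [0.3 * ((-1) ** k) for k in range(16)]
    print("closed form:", expected_output_error(0.9, 0.5, 1.2, 0.1, 16))
    print("monte carlo:", empirical_output_error(0.9, 0.5, 1.2, 0.1, clean_input))
```

In this toy setting, replacing `clean_input` with any other sequence of the same length leaves the estimate essentially unchanged, which is the sense in which the error is governed by the parameters alone. An input-dependent parameterization would break this property, since `a_bar` and `b_bar` would themselves change under perturbation.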

Paper Structure (21 sections, 1 theorem, 15 equations, 5 figures, 6 tables)

Introduction
Preliminaries of SSMs
S4
DSS
S5
Mamba (S6)
Mega
Empirical Evaluation: Component Contributions to AT Gains
Component-wise Attribution: Theoretical and Experimental Analysis
Theoretical Analysis of SSM Stability Under APs
Experimental Validation and Further Insights
Conclusion
Acknowledgement
Mathematical Derivations
The Proof of Theorem 4.1.1
...and 6 more sections

Key Result

Theorem 4.1.1

Given the SSM formalized as in eq. (10), the output error before and after perturbation, $\mathbb{E}_\varepsilon\left[\left\|\bm y^{\prime}-\bm y\right\|^2\right]$, has the following upper and lower bounds: where $L$ denotes the length of the input sequence, $0 < c_1 \leq c_2$ are constants, and $\bar{\lambda}_{i}^{\max}$ and $\bar{\lambda}_{i}^{\min}$ are the eigenvalues of matrix $\overline{\bm

Figures (5)

Figure 1: The PGD-AT training process, testing process, and adversarial PGD-10 testing process on the training and test sets of CIFAR-10 and MNIST.
Figure 2: The TRADES training process, testing process, and adversarial PGD-10 testing process on the training and test sets of CIFAR-10 and MNIST.
Figure 3: Changes in KL divergence and MSE before and after different components in various SSM structures. The change for each component is calculated as: after component - before component. The plotted data is the change rate, calculated as: change / before component. Blank sections indicate the absence of the corresponding component. Bars with diagonal hatching represent results on the test set, while bars without hatching represent results on the training set.
Figure 4: Changes in KL divergence and MSE before and after different components in S4 and DSS with different AdS under PGD-AT and TRADES training on CIFAR-10. The change for each component is calculated as: after component - before component. The plotted data is the change rate, calculated as: change / before component. Bars with diagonal hatching represent results on the test set, while bars without hatching represent results on the training set.
Figure 5: The ST training process, testing process, and adversarial PGD-10 testing process on the test sets of CIFAR-10 and MNIST.
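
The change rate described in the captions of Figures 3 and 4 can be written out as a small helper. This is only a sketch of the stated arithmetic; the function name `change_rate` and the example values are hypothetical and do not come from the paper.

```python
def change_rate(before, after):
    """Relative change across a component: (after - before) / before."""
    if before == 0:
        raise ValueError("change rate is undefined when the 'before' value is zero")
    change = after - before  # "after component - before component"
    return change / before   # "change / before component"


if __name__ == "__main__":
    # Hypothetical values: a metric measured before and after some component.
    print(change_rate(0.20, 0.15))
```

A negative rate means the component reduced the measured quantity (KL divergence or MSE), and a positive rate means it increased it.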

Theorems & Definitions (1)

Theorem 4.1.1
