RDI: An adversarial robustness evaluation metric for deep neural networks based on model statistical features

Jialei Song; Xingquan Zuo; Feiyang Wang; Hai Huang; Tianle Zhang

TL;DR

The paper tackles the challenge of evaluating adversarial robustness in DNNs without relying on costly adversarial attacks. It introduces the Robustness Difference Index (RDI), an attack-independent metric built from embedding-space statistics. RDI combines intra-class compactness ($IntraD$) and inter-class separation ($InterD$) into $RDI = \frac{InterD - IntraD}{\max(InterD, IntraD)}$, which lies in $[-1,1]$ because both distances are non-negative. In experiments on image and speech tasks, RDI correlates strongly with the gold-standard attack success rate (ASR), outperforms ROBY in accuracy and stability, and is substantially cheaper to compute, taking on average about 1/30 of the time of PGD-based evaluation. The approach scales across datasets with different numbers of classes and across modalities, which makes it practical for model selection and robustness assessment, and potentially for integration into training pipelines to strengthen defenses against adversarial perturbations.
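
The summary gives the final formula but not the exact definitions of $IntraD$ and $InterD$. The sketch below is a minimal illustration under stated assumptions: $IntraD$ is taken to be the mean Euclidean distance from each feature vector to its class centroid, and $InterD$ the mean Euclidean distance between pairs of class centroids. The paper's own definitions may differ, and the helper names (`centroid`, `group_by_class`, `intra_d`, `inter_d`, `rdi`) are hypothetical.

```python
import math
from itertools import combinations


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def group_by_class(features, labels):
    """Map each class label to the list of feature vectors carrying that label."""
    groups = {}
    for vec, lab in zip(features, labels):
        groups.setdefault(lab, []).append(vec)
    return groups


def intra_d(groups):
    """ASSUMED IntraD: mean distance from every vector to its own class centroid."""
    total, count = 0.0, 0
    for vectors in groups.values():
        c = centroid(vectors)
        for v in vectors:
            total += math.dist(v, c)
            count += 1
    return total / count


def inter_d(groups):
    """ASSUMED InterD: mean distance over all pairs of class centroids (needs >= 2 classes)."""
    cents = [centroid(vs) for vs in groups.values()]
    pairs = list(combinations(cents, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def rdi(features, labels):
    """RDI = (InterD - IntraD) / max(InterD, IntraD), which lies in [-1, 1]."""
    groups = group_by_class(features, labels)
    intra, inter = intra_d(groups), inter_d(groups)
    denom = max(inter, intra)
    # Guard the degenerate case where every vector coincides (both distances zero).
    return 0.0 if denom == 0 else (inter - intra) / denom


# Two tight, well-separated classes: IntraD = 0.5, InterD = 10, so RDI = 0.95.
print(rdi([[0, 0], [0, 1], [10, 0], [10, 1]], [0, 0, 1, 1]))
# Two classes sharing the same centroid: InterD = 0, IntraD = 1, so RDI = -1.
print(rdi([[0, 0], [2, 0], [1, 1], [1, -1]], [0, 0, 1, 1]))
```

The silhouette-style normalization by $\max(InterD, IntraD)$ makes the score scale-free: multiplying all feature vectors by a constant leaves RDI unchanged, so models with differently scaled embeddings can be compared directly.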

Abstract

Deep neural networks (DNNs) are highly susceptible to adversarial samples, raising concerns about their reliability in safety-critical tasks. Current methods for evaluating adversarial robustness fall mainly into two categories: attack-based and certified robustness evaluation. The former depends on specific attack algorithms and is highly time-consuming, while the latter, owing to its analytical nature, is typically difficult to apply to large and complex models. A few studies evaluate model robustness based on the model's decision boundary, but they suffer from low evaluation accuracy. To address these issues, we propose a novel adversarial robustness evaluation metric, the Robustness Difference Index (RDI), which is based on model statistical features. Inspired by clustering evaluation, RDI quantifies model robustness by analyzing the intra-class and inter-class distances of feature vectors separated by the decision boundary. It is attack-independent and computationally efficient. Experiments show that RDI correlates more strongly with the gold-standard adversarial robustness metric, attack success rate (ASR), than existing decision-boundary-based metrics such as ROBY. On average, computing RDI takes only 1/30 of the time required by evaluation based on the PGD attack. Our open-source code is available at: https://github.com/BUPTAIOC/RDI.
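
RDI is judged by how closely it tracks ASR across models. As a hedged illustration, the sketch below assumes ASR is the fraction of originally correctly classified samples whose adversarial versions are misclassified, and that agreement between the two metrics is measured with the Pearson correlation over a set of models. The paper may use a different ASR convention or correlation coefficient; `attack_success_rate` and `pearson` are hypothetical helpers, and the inputs below are toy placeholders, not results from the paper.

```python
import math


def attack_success_rate(clean_preds, adv_preds, labels):
    """ASSUMED ASR: among samples classified correctly when clean,
    the fraction whose adversarial counterpart is misclassified."""
    attacked = [(a, y) for c, a, y in zip(clean_preds, adv_preds, labels) if c == y]
    if not attacked:
        return 0.0
    return sum(1 for a, y in attacked if a != y) / len(attacked)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy predictions: 3 of 4 samples are correct when clean, and 1 of those 3 flips.
print(attack_success_rate([0, 1, 1, 0], [1, 1, 0, 0], [0, 1, 0, 0]))
# Toy per-model scores: higher RDI paired with lower ASR gives a negative correlation.
print(pearson([0.2, 0.4, 0.6], [0.9, 0.6, 0.3]))
```

If a higher RDI indicates a more robust model, as its construction suggests, RDI and ASR should be negatively correlated across models, so the magnitude of the coefficient, rather than its sign, indicates how well RDI stands in for an attack-based evaluation.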
