Towards Precise Observations of Neural Model Robustness in Classification
Wenchuan Mu, Kwan Hui Lim
TL;DR
The paper addresses the need for precise, scalable robustness metrics for safety-critical neural classifiers, where traditional adversarial testing and verification can be costly or incomplete. It introduces a probabilistic robustness framework based on the exact binomial test and integrates it into the TorchAttacks library, using the law of total probability to bound the true probability rather than relying on observed frequencies alone. Key contributions include (i) an exact binomial test solution for $P(z=1\mid h)$, (ii) a method to reduce degrees of freedom via total-probability reasoning, and (iii) standardized failure-rate thresholds aligned with IEC 61508 for practical certification. Empirical results on CIFAR-10 show that PRL achieves strong probabilistic robustness (e.g., a robustness score of around $90.63\%$) while maintaining competitive accuracy, demonstrating a practical, certification-friendly approach to robustness evaluation in safety-critical settings.
Abstract
In deep learning applications, robustness measures the ability of neural models to handle slight changes in input data. A lack of robustness can lead to safety hazards, especially in safety-critical applications. Pre-deployment assessment of model robustness is essential, but existing methods often suffer from either high costs or imprecise results. To enhance safety in real-world scenarios, metrics that effectively capture the model's robustness are needed. To address this issue, we compare the rigour and usage conditions of various assessment methods based on different definitions. Then, we propose a straightforward and practical metric that uses hypothesis testing for probabilistic robustness and integrate it into the TorchAttacks library. Through a comparative analysis of diverse robustness assessment methods, our approach contributes to a deeper understanding of model robustness in safety-critical applications.
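To make the hypothesis-testing idea concrete, the following is a minimal sketch of a one-sided exact binomial test on the failure rate observed among randomly perturbed inputs. It is not the authors' implementation and does not use the TorchAttacks API. The function names (`binom_cdf`, `exact_upper_bound`, `passes_threshold`), the significance level `alpha`, and the sample counts are all assumptions chosen for illustration. The sketch assumes that each perturbed sample is an independent trial that either fails (is misclassified) or does not.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_upper_bound(failures, trials, alpha=0.05, tol=1e-12):
    """One-sided exact (Clopper-Pearson style) upper bound on the true
    failure probability, found by bisection on the binomial CDF."""
    if failures >= trials:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # The CDF at `failures` decreases as p grows; the bound is the
        # p at which it drops to alpha.
        if binom_cdf(failures, trials, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def passes_threshold(failures, trials, max_failure_rate, alpha=0.05):
    """Exact binomial test of H0: p >= max_failure_rate against
    H1: p < max_failure_rate. Returns True when H0 is rejected."""
    p_value = binom_cdf(failures, trials, max_failure_rate)
    return p_value <= alpha


if __name__ == "__main__":
    # Hypothetical numbers: 100 perturbed samples, none misclassified.
    print(exact_upper_bound(0, 100))
    print(passes_threshold(0, 100, max_failure_rate=0.05))
```

The point of the sketch is that an observed failure frequency of zero does not imply a true failure probability of zero: the test only bounds the true probability at a chosen confidence level, and tighter thresholds require more samples. For very large sample counts the term-by-term sum can overflow floating point, so a statistics library would be a better fit there.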
