On the Relationship Between Adversarial Robustness and Decision Region in Deep Neural Networks
Seongjin Park, Haedong Jeong, Tair Djanibekov, Giyoung Jeon, Jinseok Seol, Jaesik Choi
TL;DR
The paper addresses the gap between generalization and adversarial robustness by examining the geometry of decision regions inside DNNs. It introduces the Populated Region Set ($PRS$) to capture which decision regions are actually used by training data and demonstrates that a low PRS ratio correlates with greater robustness to gradient-based attacks. It further develops a Major Region ($MR$) framework and an MRV-based PRS regularizer that improves robust accuracy without relying on adversarial examples, validated across multiple architectures and datasets. The work provides a practical, geometry-driven mechanism to enhance robustness and offers theoretical insights into how batch size and label smoothing affect PRS, which may help guide robust training strategies.
Abstract
In general, Deep Neural Networks (DNNs) are evaluated by their generalization performance, measured on unseen data excluded from the training phase. As DNNs have developed, generalization performance has converged toward the state of the art, and it has become difficult to evaluate DNNs solely on this metric. Robustness against adversarial attacks has been used as an additional metric that evaluates DNNs by measuring their vulnerability. However, few studies have analyzed adversarial robustness in terms of the geometry of DNNs. In this work, we perform an empirical study to analyze the internal properties of DNNs that affect model robustness under adversarial attacks. In particular, we propose the novel concept of the Populated Region Set (PRS), the set of decision regions in which training samples are populated more frequently, to represent the internal properties of DNNs in a practical setting. Through systematic experiments with the proposed concept, we provide empirical evidence that a low PRS ratio has a strong relationship with the adversarial robustness of DNNs. We also devise a PRS regularizer that leverages the characteristics of the PRS to improve adversarial robustness without adversarial training.
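To make the notion of a PRS ratio concrete, the following is a minimal sketch rather than the authors' implementation. It rests on two assumptions that the abstract does not state: that a decision region is identified by the sign pattern of the pre-activations at a chosen layer, and that the PRS ratio is the number of distinct regions occupied by training samples divided by the number of training samples. The function name `prs_ratio` and the toy feature matrix are hypothetical.

```python
import numpy as np


def prs_ratio(features: np.ndarray) -> float:
    """Estimate a PRS ratio from layer features (illustrative assumption).

    features: array of shape (n_samples, n_units) holding pre-activation
    values of the training samples at one chosen layer.
    """
    # Assumption: each sample's decision region is its sign pattern.
    patterns = (features > 0).astype(np.uint8)
    # Count the distinct regions that contain at least one training sample.
    populated_regions = np.unique(patterns, axis=0)
    # Assumption: ratio = number of populated regions / number of samples.
    return populated_regions.shape[0] / features.shape[0]


# Toy example: four samples, two units. The first two samples share a region.
toy_features = np.array(
    [[1.0, -1.0],
     [2.0, -3.0],
     [-1.0, 1.0],
     [0.5, 0.5]]
)
ratio = prs_ratio(toy_features)
```

Under these assumptions, a ratio near 1 means almost every training sample sits in its own region, while a low ratio means many samples share a small number of regions, which is the regime the abstract associates with greater robustness.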
