
On the Relationship Between Adversarial Robustness and Decision Region in Deep Neural Networks

Seongjin Park, Haedong Jeong, Tair Djanibekov, Giyoung Jeon, Jinseok Seol, Jaesik Choi

TL;DR

The paper addresses the gap between generalization and adversarial robustness by examining the geometry of decision regions inside DNNs. It introduces the Populated Region Set (PRS) to capture which decision regions are actually populated by training data, and it demonstrates that a low PRS ratio correlates with greater robustness to gradient-based attacks. It further develops a Major Region (MR) framework and an MRV-based (Major Region Mean Vector) PRS regularizer that improves robust accuracy without relying on adversarial examples, validated across multiple architectures and datasets. The work provides a practical, geometry-driven mechanism for enhancing robustness and offers theoretical insights into how batch size and label smoothing affect PRS, which could help guide robust training strategies.
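To make the PRS ratio concrete, here is a minimal sketch. It assumes, as one reading of Definitions 3.2 and 3.3, that a decision region of a ReLU layer is identified by that layer's binary activation pattern, that the PRS is the set of regions containing at least one training sample, and that the PRS ratio is the number of such regions divided by the number of training samples. The functions `activation_patterns` and `prs_ratio` are hypothetical names, and the toy pre-activations stand in for real penultimate-layer outputs.

```python
import numpy as np

def activation_patterns(pre_acts: np.ndarray) -> np.ndarray:
    """Binarize pre-activations: 1 where a ReLU unit is active (> 0), else 0.

    `pre_acts` has shape (n_samples, n_units): one layer's pre-activation
    values (e.g., the penultimate layer) for each training sample.
    """
    return (pre_acts > 0).astype(np.uint8)

def prs_ratio(pre_acts: np.ndarray) -> float:
    """Assumed PRS ratio: distinct populated regions / number of samples."""
    patterns = activation_patterns(pre_acts)
    # Each unique row is one decision region containing at least one sample.
    n_regions = np.unique(patterns, axis=0).shape[0]
    return n_regions / patterns.shape[0]

# Toy example: 6 samples, 3 units.
pre_acts = np.array([
    [ 0.5, -1.0,  2.0],
    [ 0.1, -0.3,  0.7],   # same pattern as row 0, so same region
    [-0.2,  0.4,  1.1],
    [-0.9,  0.8,  0.3],   # same pattern as row 2
    [ 1.2,  0.6, -0.5],
    [ 0.3, -0.2,  0.9],   # same pattern as row 0
])
print(prs_ratio(pre_acts))  # 3 regions / 6 samples = 0.5
```

Under this reading, a ratio near 1 (Network A's 0.99 in Figure 2) means almost every training sample occupies its own region, while a ratio near 0 (Network B's 0.007) means many samples share a small number of regions.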

Abstract

In general, Deep Neural Networks (DNNs) are evaluated by their generalization performance, measured on unseen data excluded from the training phase. As DNNs have advanced, generalization performance has converged among state-of-the-art models, making it difficult to evaluate DNNs solely on this metric. Robustness against adversarial attacks has been used as an additional metric that measures the vulnerability of DNNs. However, few studies have analyzed adversarial robustness in terms of the geometry of DNNs. In this work, we perform an empirical study to analyze the internal properties of DNNs that affect model robustness under adversarial attacks. In particular, we propose the novel concept of the Populated Region Set (PRS), the set of decision regions populated by training samples, to represent the internal properties of DNNs in a practical setting. From systematic experiments with the proposed concept, we provide empirical evidence that a low PRS ratio is strongly related to the adversarial robustness of DNNs. We also devise a PRS regularizer that leverages the characteristics of PRS to improve adversarial robustness without adversarial training.
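The Major Region and its mean vector can be sketched in the same spirit. The following are assumptions rather than the paper's exact definitions: for each class, the Major Region is the populated region (activation pattern) containing the most training samples of that class, and the MRV is the mean post-ReLU feature vector of those samples. The `alignment_penalty` function is purely hypothetical; it shows one plausible shape for a term that pulls features toward their class MRV and is not the paper's PRS regularizer.

```python
import numpy as np
from collections import Counter

def major_region_vectors(features, labels):
    """Return {class: (major_pattern, mrv)} under the assumed definitions.

    features: (n, d) pre-activations of one layer; labels: (n,) int labels.
    """
    patterns = (features > 0).astype(np.uint8)
    result = {}
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        # Count how many class-c samples fall into each region (pattern).
        counts = Counter(tuple(patterns[i]) for i in idx)
        major_pattern, _ = counts.most_common(1)[0]
        members = [i for i in idx if tuple(patterns[i]) == major_pattern]
        # Assumed MRV: mean post-ReLU feature vector of the MR's samples.
        mrv = np.maximum(features[members], 0.0).mean(axis=0)
        result[int(c)] = (np.array(major_pattern), mrv)
    return result

def alignment_penalty(features, labels, mr, eps=1e-12):
    """Hypothetical penalty: mean (1 - cosine similarity) between each
    sample's post-ReLU features and the MRV of its class."""
    h = np.maximum(features, 0.0)
    targets = np.stack([mr[int(c)][1] for c in labels])
    cos = (h * targets).sum(axis=1) / (
        np.linalg.norm(h, axis=1) * np.linalg.norm(targets, axis=1) + eps)
    return float(np.mean(1.0 - cos))

features = np.array([[1.0, -1.0], [3.0, -2.0], [-1.0, 1.0],   # class 0
                     [-1.0, 2.0], [-3.0, 4.0]])               # class 1
labels = np.array([0, 0, 0, 1, 1])
mr = major_region_vectors(features, labels)
print(mr[0][1], mr[1][1])                       # MRVs: [2. 0.] [0. 3.]
print(alignment_penalty(features, labels, mr))  # ~0.2: third sample is outside its MR
```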

Paper Structure

This paper contains 20 sections, 8 theorems, 23 equations, 10 figures, and 2 tables.

Key Result

Lemma 6.1

A sharp minimum yields high parameter sensitivity.
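The intuition behind Lemma 6.1 can be illustrated numerically. The sketch below uses a common max-perturbation notion of sharpness, the largest loss increase when the parameters move a fixed distance $\rho$ away from the minimum, which may differ from the paper's Definition 6.5; the helper `sharpness` and the two quadratic losses are hypothetical. For a quadratic loss $\frac{1}{2} h \lVert w - w^* \rVert^2$, every perturbation of norm $\rho$ raises the loss by exactly $\frac{1}{2} h \rho^2$, so the sharper minimum (larger $h$) is more sensitive to the same parameter change.

```python
import numpy as np

def sharpness(loss_fn, w_star, radius, n_trials=2000, seed=0):
    """Max loss increase over random perturbations on a sphere of `radius`.

    A Monte Carlo proxy: it samples directions instead of solving the inner
    maximization exactly, so in general it lower-bounds the worst case.
    """
    rng = np.random.default_rng(seed)
    base = loss_fn(w_star)
    d = rng.normal(size=(n_trials, w_star.size))
    d *= radius / np.linalg.norm(d, axis=1, keepdims=True)  # project to sphere
    return max(loss_fn(w_star + di) - base for di in d)

# Two quadratic minima at w* = 0 with different curvature.
FLAT_H = np.diag([0.1, 0.1])
SHARP_H = np.diag([10.0, 10.0])

def flat_loss(w):
    return 0.5 * w @ FLAT_H @ w

def sharp_loss(w):
    return 0.5 * w @ SHARP_H @ w

w0 = np.zeros(2)
print(sharpness(flat_loss, w0, radius=0.1))   # ~0.5 * 0.1 * 0.1**2 = 5e-4
print(sharpness(sharp_loss, w0, radius=0.1))  # ~0.5 * 10 * 0.1**2 = 5e-2
```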

Figures (10)

  • Figure 1: An illustrative comparison of training methods on CIFAR-10, with a visualization of the decision boundaries/regions (DBs/DRs) of the penultimate layer in the input space ($f^{(L-1:1)}(x)$). For the visualization, we randomly select three dog images and depict a section of the input space. The green area indicates the DR that the blue-boxed image populates. (A) Warm-up stage for VGG-16 with standard training (cross-entropy loss). (B) Standard training after the warm-up stage. (C) Robust learning with the devised PRS regularizer after the warm-up stage. Each training scheme induces a different configuration of DBs/DRs, which reflects different internal properties of DNNs.
  • Figure 2: Training/test accuracy and the PRS ratio of the penultimate layer for CNN-6 trained with batch sizes 2048 and 128. We select the networks at the $300^{\text{th}}$ epoch and refer to these two CNN-6 models as Network A and Network B, respectively, throughout the paper (PRS ratio of Network A: 0.99; Network B: 0.007). We also show the scenario where label smoothing is applied.
  • Figure 3: Robust accuracy under various adversarial attack methods on Networks A and B. The x-axis indicates the perturbation size $\epsilon$ and the y-axis indicates the training/test robust accuracy.
  • Figure 4: Relationship between the PRS ratio and robust accuracy under the PGD attack across various models and datasets. Each colored dot represents an independently trained model, and the colored dashed lines indicate the trend for each dataset.
  • Figure 5: (a) Comparison of the ratio of zero gradients among failed attacks on test samples under the PGD-20 attack with $L_{\infty}$ and $\epsilon = 0.0313$ (Networks A and B). (b) Illustrative attacked samples on Networks A and B for which the attack fails on B, with the corresponding logits before and after the attack. After the attack, the logits of Network B move in a direction almost parallel to the original logits.
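Figures 3 to 5 evaluate robustness with PGD; Figure 5 uses PGD-20 under $L_{\infty}$ with $\epsilon = 0.0313$ (about 8/255). As a minimal sketch of that attack, the code below runs $L_{\infty}$ PGD against a toy linear softmax classifier whose input gradient has a closed form. The step size `alpha = 2/255`, the random start, and all function names are assumptions for illustration, not the paper's evaluation settings.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def input_grad(W, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. x for logits z = W x + b."""
    p = softmax(W @ x + b)
    p[y] -= 1.0                          # dL/dz = softmax(z) - onehot(y)
    return W.T @ p                       # chain rule through z = W x + b

def pgd_linf(W, b, x, y, eps=0.0313, alpha=2 / 255, steps=20, seed=0):
    """L-infinity PGD with a random start, projected onto the eps-ball
    around x and clipped to the valid input range [0, 1]."""
    rng = np.random.default_rng(seed)
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(steps):
        g = input_grad(W, b, x_adv, y)
        x_adv = x_adv + alpha * np.sign(g)        # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay in the input range
    return x_adv

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))              # toy model: 3 classes, 4 inputs
b = np.zeros(3)
x = rng.uniform(0.0, 1.0, size=4)
x_adv = pgd_linf(W, b, x, y=0)
print(np.abs(x_adv - x).max() <= 0.0313 + 1e-12)  # True: stays in the eps-ball
```

Because `np.sign(0)` is 0, a sample whose input gradient is exactly zero is never moved by the ascent step, so the attack makes no progress on it; this is the zero-gradient failure mode that Figure 5(a) measures.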

Theorems & Definitions (28)

  • Definition 3.1: Decision Boundary (DB)
  • Definition 3.2: Decision Region (DR)
  • Definition 3.3: Populated Region Set (PRS)
  • Definition 4.1: Major Region (MR)
  • Definition 4.2: Major Region Mean Vector (MRV)
  • Definition 6.1: Class Decision Boundary
  • Definition 6.2: Class Decision Region
  • Definition 6.3: Class Decision Distance
  • Definition 6.4: Margin Distance
  • Definition 6.5: Sharpness
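Definitions 6.3 and 6.4 concern distances from samples to class decision boundaries. Their exact forms are not given in this listing, so the sketch below assumes the simplest case: for an affine classifier $z = Wx + b$ (for example, a last layer acting on penultimate-layer features), the region of the predicted class $i$ is the intersection of the half-spaces $z_i \ge z_j$, and the distance from $x$ to its boundary is $\min_{j \ne i} (z_i - z_j) / \lVert w_i - w_j \rVert$. The function `margin_distance` is a hypothetical name.

```python
import numpy as np

def margin_distance(W, b, x):
    """Euclidean distance from x to the boundary of its predicted class
    region for the affine classifier z = W x + b (exact for this model)."""
    z = W @ x + b
    i = int(np.argmax(z))                # predicted class
    dists = []
    for j in range(W.shape[0]):
        if j == i:
            continue
        # Pairwise boundary: (W[i] - W[j]) . x + (b[i] - b[j]) = 0
        dists.append((z[i] - z[j]) / np.linalg.norm(W[i] - W[j]))
    # The class region is convex, so its boundary distance is the
    # minimum over the supporting hyperplanes.
    return min(dists)

W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
x = np.array([2.0, 1.0])
print(margin_distance(W, b, x))  # ~0.7071: distance from (2, 1) to the line x1 = x2
```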