Impact of Regularization on Calibration and Robustness: from the Representation Space Perspective

Jonghyun Park; Juyeop Kim; Jong-Seok Lee

TL;DR

The paper investigates how regularization with soft labels (label smoothing, Mixup, CutMix) improves model calibration and adversarial robustness through the geometry of the representation space. It demonstrates that soft-label regularization reduces feature magnitudes (RMS) and increases alignment with class centers, effectively lowering predicted confidence while tightening the representation, which also directs adversarial perturbations toward the origin. The authors show that feature scaling behaves similarly to temperature scaling and that higher cosine similarity to class centers correlates with stronger robustness against gradient-based attacks. Across multiple architectures and datasets, these representation-space changes yield improved calibration and enhanced robustness, though the strongest AutoAttack configurations may require additional defenses.
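The claim that feature scaling behaves like temperature scaling can be checked with a minimal NumPy sketch. It assumes a bias-free linear classification layer; the weight matrix `W`, the random `features`, the temperature `T`, and the use of a feature mean as a stand-in for the class center are all hypothetical choices for illustration, not the paper's setup.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def feature_rms(features):
    """Root mean square of each feature vector (one value per row)."""
    return np.sqrt(np.mean(features ** 2, axis=-1))

def cosine_to_center(features, center):
    """Cosine similarity between each feature vector and one center vector."""
    norms = np.linalg.norm(features, axis=-1) * np.linalg.norm(center)
    return features @ center / norms

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 64))        # hypothetical bias-free classifier weights
features = rng.normal(size=(5, 64))  # hypothetical penultimate-layer features
T = 2.0                              # hypothetical temperature / scaling factor

p_original = softmax(features @ W.T)
p_feature_scaled = softmax((features / T) @ W.T)  # shrink the features
p_temperature = softmax((features @ W.T) / T)     # divide the logits instead

# With no bias term, both operations yield the same probabilities.
same_distribution = np.allclose(p_feature_scaled, p_temperature)

# Shrinking features (T > 1) cannot increase the top-class confidence.
confidence_drop = p_original.max(axis=1) - p_feature_scaled.max(axis=1)

# Stand-in for a class center: the mean of the sampled features (assumption).
center = features.mean(axis=0)
```

Under these assumptions, dividing the features by `T` lowers their RMS by the same factor while leaving their cosine similarity to `center` unchanged, which is why the two quantities can be studied separately.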

Abstract

Recent studies have shown that regularization techniques using soft labels, e.g., label smoothing, Mixup, and CutMix, not only enhance image classification accuracy but also mitigate miscalibration due to overconfident predictions, and improve robustness against adversarial attacks. However, the underlying mechanisms of such improvements remain underexplored. In this paper, we offer a novel explanation from the perspective of the representation space (i.e., the space of the features obtained at the penultimate layer). Based on examination of decision boundaries and structure of features (or representation vectors), our study investigates confidence contours and gradient directions within the representation space. Furthermore, we analyze the adjustments in feature distributions due to regularization in relation to these contours and directions, from which we uncover central mechanisms inducing improved calibration and robustness. Our findings provide new insights into the characteristics of the high-dimensional representation space in relation to training and regularization using soft labels.
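For readers unfamiliar with soft labels, the sketch below shows how label smoothing and Mixup turn one-hot targets into soft ones. It follows the commonly used formulations (uniform smoothing and a convex combination of two samples); the function names and the parameters `eps`, `lam`, and `perm` are illustrative, and CutMix is omitted because it additionally needs image patches.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Convert integer class labels to one-hot rows."""
    return np.eye(num_classes)[labels]

def smooth_labels(labels, num_classes, eps=0.1):
    """Label smoothing: mix the one-hot target with a uniform distribution."""
    return (1.0 - eps) * one_hot(labels, num_classes) + eps / num_classes

def mixup(x, labels, num_classes, lam, perm):
    """Mixup: blend each sample and its target with a permuted partner.

    lam is the mixing weight (usually drawn from a Beta distribution) and
    perm is a permutation of the batch indices; both are passed in here so
    the example stays deterministic.
    """
    y = one_hot(labels, num_classes)
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix
```

In both cases every target row still sums to one, but no class receives probability exactly 1, which is the property the paper's analysis of feature magnitudes relies on.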

Paper Structure (30 sections, 1 theorem, 13 equations, 22 figures, 6 tables)

Introduction
Related Work
Representation Space
Decision Regions
Confidence Contours, Gradient Directions
Effect of Regularization
Measuring Feature Distributions
Impact on Feature Distributions
Impact on Calibration
Impact on Adversarial Robustness
Resolving the Contradiction
Comprehensive Evaluation
Conclusion
Discussion
Decision Regions of Original Models
...and 15 more sections

Key Result

Theorem 1

Assume $b_c \approx 0$ and $||\mathbf{w}_{c}|| \approx ||\mathbf{w}||$ for all classes $c$. Then, the optimal solution of training $\mathbf{f}$ with cross-entropy using hard labels, $\mathbf{f}^*_{hard}$, has a larger magnitude than that using soft labels, $\mathbf{f}^*_{soft}$, i.e., $||\mathbf{f}^*_{hard}|| > ||\mathbf{f}^*_{soft}||$.
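The intuition behind Theorem 1 can be illustrated numerically. The sketch below is not the paper's proof: it assumes a toy setting with three unit-norm weight vectors in 2D, zero bias, and plain gradient descent on a single feature vector, with a hypothetical learning rate, step count, and label-smoothing strength `eps`.

```python
import numpy as np

def softmax(z):
    """Softmax over a 1D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def fit_feature(W, target, lr=0.5, steps=2000):
    """Minimise cross-entropy over the feature f with fixed weights W.

    The gradient of cross-entropy with respect to the logits is (p - target),
    so the gradient with respect to f is W.T @ (p - target).
    """
    f = np.zeros(W.shape[1])
    for _ in range(steps):
        p = softmax(W @ f)
        f -= lr * W.T @ (p - target)
    return f

# Three equal-norm class weight vectors, 120 degrees apart, no bias.
angles = np.deg2rad([90.0, 210.0, 330.0])
W = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (3, 2)

K, eps = 3, 0.1
hard = np.array([1.0, 0.0, 0.0])       # hard (one-hot) target for class 0
soft = (1.0 - eps) * hard + eps / K    # label-smoothed target for class 0

norm_hard = np.linalg.norm(fit_feature(W, hard))
norm_soft = np.linalg.norm(fit_feature(W, soft))
```

In this toy setting the soft-label loss has a finite minimiser, where the predicted probabilities equal the soft target, so `norm_soft` settles at a fixed value. The hard-label loss keeps decreasing as the feature moves outward along the class direction, so `norm_hard` keeps growing with the number of steps and ends up larger.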

Figures (22)

Figure 1: Decision regions and feature distributions: (a) without bias in the classification layer, and (b) with bias. Without bias the regions and features are cone-shaped, radial for every class; with bias one class (purple) sits in the center with a circular region. Squares are weight vectors.
Figure 2: 2D representation space of ResNet50 on CIFAR-10. A cross mark indicates the minimum-loss location for each class. (a) Confidence contours. (b) Loss and gradient directions. (c) Enlarged version of (b). (d) Features (circles), class means (triangles), and weight vectors (squares).
Figure 3: Evaluation results of ResNet50 on the ImageNet validation data. Top. Scatter plots of feature RMS and cosine similarities of features ($\mathbf{v}_\text{rep}$) with the class center ($\mathbf{v}_\text{opt}$). Colors represent confidence values. Middle. Histograms of cosine similarities of features to class centers, along with the attack success rates of FGSM, PGD$^\text{w}$, and AutoAttack$^\text{w}$ for each bin (hyperparameter settings for the attacks can be found in Section \ref{sec:4_4}). For results on PGD$^\text{s}$ and AutoAttack$^\text{s}$, see Fig. \ref{fig:stronger_settings1} in Appendix \ref{appendix:more_result_regularization}. Bottom. Reliability diagrams, where the transparency of bars represents the ratio of data in each confidence bin. Expected calibration error (ECE) (Guo et al., 2017) values are shown for each case.
Figure 4: Features in the 2D representation space for ResNet50 on CIFAR-10. Note that the scales differ across figures. See Appendix \ref{appendix:more_example_2d} for further results.
Figure 5: Loss and gradient directions for a certain class in the 2D representation space of ResNet50 on CIFAR-10. White crosses indicate the locations with the smallest loss. Circles and triangles represent the features of clean and perturbed data, respectively. White lines depict the decision boundary.
...and 17 more figures
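
The reliability diagrams in Figure 3 are summarised by ECE values. As a reference point, here is a minimal sketch of the standard equal-width binned ECE estimator; the function name, the default of 10 bins, and the input format (top-class confidences plus a 0/1 correctness flag per sample) are assumptions, and the paper's exact binning may differ.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average of |accuracy - confidence| over bins.

    confidences: top-class probability for each sample, in (0, 1].
    correct: 1 if the prediction was right, 0 otherwise.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # in_bin.mean() is the bin's data share
    return ece
```

A model that predicts with confidence 0.95 but is right only half the time would score 0.45 under this estimator, while a model whose confidence matches its accuracy in every bin would score 0.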

Theorems & Definitions (1)

Theorem 1
