Principal Eigenvalue Regularization for Improved Worst-Class Certified Robustness of Smoothed Classifiers

Gaojie Jin, Tianjin Huang, Ronghui Mu, Xiaowei Huang

TL;DR

This work addresses the problem of unequal worst-class certified robustness in smoothed classifiers. It introduces a PAC-Bayesian bound for the worst-class error and shows that the largest eigenvalue $\lambda_{\max}$ of the smoothed confusion matrix governs worst-class performance. A two-step principal eigenvalue regularization is proposed, combining an SVD-based gradient for $\lambda_{\max}$ with a differentiable surrogate confusion matrix using KL divergence to enable end-to-end training under smoothing. Empirical results on CIFAR-10 and Tiny-ImageNet show significant improvements in worst-class certified robustness and more uniform class performance, without sacrificing overall accuracy. Overall, the paper provides a principled framework and practical tools to enhance fairness in certified robustness for smoothed classifiers.
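To make the "SVD-based gradient" idea concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes the regularized quantity can be treated as the largest singular value of the confusion matrix, and it uses the standard fact that, when that singular value is simple, its gradient with respect to the matrix is the outer product $u_1 v_1^\top$ of the top singular vectors. The helper names `top_singular_value_and_grad` and `toy_confusion_matrix` are hypothetical, and the matrix is random toy data.

```python
import numpy as np

def top_singular_value_and_grad(C):
    """Return sigma_max(C) and d sigma_max / dC = u1 v1^T (valid when sigma_max is simple)."""
    U, S, Vt = np.linalg.svd(C)
    return S[0], np.outer(U[:, 0], Vt[0])

def toy_confusion_matrix(num_classes, seed=0):
    """Hypothetical toy data: zero diagonal, each row sums to a 0.5 error rate."""
    rng = np.random.default_rng(seed)
    C = rng.random((num_classes, num_classes))
    np.fill_diagonal(C, 0.0)
    C /= 2.0 * C.sum(axis=1, keepdims=True)
    return C

C = toy_confusion_matrix(5)
sigma_max, grad = top_singular_value_and_grad(C)

# Central finite-difference check of one gradient entry.
eps = 1e-6
E = np.zeros_like(C)
E[1, 3] = 1.0
s_plus = np.linalg.svd(C + eps * E, compute_uv=False)[0]
s_minus = np.linalg.svd(C - eps * E, compute_uv=False)[0]
fd_estimate = (s_plus - s_minus) / (2.0 * eps)

print("sigma_max:", sigma_max)
print("analytic grad[1, 3]:", grad[1, 3], "finite difference:", fd_estimate)
```

The sign ambiguity of the singular vectors cancels in the outer product, so the gradient is well defined as long as the top singular value is not repeated.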

Abstract

Recent studies have identified a critical challenge in deep neural networks (DNNs) known as "robust fairness", where models exhibit significant disparities in robust accuracy across different classes. While prior work has attempted to address this issue in adversarial robustness, the study of worst-class certified robustness for smoothed classifiers remains unexplored. Our work bridges this gap by developing a PAC-Bayesian bound for the worst-class error of smoothed classifiers. Through theoretical analysis, we demonstrate that the largest eigenvalue of the smoothed confusion matrix fundamentally influences the worst-class error of smoothed classifiers. Based on this insight, we introduce a regularization method that optimizes the largest eigenvalue of the smoothed confusion matrix to enhance the worst-class accuracy of the smoothed classifier and further improve its worst-class certified robustness. We provide extensive experimental validation across multiple datasets and model architectures to demonstrate the effectiveness of our approach.
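As a rough illustration of how a confusion matrix under Gaussian smoothing relates to worst-class error, the sketch below builds a soft confusion matrix by averaging softmax outputs over noisy copies of each input. This soft averaging is a swapped-in stand-in chosen for simplicity. It is not the paper's KL-divergence surrogate, and it is not the majority-vote smoothed classifier. The linear model `W`, the random data, and the function `soft_smoothed_confusion` are all hypothetical. The final lines check the general norm inequality $\|C\|_\infty \le \sqrt{d}\,\|C\|_2$ for a $d \times d$ matrix, which shows one way the worst row sum is controlled by a spectral quantity.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def soft_smoothed_confusion(W, X, y, num_classes, sigma=0.25, n_noise=32, seed=0):
    """Soft confusion matrix: C[i, j] = mean probability of class j on class-i inputs
    under Gaussian input noise, with the diagonal (correct predictions) zeroed."""
    rng = np.random.default_rng(seed)
    probs = np.zeros((X.shape[0], num_classes))
    for _ in range(n_noise):
        noisy = X + sigma * rng.standard_normal(X.shape)
        probs += softmax(noisy @ W)
    probs /= n_noise
    C = np.zeros((num_classes, num_classes))
    for i in range(num_classes):
        C[i] = probs[y == i].mean(axis=0)
    np.fill_diagonal(C, 0.0)
    return C

# Hypothetical toy setup: random inputs, balanced labels, random linear classifier.
rng = np.random.default_rng(1)
num_classes, dim = 4, 8
X = rng.standard_normal((200, dim))
y = np.arange(200) % num_classes
W = rng.standard_normal((dim, num_classes))

C = soft_smoothed_confusion(W, X, y, num_classes)
worst_class_error = C.sum(axis=1).max()          # largest row sum
spectral_norm = np.linalg.svd(C, compute_uv=False)[0]

print("worst-class soft error:", worst_class_error)
print("sqrt(d) * spectral norm:", np.sqrt(num_classes) * spectral_norm)
```

Because every entry of this matrix is a smooth function of `W`, a penalty on its top singular value could in principle be backpropagated through an automatic-differentiation framework; the NumPy version here only evaluates the quantities.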

Paper Structure

This paper contains 13 sections, 6 theorems, 43 equations, 3 figures, 2 tables, 1 algorithm.

Key Result

Theorem 2.1

Consider a training dataset $\mathcal{S}$ with $m$ samples drawn from a distribution $\mathcal{D}$ on $\mathcal{X}_B \times \mathcal{Y}$ with $\mathcal{Y} = \{1, \ldots, d_y\}$. Given a learning algorithm (e.g., a classifier) with prior and posterior distributions $P$ and $Q$ (i.e., $\mathbf{w}+\ul$ where $m_{min}$ represents the minimal number of examples from $\mathcal{S}$ which belong to the sa

Figures (3)

  • Figure 1: Illustration of the development of the main theorem.
  • Figure 2: Simulation of $\mu$ under different numbers of classes. Each box (with dimension 10, 20, 50, and 100) is computed from $10000$ randomly generated confusion matrices.
  • Figure 3: The class-wise variation in certified accuracy at radius 0.12 for CIFAR-10, measured as the standard deviation across classes, for smoothing noise levels $\sigma\in\{0.12,0.25,0.50\}$.

Theorems & Definitions (9)

  • Theorem 2.1 (Morvant et al., 2012)
  • Theorem 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Proof 1.1
  • Proof 1.2
  • Proof 1.3
  • Lemma 2.1 (Neyshabur et al., 2017)
  • Theorem 3.1