Principal Eigenvalue Regularization for Improved Worst-Class Certified Robustness of Smoothed Classifiers

Gaojie Jin; Tianjin Huang; Ronghui Mu; Xiaowei Huang

TL;DR

This work addresses disparities in certified robustness across classes in smoothed classifiers, focusing on worst-class certified robustness. It introduces a PAC-Bayesian bound for the worst-class error and shows that the largest eigenvalue $\lambda_{\max}$ of the smoothed confusion matrix governs worst-class performance. The paper proposes a two-step principal eigenvalue regularization. It combines an SVD-based gradient for $\lambda_{\max}$ with a differentiable surrogate confusion matrix built with KL divergence, which enables end-to-end training under randomized smoothing. Empirical results on CIFAR-10 and Tiny-ImageNet show significant improvements in worst-class certified robustness and more uniform per-class performance, without sacrificing overall accuracy. Overall, the paper provides a principled framework and practical tools for improving fairness in the certified robustness of smoothed classifiers.
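The gradient step can be illustrated with a small NumPy sketch. It assumes that the principal value is the largest singular value $\sigma_1$ of the off-diagonal (error) part $E$ of a soft confusion matrix. When $\sigma_1$ is simple, its gradient with respect to $E$ is $u_1 v_1^\top$. This is a standard SVD identity, not necessarily the authors' exact formulation. The function names (`soft_confusion`, `principal_value_and_grad`, `grad_wrt_probs`) are hypothetical. The soft confusion matrix here simply averages class-probability vectors per true class. It is a stand-in, not the paper's KL-divergence-based surrogate, whose exact form is not given in this summary. In a smoothed setting, `probs` would presumably come from the base classifier's softmax averaged over Gaussian noise draws; the toy uses random logits instead.

```python
import numpy as np


def soft_confusion(probs, labels, num_classes):
    """Row i is the mean class-probability vector over samples whose true class is i.

    Hypothetical differentiable stand-in for a smoothed confusion matrix;
    it is NOT the paper's KL-divergence-based surrogate.
    """
    conf = np.zeros((num_classes, num_classes))
    np.add.at(conf, labels, probs)  # accumulate probability rows by true class
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return conf / counts[:, None], counts


def principal_value_and_grad(conf):
    """Largest singular value of the off-diagonal (error) part, and d sigma_1 / d conf.

    Assumes the principal value is the spectral norm of the error matrix and that
    the top singular value is simple, so d sigma_1 / dE = u_1 v_1^T.
    """
    err = conf - np.diag(np.diag(conf))  # keep misclassification mass only
    u, s, vt = np.linalg.svd(err)
    grad = np.outer(u[:, 0], vt[0])
    np.fill_diagonal(grad, 0.0)  # diagonal entries of conf do not enter err
    return s[0], grad


def grad_wrt_probs(grad_conf, labels, counts):
    """Chain rule through the per-class averaging in soft_confusion (one row per sample)."""
    return grad_conf[labels] / counts[labels][:, None]


rng = np.random.default_rng(0)
K, N = 4, 200
labels = np.arange(N) % K
logits = rng.normal(size=(N, K))
logits[np.arange(N), labels] += 2.0  # toy predictions that are mostly correct
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

conf, counts = soft_confusion(probs, labels, K)
sigma, g_conf = principal_value_and_grad(conf)
g_probs = grad_wrt_probs(g_conf, labels, counts)  # would be back-propagated further

# Finite-difference check of one entry: sample 0 has label 0; nudge its class-1 probability.
eps = 1e-6
probs_eps = probs.copy()
probs_eps[0, 1] += eps
sigma_eps, _ = principal_value_and_grad(soft_confusion(probs_eps, labels, K)[0])
fd_grad = (sigma_eps - sigma) / eps
print(sigma, fd_grad, g_probs[0, 1])
```

In an autodiff framework, the chain rule in `grad_wrt_probs` would be handled automatically; the finite-difference check only confirms that the closed-form $u_1 v_1^\top$ gradient matches a numerical estimate.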

Abstract

Recent studies have identified a critical challenge in deep neural networks (DNNs) known as "robust fairness", where models exhibit significant disparities in robust accuracy across classes. While prior work has attempted to address this issue for adversarial robustness, worst-class certified robustness of smoothed classifiers remains unexplored. Our work bridges this gap by developing a PAC-Bayesian bound for the worst-class error of smoothed classifiers. Through theoretical analysis, we show that the largest eigenvalue of the smoothed confusion matrix fundamentally influences the worst-class error of smoothed classifiers. Based on this insight, we introduce a regularization method that optimizes the largest eigenvalue of the smoothed confusion matrix to improve the worst-class accuracy of the smoothed classifier and, in turn, its worst-class certified robustness. Extensive experiments across multiple datasets and model architectures demonstrate the effectiveness of our approach.
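To make the objects in the abstract concrete, here is a minimal NumPy sketch. It assumes a Gaussian-noise smoothed classifier $g(x) = \arg\max_c P(f(x+\delta) = c)$, $\delta \sim \mathcal{N}(0, \sigma^2 I)$, estimated by Monte Carlo voting as in standard randomized smoothing. It also uses a hypothetical nearest-centroid base classifier `base_predict` on toy 2-D data and a row-normalized confusion matrix. The helper names are illustrative. The last lines show one elementary reason a spectral quantity can control the worst class. Every row sum of a nonnegative matrix with $K$ columns is at most $\sqrt{K}$ times its spectral norm, so the worst-class error is at most $\sqrt{K}\,\|E\|_2$, where $E$ is the off-diagonal part. This is a simple norm inequality, not the paper's PAC-Bayesian bound, and it uses the largest singular value rather than an eigenvalue.

```python
import numpy as np


def smoothed_predict(base_predict, x, sigma, num_classes, n_samples, rng):
    """Monte Carlo estimate of g(x) = argmax_c P(f(x + delta) = c), delta ~ N(0, sigma^2 I)."""
    noise = rng.normal(scale=sigma, size=(n_samples,) + x.shape)
    votes = np.bincount(base_predict(x[None, :] + noise), minlength=num_classes)
    return int(np.argmax(votes))


def confusion_matrix(true, pred, num_classes):
    """Row-normalized confusion matrix: entry (i, j) estimates P(g predicts j | class i)."""
    conf = np.zeros((num_classes, num_classes))
    np.add.at(conf, (true, pred), 1.0)
    return conf / conf.sum(axis=1, keepdims=True)


# Hypothetical toy problem: three Gaussian blobs; classes 0 and 1 overlap more.
rng = np.random.default_rng(0)
centroids = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]])


def base_predict(batch):
    """Hypothetical base classifier f: nearest centroid."""
    dists = ((batch[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


K = 3
labels = np.repeat(np.arange(K), 50)
xs = centroids[labels] + rng.normal(scale=0.8, size=(labels.size, 2))
preds = np.array([smoothed_predict(base_predict, x, 0.5, K, 100, rng) for x in xs])

conf = confusion_matrix(labels, preds, K)
worst_class_acc = np.diag(conf).min()
err = conf - np.diag(np.diag(conf))      # off-diagonal (error) part E
worst_class_err = err.sum(axis=1).max()  # equals 1 - worst_class_acc
spectral = np.linalg.norm(err, 2)        # largest singular value of E
print(np.diag(conf), worst_class_acc, worst_class_err <= np.sqrt(K) * spectral)
```

Per-class accuracies appear on the diagonal; the class with the smallest diagonal entry determines the worst-class accuracy that the regularizer targets.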
