Towards Class-wise Robustness Analysis

Tejaswini Medi, Julia Grabinski, Margret Keuper

TL;DR

The paper studies class-wise robustness under domain shifts and adversarial threats by introducing the Class False Positive Score (CFPS) to quantify cross-class misclassification biases. It evaluates CFPS alongside robust accuracy on CIFAR-10 and CIFAR-10-C using multiple architectures and PGD attacks, revealing that per-class vulnerabilities do not always align with overall robustness. Key findings show high CFPS for class C4 (cat) and weak accuracy for classes such as C3 (bird), C5 (deer), and C6 (dog), with these patterns persisting under corruptions and targeted attacks. The work highlights the practical impact of class-wise biases for designing defenses that address latent-space structure and cross-class confusion to improve real-world reliability.

Abstract

Although deep neural networks are very successful at many downstream tasks, their application in real-life scenarios is limited by their susceptibility to domain shifts, such as common corruptions, and to adversarial attacks. Adversarial examples and data corruption significantly reduce the performance of deep classification models. Researchers have made strides in developing robust neural architectures to bolster the decisions of deep classifiers. However, most of these works rely on effective adversarial training methods and focus predominantly on overall model robustness, disregarding class-wise differences in robustness, which are critical. Exploiting weakly robust classes is a potential avenue for attackers to fool image recognition models. This study therefore investigates class-to-class biases across adversarially trained robust classification models to understand their latent space structures and to analyze their strong and weak class-wise properties. We further assess the robustness of classes against common corruptions and adversarial attacks, recognizing that class vulnerability extends beyond the number of correct classifications for a specific class. We find that the number of false positives a class attracts as a specific target class significantly impacts its vulnerability to attacks. Through our analysis of the Class False Positive Score, we provide a fair evaluation of how susceptible each class is to misclassification.
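
The idea that a class can be vulnerable through the false positives it attracts, and not only through its own misclassifications, can be illustrated with a small sketch. The exact CFPS formula is not given in this summary, so the normalization below (a class's false positives divided by all misclassified samples) is an assumption made purely for illustration, and the function names and toy labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Build a matrix whose rows are ground-truth classes and columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def false_positives_per_class(matrix):
    """Count samples from other classes that were predicted as class c."""
    n = len(matrix)
    return [sum(matrix[t][c] for t in range(n) if t != c) for c in range(n)]


def false_positive_share(matrix):
    """ASSUMPTION: normalize each class's false positives by the total
    number of misclassified samples. The paper's CFPS may be defined
    differently; this is only an illustrative stand-in."""
    fp = false_positives_per_class(matrix)
    total_errors = sum(fp)
    if total_errors == 0:
        return [0.0] * len(fp)
    return [count / total_errors for count in fp]


# Toy labels for three hypothetical classes (not data from the paper).
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred, 3)
fp_counts = false_positives_per_class(cm)
fp_share = false_positive_share(cm)
print(fp_counts, fp_share)
```

In this toy example every sample of class 1 is classified correctly, yet class 1 also absorbs three of the four errors made on the other classes. This is the kind of effect that per-class accuracy alone would hide.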

Paper Structure

This paper contains 11 sections, 2 equations, 8 figures.

Figures (8)

  • Figure 1: Class-wise accuracies on CIFAR-10 across different robust model architectures. The horizontal lines depict the overall average accuracy of the respective adversarially trained robust models.
  • Figure 2: Class-wise CFPS on CIFAR-10 across different robust model architectures.
  • Figure 3: Class-wise robust accuracies (top) and robust CFPSs (bottom) across different model architectures under corruptions. Robust accuracies are given as fractions. Some classes with reasonably high robust accuracies tend to attract false positives easily and are thus more vulnerable overall than expected.
  • Figure 4: Confusion matrix showing ground truth (vertical axis) versus predictions (horizontal axis) under PGD attack.
  • Figure 5: Success rate of the targeted PGD attack for all classes of the CIFAR-10 dataset.
  • ...and 3 more figures
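
The quantities plotted in Figure 1, class-wise accuracies against a single overall accuracy line, can be read off a confusion matrix laid out as in Figure 4 (ground truth on the vertical axis, predictions on the horizontal axis). The following is a minimal sketch under that layout assumption; the function names and the toy matrix are hypothetical and are not results from the paper.

```python
def class_wise_accuracy(matrix):
    """Fraction of each ground-truth class (row) that is predicted correctly."""
    accuracies = []
    for c, row in enumerate(matrix):
        total = sum(row)
        accuracies.append(row[c] / total if total else 0.0)
    return accuracies


def overall_accuracy(matrix):
    """Fraction of all samples on the diagonal, i.e. predicted correctly."""
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Toy confusion matrix: rows = ground truth, columns = predictions.
toy = [
    [8, 1, 1],
    [3, 5, 2],
    [0, 1, 9],
]

per_class = class_wise_accuracy(toy)
overall = overall_accuracy(toy)
print(per_class, round(overall, 3))
```

Here the overall accuracy is about 0.733, while the middle class reaches only 0.5. A single aggregate number can therefore mask a weakly robust class.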