Sy-FAR: Symmetry-based Fair Adversarial Robustness

Haneen Najjar, Eyal Ronen, Mahmood Sharif

TL;DR

Sy-FAR tackles fairness in adversarial robustness by enforcing symmetry in misclassification patterns between class pairs. It introduces a differentiable symmetry regularizer based on a soft confusion matrix $C$, with a pairwise asymmetry penalty that promotes $C_{ij} \rightarrow C_{ji}$, and integrates it into adversarial training via $\mathcal{L} = \lambda_{clean}\mathcal{L}_{CE}(x,y) + \lambda_{adv}\mathcal{L}_{CE}(x^{adv},y) + \lambda_{sym}\mathcal{L}_{sym}(C)$. The key theoretical result shows that class-level symmetry implies subgroup symmetry, enabling fair robustness across arbitrary groupings without explicit group information. Empirically, Sy-FAR achieves stronger source-class and target-class fairness, improves robustness, and runs faster with lower variance than FAAL and SpecNorm across five datasets and three architectures, including realistic eyeglass and face-mask attacks; it also demonstrates substantial improvements in subgroup fairness. These findings support symmetry as a principled, scalable approach to fair adversarial training with practical significance for safety-critical vision systems.
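The combined objective above can be illustrated with a small numerical sketch. The summary does not spell out how $C$ or $\mathcal{L}_{sym}$ are computed, so the following rests on assumptions: row $i$ of the soft confusion matrix is taken to be the mean softmax output over adversarial examples whose true class is $i$, and the asymmetry penalty is taken to be the sum of $|C_{ij} - C_{ji}|$ over class pairs. All function names are hypothetical, and the code evaluates the loss value only; actual training would need an autodiff framework.

```python
import numpy as np

def softmax(logits):
    # Numerically stable row-wise softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def soft_confusion_matrix(probs, labels, num_classes):
    # Assumption: row i is the mean predicted distribution over examples of true class i.
    labels = np.asarray(labels)
    C = np.zeros((num_classes, num_classes))
    for k in range(num_classes):
        mask = labels == k
        if mask.any():
            C[k] = probs[mask].mean(axis=0)
    return C

def symmetry_penalty(C):
    # Assumption: L1 penalty on |C_ij - C_ji| over all pairs i < j.
    rows, cols = np.triu_indices(C.shape[0], k=1)
    return float(np.abs(C[rows, cols] - C[cols, rows]).sum())

def cross_entropy(probs, labels):
    labels = np.asarray(labels)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + 1e-12).mean())

def sy_far_style_loss(logits_clean, logits_adv, labels, num_classes,
                      lam_clean=1.0, lam_adv=1.0, lam_sym=1.0):
    # L = lam_clean * CE(x, y) + lam_adv * CE(x_adv, y) + lam_sym * L_sym(C)
    p_clean = softmax(np.asarray(logits_clean, dtype=float))
    p_adv = softmax(np.asarray(logits_adv, dtype=float))
    C = soft_confusion_matrix(p_adv, labels, num_classes)
    return (lam_clean * cross_entropy(p_clean, labels)
            + lam_adv * cross_entropy(p_adv, labels)
            + lam_sym * symmetry_penalty(C))
```

Under these assumptions the penalty vanishes exactly when the soft confusion matrix is symmetric, so it adds pressure only on class pairs whose misclassification rates differ by direction.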

Abstract

Security-critical machine-learning (ML) systems, such as face-recognition systems, are susceptible to adversarial examples, including real-world physically realizable attacks. Various means to boost ML's adversarial robustness have been proposed; however, they typically induce unfair robustness: it is often easier to attack from certain classes or groups than from others. Several techniques have been developed to improve adversarial robustness while seeking perfect fairness between classes. Yet, prior work has focused on settings where security and fairness are less critical. Our insight is that achieving perfect parity in realistic fairness-critical tasks, such as face recognition, is often infeasible -- some classes may be highly similar, leading to more misclassifications between them. Instead, we suggest that seeking symmetry -- i.e., attacks from class $i$ to $j$ would be as successful as from $j$ to $i$ -- is more tractable. Intuitively, symmetry is desirable because class resemblance is a symmetric relation in most domains. Additionally, as we prove theoretically, symmetry between individuals induces symmetry between any set of sub-groups, in contrast to other fairness notions where group-fairness is often elusive. We develop Sy-FAR, a technique to encourage symmetry while also optimizing adversarial robustness, and extensively evaluate it using five datasets and three model architectures, including against targeted and untargeted realistic attacks. The results show Sy-FAR significantly improves fair adversarial robustness compared to state-of-the-art methods. Moreover, we find that Sy-FAR is faster and more consistent across runs. Notably, Sy-FAR also ameliorates another type of unfairness we discover in this work -- target classes that adversarial examples are likely to be classified into become significantly less vulnerable after inducing symmetry.

Paper Structure

This paper contains 55 sections, 1 theorem, 11 equations, 12 figures, 16 tables, 1 algorithm.

Key Result

Theorem 1

Let ${C} \in \mathbb{R}_{\ge 0}^{K \times K}$ be a normalized confusion matrix, and let ${P} = \{G_1, G_2, \dots, G_m\}$ be a partition of $\{1,\dots,K\}$ into disjoint subgroups. Stated concisely, ${C}$ is symmetric if and only if the subgroup-level matrix $\widehat{{C}}$ is symmetric for all subgroup partitions.
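A quick numerical check can make the theorem's statement concrete. The summary does not reproduce the definition of $\widehat{{C}}$, so this sketch assumes the simplest aggregation, $\widehat{C}_{ab} = \sum_{i \in G_a} \sum_{j \in G_b} C_{ij}$, which can be written as $M^\top C M$ for a 0/1 membership matrix $M$; the paper's Definition 1 may normalize differently. The function names are hypothetical.

```python
import numpy as np

def aggregate_by_partition(C, partition):
    # Assumption: subgroup entry (a, b) sums C_ij over i in G_a and j in G_b.
    K = C.shape[0]
    M = np.zeros((K, len(partition)))
    for a, group in enumerate(partition):
        M[list(group), a] = 1.0  # membership indicator for subgroup a
    return M.T @ C @ M

def is_symmetric(A, tol=1e-9):
    return bool(np.allclose(A, A.T, atol=tol))

# A symmetric class-level matrix stays symmetric under any grouping.
rng = np.random.default_rng(0)
A = rng.random((4, 4))
C_sym = A + A.T
C_sym /= C_sym.sum()
print(is_symmetric(aggregate_by_partition(C_sym, [[0, 2], [1, 3]])))

# The singleton partition returns C itself, which gives the converse direction.
C_asym = np.array([[0.4, 0.3], [0.1, 0.2]])
print(is_symmetric(aggregate_by_partition(C_asym, [[0], [1]])))
```

Under this aggregation the forward direction follows from $(M^\top C M)^\top = M^\top C^\top M$. A coarse partition can hide asymmetry (a single group yields a $1 \times 1$ matrix, which is trivially symmetric), which is why the statement quantifies over all partitions.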

Figures (12)

  • Figure 1: Per-class robust accuracy of face-recognition models trained using different defensive methods on a subset of the PubFig dataset \cite{attribute_classifiers} using the VGG-16 architecture \cite{simonyan2015very}. The models were trained to recognize a set of ten celebrities with an equal number of males and females (§\ref{sec:experiment_setup}). We report results obtained with four methods: adversarial training \cite{tong2021facesec}, two leading approaches for enhancing fair source-class adversarial robustness (FAAL \cite{zhang2024towards} and SpecNorm \cite{jin2025enhancing}), and our proposed method, Sy-FAR. Adversarial examples were created with (untargeted) eyeglass attacks \cite{sharif2016accessorize}.
  • Figure 2: An illustration of asymmetric vs. symmetric confusion matrices.
  • Figure 3: Illustration of the eyeglass attack on PubFig images \cite{attribute_classifiers}. Original face images are at the top, and images perturbed with adversarial eyeglasses \cite{sharif2016accessorize} are at the bottom. The attack only modifies pixels within the eyeglass region, producing perturbations that mislead face recognition.
  • Figure 4: Asymmetry heatmaps on PubFig$_{\text{SIB}}$, using the untargeted eyeglass attack to create adversarial examples. Each cell $(i,j)$ in the upper triangle reports the Asymmetry Gap, i.e., $|C_{ij}-C_{ji}|$ (see §\ref{sec:metrics}). Darker regions indicate stronger directional bias, where adversarial examples from one class are more likely to be classified into the other class than vice versa. We show representative heatmaps from individual randomly selected runs out of the ten repetitions.
  • Figure 5: Confusion matrices for different methods on adversarial examples produced with the untargeted eyeglass attack against the PubFig setup. Diagonals indicate source-class robust accuracy; off-diagonals are misclassifications. We show representative heatmaps from individual randomly selected runs out of the ten repetitions.
  • ...and 7 more figures
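
The Asymmetry Gap reported in Figure 4 can be computed from hard predictions as sketched below. This assumes the confusion matrix is row-normalized, which is consistent with Figure 5's note that diagonals give source-class robust accuracy but is not stated explicitly here. The function names are hypothetical.

```python
import numpy as np

def row_normalized_confusion(y_true, y_pred, num_classes):
    # Count (true, predicted) pairs, then divide each row by its total.
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    C = np.zeros((num_classes, num_classes))
    np.add.at(C, (y_true, y_pred), 1.0)
    row_sums = C.sum(axis=1, keepdims=True)
    # Rows with no examples are left as zeros.
    return np.divide(C, row_sums, out=np.zeros_like(C), where=row_sums > 0)

def asymmetry_gap(C):
    # Upper-triangular matrix whose (i, j) entry is |C_ij - C_ji| for i < j.
    return np.triu(np.abs(C - C.T), k=1)

# Toy labels and predictions on adversarial examples (illustrative values only).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]
C = row_normalized_confusion(y_true, y_pred, num_classes=2)
gap = asymmetry_gap(C)
```

In this toy case class 0 is misclassified as class 1 half of the time while the reverse happens a quarter of the time, so the single upper-triangle cell holds a gap of 0.25.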

Theorems & Definitions (4)

  • Definition 1: Subgroup Misclassification Rate
  • Definition 2: Subgroup Symmetry
  • Theorem 1
  • proof