Weakly Supervised Contrastive Adversarial Training for Learning Robust Features from Semi-supervised Data

Lilin Zhang, Chengpei Wu, Ning Yang

TL;DR

Weakly Supervised Contrastive Adversarial Training (WSCAT) ensures complete perturbation for improved learning of robust features by disrupting correlations between non-robust features and labels through complete AE generation over partially labeled data, grounded in information theory.

Abstract

Existing adversarial training (AT) methods often suffer from incomplete perturbation, meaning that not all non-robust features are perturbed when generating adversarial examples (AEs). This results in residual correlations between non-robust features and labels, leading to suboptimal learning of robust features. However, achieving complete perturbation, i.e., perturbing as many non-robust features as possible, is challenging due to the difficulty in distinguishing robust and non-robust features and the sparsity of labeled data. To address these challenges, we propose a novel approach called Weakly Supervised Contrastive Adversarial Training (WSCAT). WSCAT ensures complete perturbation for improved learning of robust features by disrupting correlations between non-robust features and labels through complete AE generation over partially labeled data, grounded in information theory. Extensive theoretical analysis and comprehensive experiments on widely adopted benchmarks validate the superiority of WSCAT. Our code is available at https://github.com/zhang-lilin/WSCAT.

Paper Structure

This paper contains 35 sections, 4 theorems, 18 equations, 6 figures, 10 tables.

Key Result

Theorem 1

$\max_{x^\prime \in \mathcal{B}_\epsilon (x)} l_\mathrm{con} (z^\prime, z) \approx \max_{x^\prime \in \mathcal{B}_\epsilon (x)} \frac{1}{\vert \mathcal{N}^+_x \vert} \sum_{x_p \in \mathcal{N}^+_x} \max \{ 0, \{ s(z^\prime, z_n) - s(z^\prime, z_p) \}_{x_n \in \mathcal{D}, x_n \ne x_p} \}$.
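
To make the right-hand side of Theorem 1 concrete, the following is a minimal sketch that evaluates the averaged hinge term for one fixed adversarial embedding $z^\prime$. It is an illustration under stated assumptions, not the authors' implementation: the function names `cosine` and `hinge_surrogate` are hypothetical, cosine similarity is assumed for $s(\cdot, \cdot)$ (this summary does not define it), and the outer maximization over $x^\prime \in \mathcal{B}_\epsilon (x)$ is not performed.

```python
import math


def cosine(u, v):
    """Cosine similarity; an assumed stand-in for s(., .)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hinge_surrogate(z_adv, embeddings, positive_idx):
    """Evaluate the inner expression of the right-hand side for a fixed z'.

    z_adv        : embedding z' of the adversarial example
    embeddings   : embeddings z_n of every sample x_n in D
    positive_idx : indices of the positives N_x^+ within `embeddings`
    """
    sims = [cosine(z_adv, z) for z in embeddings]
    total = 0.0
    for p in positive_idx:
        # max{0, {s(z', z_n) - s(z', z_p)} over all x_n != x_p}
        gaps = [sims[n] - sims[p] for n in range(len(embeddings)) if n != p]
        total += max([0.0] + gaps)
    # Average over the positive set N_x^+
    return total / len(positive_idx)


# Toy data: three 2-D embeddings, with index 0 as the only positive.
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(hinge_surrogate([0.0, 1.0], toy, [0]))  # 1.0: z' is closer to a non-positive
print(hinge_surrogate([1.0, 0.0], toy, [0]))  # 0.0: z' coincides with the positive
```

The term is zero only when $z^\prime$ is at least as similar to each positive as to every other sample, so maximizing it over the perturbation ball pushes $z^\prime$ away from its positives and toward some other sample.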

Figures (6)

  • Figure 1: Illustration of non-robust features and their perturbations. For a digit image, color and size are the non-robust features correlated with the labels, while shape is the robust feature that causes the labels.
  • Figure 2: Illustration of Information-theoretic Complete AE Generation, where blue, grey, and red areas represent the information contained in the robust features $R$, the features $Z$ of natural samples, and the features $Z^\prime$ of AEs, respectively, and the overlapped area represents mutual information $\mathrm{MI} (Z, Z^\prime)$ contained in the unperturbed features.
  • Figure 3: Distribution of similarity between the embeddings of natural data and AEs generated by different AT methods. All the AT methods share the same natural dataset.
  • Figure 4: Visualization of the embeddings of training samples produced by WSCAT and its variants on CIFAR10. The ground truth labels of the samples are color-coded.
  • Figure 5: Effect of $\beta$. Natural accuracy and robust accuracy against the PGD attack are reported together with their harmonic mean.
  • ...and 1 more figure

Theorems & Definitions (9)

  • Definition 1: Robust Feature
  • Theorem 1
  • Theorem 2
  • Lemma 1
  • Proof of Lemma 1
  • Lemma 2
  • Proof of Lemma 2
  • Proof of \ref{Th-CB}
  • Proof of \ref{Th-RF}