Robustness to Adversarial Perturbations in Learning from Incomplete Data

Amir Najafi, Shin-ichi Maeda, Masanori Koyama, Takeru Miyato

TL;DR

This work studies robustness to adversarial distributional shifts when only part of the training data is labeled. It unifies Semi-Supervised Learning and Distributionally Robust Learning into a single framework, SSDRL, introduces a dual formulation with soft-label self-learning to exploit unlabeled data, and gives generalization guarantees in terms of new adversarial complexity measures (the SSM Rademacher complexity) and the Minimum Supervision Ratio. The authors prove convergence of an SGD-based optimizer for the semi-supervised objective and report that SSDRL is competitive with state-of-the-art methods such as VAT and Pseudo-Labeling on standard image benchmarks.

Abstract

What is the role of unlabeled data in an inference problem, when the presumed underlying distribution is adversarially perturbed? To provide a concrete answer to this question, this paper unifies two major learning frameworks: Semi-Supervised Learning (SSL) and Distributionally Robust Learning (DRL). We develop a generalization theory for our framework based on a number of novel complexity measures, such as an adversarial extension of Rademacher complexity and its semi-supervised analogue. Moreover, our analysis is able to quantify the role of unlabeled data in the generalization under a more general condition compared to the existing theoretical works in SSL. Based on our framework, we also present a hybrid of DRL and EM algorithms that has a guaranteed convergence rate. When implemented with deep neural networks, our method shows a comparable performance to those of the state-of-the-art on a number of real-world benchmark datasets.

Paper Structure

This paper contains 21 sections, 17 theorems, 119 equations, 11 figures, 1 algorithm.

Key Result

Theorem 1

Assume a continuous loss $\ell:\mathcal{Z}\times\Theta\rightarrow\mathbb{R}$ and a continuous cost $c:\mathcal{Z}\times\mathcal{Z}\rightarrow\mathbb{R}_{\ge0}$, parameters $\epsilon\ge0$ and $\lambda\in\mathbb{R}\cup\left\{\pm\infty\right\}$, and a partially-labeled dataset $\boldsymbol{D}$ of size $n$. where $\phi_{\gamma}\left(\boldsymbol{X},y;\theta\right)$, called the adversarial loss, is defined as
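
The theorem statement is cut off in this summary, so the exact definition of $\phi_{\gamma}$ is not reproduced here. As a rough illustration only, the sketch below assumes the penalized form used in the distributionally robust learning literature (e.g. Sinha et al., 2018), namely the supremum over perturbed inputs of the loss minus $\gamma$ times the transport cost. It makes two further assumptions: a squared Euclidean cost, and perturbation of the features only. The function name `adversarial_loss`, the step size, and the toy linear loss are all hypothetical and not taken from the paper.

```python
def adversarial_loss(loss, grad_loss, x, gamma, steps=200):
    """Approximate sup_{x'} [loss(x') - gamma * ||x' - x||^2] by gradient ascent.

    Assumes the penalized objective is concave in x' (gamma large relative to
    the smoothness of `loss`); otherwise this only finds a local maximum.
    """
    lr = 1.0 / (4.0 * gamma)  # hypothetical step size, chosen for illustration
    x_adv = list(x)
    for _ in range(steps):
        g = grad_loss(x_adv)
        # Ascent direction: loss gradient minus the gradient of the penalty.
        x_adv = [xa + lr * (gi - 2.0 * gamma * (xa - xi))
                 for xa, gi, xi in zip(x_adv, g, x)]
    penalty = gamma * sum((xa - xi) ** 2 for xa, xi in zip(x_adv, x))
    return loss(x_adv) - penalty, x_adv


# Toy check with a linear loss theta . x, for which the supremum has the
# closed form theta . x + ||theta||^2 / (4 * gamma).
theta = [1.0, -2.0]
linear_loss = lambda x: sum(t * xi for t, xi in zip(theta, x))
linear_grad = lambda x: list(theta)
value, x_adv = adversarial_loss(linear_loss, linear_grad, [0.5, 0.25], gamma=2.0)
```

In this toy case the ascent converges to the point $x + \theta/(2\gamma)$, and a larger $\gamma$ keeps the adversarial point closer to the original input.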

Figures (11)

  • Figure 1: Comparison among different methods of the test error rates on adversarial examples generated via the method of Sinha et al. (2018).
  • Figure 2: Comparison of the test error rates on adversarial examples computed by PGM (Madry et al., 2017) under an $\ell_2$-norm constraint.
  • Figure 3: Test error rates on clean examples. For DRL, VAT and F-SSDRL, rows 1 to 3 correspond to the parameter ($\gamma_i$ for DRL and F-SSDRL, and $\varepsilon_i$ for VAT) that yields the lowest error rates on: ($i=1$) clean examples, ($i=2$) adversarial examples by Sinha et al. (2018), and ($i=3$) adversarial examples by PGM, respectively.
  • Figure 4: Error rates on adversarial examples generated via the algorithm of Sinha et al. (2018) vs. $\gamma^{-1}_{\mathrm{eval}}$ on the MNIST dataset.
  • Figure 5: Comparison of the average adversarial loss among different methods.
  • ...and 6 more figures
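
Several of the figures above evaluate on adversarial examples produced by a projected gradient method (PGM) under an $\ell_2$-norm constraint. The following is a minimal sketch of one common variant of such an attack: take normalized gradient-ascent steps on the loss and project the perturbation back onto the $\ell_2$ ball. It is not the paper's evaluation code. The function name `l2_pgm_attack`, its parameters, and the constant-gradient toy loss are assumptions made for illustration.

```python
import math


def l2_pgm_attack(grad_loss, x, eps, step, steps=10):
    """Projected gradient ascent on the loss within an l2 ball of radius eps."""
    delta = [0.0] * len(x)
    for _ in range(steps):
        g = grad_loss([xi + di for xi, di in zip(x, delta)])
        g_norm = math.sqrt(sum(gi * gi for gi in g))
        if g_norm == 0.0:
            break  # no ascent direction available
        # Normalized ascent step on the perturbation.
        delta = [di + step * gi / g_norm for di, gi in zip(delta, g)]
        # Project back onto the l2 ball of radius eps.
        d_norm = math.sqrt(sum(di * di for di in delta))
        if d_norm > eps:
            delta = [di * eps / d_norm for di in delta]
    return [xi + di for xi, di in zip(x, delta)]


# Toy usage: a loss with constant gradient (3, 4) pushes the input to the
# boundary of the unit ball in the direction (0.6, 0.8).
x_adv = l2_pgm_attack(lambda x: [3.0, 4.0], [0.0, 0.0], eps=1.0, step=0.5)
```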

Theorems & Definitions (38)

  • Definition 1: Wasserstein distance
  • Definition 2
  • Definition 3
  • Theorem 1: Lagrangian-Relaxation
  • Lemma 1
  • Lemma 2
  • Theorem 2
  • Definition 4: SSM Rademacher Complexity
  • Theorem 3: Generalization
  • Definition B.1
  • ...and 28 more
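
Definition 1 in the list above concerns the Wasserstein distance. In distributionally robust learning this distance is commonly used to define the set of admissible perturbed distributions. The paper's definition is stated for a general transportation cost $c$ and is not reproduced here. As a small, hedged illustration, the sketch below covers only a special case: two one-dimensional empirical samples of equal size with cost $|x-y|^p$, where the optimal coupling simply matches sorted values. The function name `wasserstein_1d` is hypothetical.

```python
def wasserstein_1d(xs, ys, p=1):
    """p-Wasserstein distance between two equal-size 1-D empirical samples.

    In one dimension the optimal transport plan pairs the i-th smallest
    point of xs with the i-th smallest point of ys.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
    return cost ** (1.0 / p)


# Toy usage: the second sample is the first one shifted by 5, so the
# distance equals the shift.
shift = wasserstein_1d([0.0, 1.0, 3.0], [8.0, 5.0, 6.0])
```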