Achieving Robustness in the Wild via Adversarial Mixing with Disentangled Representations

Sven Gowal, Chongli Qin, Po-Sen Huang, Taylan Cemgil, Krishnamurthy Dvijotham, Timothy Mann, Pushmeet Kohli

TL;DR

This work addresses the brittleness of supervised learning under real-world distribution shifts by introducing AdvMix, a framework that uses disentangled latent factors from a generative model to craft semantically meaningful perturbations through latent-space mixing. By formalizing semantic adversarial risk and leveraging the disentangled latent space of StyleGAN, AdvMix generates worst-case variations that preserve semantics while challenging the classifier, and trains the classifier to be invariant to these perturbations. Empirical results on Color-MNIST and CelebA show that AdvMix improves generalization and reduces bias more reliably than standard data augmentation or ℓ_p-based adversarial training, with notable gains in robustness under semantic perturbations. The approach demonstrates that disentangled representations can enable robust perception in the wild, motivating further development of semantically aware perturbations and their integration into safety-critical vision systems.

Abstract

Recent research has made the surprising finding that state-of-the-art deep learning models sometimes fail to generalize to small variations of the input. Adversarial training has been shown to be an effective approach to overcome this problem. However, its application has been limited to enforcing invariance to analytically defined transformations like $\ell_p$-norm bounded perturbations. Such perturbations do not necessarily cover plausible real-world variations that preserve the semantics of the input (such as a change in lighting conditions). In this paper, we propose a novel approach to express and formalize robustness to these kinds of real-world transformations of the input. The two key ideas underlying our formulation are (1) leveraging disentangled representations of the input to define different factors of variations, and (2) generating new input images by adversarially composing the representations of different images. We use a StyleGAN model to demonstrate the efficacy of this framework. Specifically, we leverage the disentangled latent representations computed by a StyleGAN model to generate perturbations of an image that are similar to real-world variations (like adding make-up, or changing the skin-tone of a person) and train models to be invariant to these perturbations. Extensive experiments show that our method improves generalization and reduces the effect of spurious correlations (reducing the error rate of a "smile" detector by 21% for example).

Paper Structure

This paper contains 25 sections, 17 equations, 11 figures, 4 tables, 2 algorithms.

Figures (11)

  • Figure 1: Variations of the same faces. A model obtained through classical training classifies the same face as both "smiling" and "not smiling" (depending on the variations). Our model remains consistent in terms of classification. Note that these persons "do not exist" and have been generated using a StyleGAN model.
  • Figure 2: Comparison of different data augmentation techniques. These transformations tend to destroy the image semantics.
  • Figure 3: Illustration of the maximization process in Equation \ref{eq:max_rec}.
  • Figure 4: Comparison of mixup and AdvMix on a toy example. In this example, we are given 200 data points. Each data point $(x_1, x_2)$ is sampled according to $x_1 \sim \mathcal{N}(z_{\perp}, \sqrt{3})$ where $z_{\perp} \in \mathcal{Z}_{\perp} = \lbrace 0, 10 \rbrace$ and $x_2 \sim \mathcal{N}(z_{\parallel}, 1)$ where $z_{\parallel} \in \mathcal{Z}_{\parallel} = \lbrace 0, 20 \rbrace$. The colors represent the label $y$. Note that the latent variable $z_{\parallel} = 20y$ depends on the label while $z_{\perp}$ is independent of the label. Panel (a) shows the original set of 200 data points; panel (b) shows the effect of sampling additional data using AdvMix; and panel (c) shows the effect of mixup. Note that AdvMix is aware of the underlying latent representation, while mixup is not.
  • Figure 5: Panel \ref{fig:encoder_example_process} shows how the latents are progressively able to match a target image (on the far right). Panel \ref{fig:encoder_example_mixed} shows two different variations of the obtained image.
  • ...and 6 more figures
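
The toy setup described in the Figure 4 caption can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code. It rests on several assumptions the caption does not state: labels are binary and balanced, $z_{\perp}$ is drawn uniformly from $\mathcal{Z}_{\perp}$, and the second argument of $\mathcal{N}(\cdot, \cdot)$ is a standard deviation. The function names `sample_toy_dataset` and `mix_label_independent_latent` are hypothetical, and the mixing step simply swaps $z_{\perp}$ rather than searching for a worst case, so it is a non-adversarial stand-in for AdvMix.

```python
import math
import random

Z_PERP = (0.0, 10.0)   # label-independent latent values from the caption
Z_PAR_SCALE = 20.0     # z_par = 20 * y, so Z_par = {0, 20}


def sample_toy_dataset(n=200, seed=0):
    """Sample n tuples (x1, x2, y, z_perp) following the Figure 4 caption."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)                  # assumption: balanced binary labels
        z_perp = rng.choice(Z_PERP)            # assumption: uniform over Z_perp
        z_par = Z_PAR_SCALE * y                # label-dependent latent
        x1 = rng.gauss(z_perp, math.sqrt(3))   # assumption: sqrt(3) is a std dev
        x2 = rng.gauss(z_par, 1.0)
        data.append((x1, x2, y, z_perp))
    return data


def mix_label_independent_latent(point, rng):
    """Resample x1 under the other z_perp value; x2 and the label stay fixed."""
    x1, x2, y, z_perp = point
    new_z_perp = Z_PERP[1] if z_perp == Z_PERP[0] else Z_PERP[0]
    new_x1 = rng.gauss(new_z_perp, math.sqrt(3))
    return (new_x1, x2, y, new_z_perp)


if __name__ == "__main__":
    points = sample_toy_dataset()
    rng = random.Random(1)
    augmented = points + [mix_label_independent_latent(p, rng) for p in points]
    print(len(points), "original points,", len(augmented), "after mixing")
```

Because only the label-independent coordinate changes, every mixed point keeps its original label. Linear interpolation between two inputs, as in mixup, does not offer that guarantee when the two inputs carry different labels.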