
CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense

Mingkun Zhang, Keping Bi, Wei Chen, Quanrun Chen, Jiafeng Guo, Xueqi Cheng

TL;DR

To address the vulnerability of neural classifiers to unseen adversarial attacks, the paper introduces CausalDiff, a diffusion-based defense that disentangles label-causative factors $S$ from label-non-causative factors $Z$ using a Causal Information Bottleneck (CIB) objective. During training, a conditional diffusion generator is coupled with the causal learning objective so that data can be reconstructed from $(S,Z)$. At inference, the model purifies an adversarial input into a benign counterpart $X^*$, infers $(S,Z)$ from it, and classifies using $S$ only. On CIFAR-10, CIFAR-100, and GTSRB, CausalDiff reports state-of-the-art robustness to unseen attacks, with average robustness of $86.39\%$, $56.25\%$, and $82.62\%$ respectively, and the learned latent factors are interpretable. The work combines causal representation learning with diffusion models for adversarial defense.
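
The three inference steps named above (purify, infer, classify) can be sketched as a simple composition. This is only an illustrative sketch, not the authors' implementation: `toy_purify`, `toy_infer_factors`, and `toy_classify_from_s` are hypothetical stand-ins for the diffusion-based purifier, the $(S,Z)$ inference step, and the $S$-based classifier, and the input vectors are made-up toy data.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def causal_defense_predict(
    x_adv: Vector,
    purify: Callable[[Vector], Vector],
    infer_factors: Callable[[Vector], Tuple[Vector, Vector]],
    classify_from_s: Callable[[Vector], int],
) -> int:
    """Compose the three inference stages: purify, infer (S, Z), classify from S."""
    x_star = purify(x_adv)                    # adversarial input -> benign counterpart X*
    s_star, _z_star = infer_factors(x_star)   # Z is inferred but not used for the label
    return classify_from_s(s_star)


# --- Hypothetical toy stand-ins (assumptions, not the paper's components) ---

def toy_purify(x: Vector) -> Vector:
    # Stand-in for purification: clip to [0, 1] and snap to a 0.25 grid,
    # which removes small perturbations in this toy setting.
    return [round(min(1.0, max(0.0, v)) * 4) / 4 for v in x]


def toy_infer_factors(x: Vector) -> Tuple[Vector, Vector]:
    # Stand-in for (S, Z) inference: first half plays the role of S, second half of Z.
    half = len(x) // 2
    return x[:half], x[half:]


def toy_classify_from_s(s: Vector) -> int:
    # Stand-in classifier: index of the largest S component.
    return max(range(len(s)), key=lambda i: s[i])


x_clean = [0.75, 0.25, 0.5, 0.5]
x_perturbed = [0.78, 0.22, 0.53, 0.47]  # small made-up perturbation of x_clean

pred_clean = causal_defense_predict(
    x_clean, toy_purify, toy_infer_factors, toy_classify_from_s
)
pred_adv = causal_defense_predict(
    x_perturbed, toy_purify, toy_infer_factors, toy_classify_from_s
)
```

In this toy setting both inputs map to the same purified vector, so `pred_clean` and `pred_adv` agree. The point of the sketch is the data flow: the label depends on the input only through $S^*$, while $Z^*$ is computed and then ignored by the classifier.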

Abstract

Despite ongoing efforts to defend neural classifiers from adversarial attacks, they remain vulnerable, especially to unseen attacks. In contrast, humans are hard to fool with subtle manipulations, since we make judgments based only on essential factors. Inspired by this observation, we attempt to model label generation with essential label-causative factors and incorporate label-non-causative factors to assist data generation. For an adversarial example, we aim to identify the perturbations as non-causative factors and make predictions based only on the label-causative factors. Concretely, we propose a causal diffusion model (CausalDiff) that adapts diffusion models for conditional data generation and disentangles the two types of causal factors by learning towards a novel causal information bottleneck objective. Empirically, CausalDiff has significantly outperformed state-of-the-art defense methods on various unseen attacks, achieving an average robustness of 86.39% (+4.01%) on CIFAR-10, 56.25% (+3.13%) on CIFAR-100, and 82.62% (+4.93%) on GTSRB (German Traffic Sign Recognition Benchmark). The code is available at https://github.com/CAS-AISafetyBasicResearchGroup/CausalDiff.

Paper Structure

This paper contains 32 sections, 18 equations, 9 figures, 5 tables, 3 algorithms.

Figures (9)

  • Figure 1: Illustration of training (Left) and inference (Right) processes of our proposed CausalDiff model. During training, the model constructs a structural causal model leveraging a conditional diffusion model, disentangling the (label) Y-causative feature $S$ and the Y-non-causative feature $Z$ through maximization of the Causal Information Bottleneck (CIB). In the inference stage, CausalDiff first purifies an adversarial example $\tilde{X}$, yielded by perturbing $X$ according to the target victim model parameterized by $\theta$, to obtain the benign counterpart $X^*$. Then, it infers the Y-causative feature $S^*$ for label prediction. We visualize the vectors of $S$ and $Z$ inferred from a perturbed image of a horse using a diffusion model. We find that $S$ captures the general concept of a horse, even when the input image only shows the head, while $Z$ carries information about the horse's skin color.
  • Figure 2: Adversarial robustness of four models against 100-step PGD attack under varying attack strength indicated by $\epsilon$-budget.
  • Figure 3: t-SNE visualizations of the feature space for the two categories on toy data for (a) discriminative model, (b) generative model, (c) causal model without disentanglement, and (d) causal model with disentanglement.
  • Figure 4: t-SNE visualization of the feature space, inferred by our CausalDiff, of the label-causative factor $s$, label-non-causative factor $z$, and their concatenation.
  • Figure 5: Structural causal models (SCMs) of the models in the pilot study: (a) discriminative model, (b) generative model, (c) causal model without disentanglement, and (d) causal model with disentanglement.
  • ...and 4 more figures