Certified Causal Defense with Generalizable Robustness

Yiran Qiao, Yu Yin, Chen Chen, Jing Ma

TL;DR

The paper tackles the challenge of generalizing certified robustness under domain shifts by leveraging causal factors that deterministically govern labels. It introduces GLEAN, a causality-inspired framework that (i) learns latent causal factors with invariant risk minimization, (ii) enforces a 1-Lipschitz constraint on the causal encoder to enable certifiable robustness in latent space, and (iii) provides theoretical guarantees (Theorems 1 and 2) for cross-domain robustness by performing randomized smoothing (RS) in the latent space and mapping the certified radius back to the input space. In experiments on CMNIST, CelebA, and DomainNet, GLEAN outperforms strong RS-based baselines in both certified accuracy and average certified radius (ACR) across radii and domains, demonstrating improved robustness under distribution shifts. The work contributes a principled approach to robustness generalization by coupling causal factor discovery with certified defense, with practical value for deploying reliable models in varied data environments.
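Steps (ii) and (iii) can be illustrated with a minimal sketch. It assumes the standard randomized-smoothing $l_2$ radius $\sigma \, \Phi^{-1}(p_A)$ in latent space, and uses only the Lipschitz property $\|h(x+\delta) - h(x)\| \le L\|\delta\|$ to convert a latent radius into an input radius. The function names are hypothetical, and the paper's Theorems 1 and 2 may state the bound in a different form.

```python
from statistics import NormalDist


def latent_certified_radius(p_a_lower: float, sigma: float) -> float:
    """Standard randomized-smoothing l2 radius, sigma * Phi^{-1}(p_A), in latent space.

    p_a_lower is a lower bound on the top-class probability under Gaussian noise
    with standard deviation sigma (assumed convention). Returns 0.0 when the
    bound does not exceed 0.5, i.e. nothing can be certified.
    """
    if not 0.0 <= p_a_lower < 1.0:
        raise ValueError("p_a_lower must lie in [0, 1)")
    if p_a_lower <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_a_lower)


def input_certified_radius(cr_z: float, lipschitz_const: float = 1.0) -> float:
    """Map a latent radius back to input space for an L-Lipschitz encoder.

    Any input perturbation with norm below cr_z / L moves the latent code by
    less than cr_z, so it stays inside the certified latent ball.
    """
    if lipschitz_const <= 0.0:
        raise ValueError("lipschitz_const must be positive")
    return cr_z / lipschitz_const


if __name__ == "__main__":
    cr_z = latent_certified_radius(p_a_lower=0.9, sigma=0.5)
    print(cr_z, input_certified_radius(cr_z, lipschitz_const=1.0))
```

With a 1-Lipschitz encoder the two radii coincide, which is one reason such a constraint is convenient: the latent certificate carries over to the input space without shrinking.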

Abstract

While machine learning models have proven effective across various scenarios, it is widely acknowledged that many models are vulnerable to adversarial attacks. Numerous adversarial defense efforts have emerged recently. Among them, certified defense is well known for its theoretical guarantees against arbitrary adversarial perturbations on the input within a certain range (e.g., an $l_2$ ball). However, most existing works in this line struggle to generalize their certified robustness to other data domains with distribution shifts. This issue is rooted in the difficulty of eliminating the negative impact of spurious correlations on robustness in different domains. To address this problem, we propose a novel certified defense framework, GLEAN, which incorporates a causal perspective into the generalization problem in certified defense. More specifically, our framework integrates a certifiable causal factor learning component to disentangle the causal relations and spurious correlations between input and label, and thereby excludes the negative effect of spurious correlations on defense. On top of that, we design a causally certified defense strategy to handle adversarial attacks on latent causal factors. In this way, our framework is not only robust against malicious noise on data in the training distribution but can also generalize its robustness across domains with distribution shifts. Extensive experiments on benchmark datasets validate the superiority of our framework in certified robustness generalization across data domains. Code is available in the supplementary materials.

Paper Structure

This paper contains 23 sections, 10 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: (a) Causal graph of data generation across domains. (b) An example of domain shift, using images from CMNIST. (c) Two common cases in which domain shift leads to a decreased ACR in certification. The pink area represents the incorrect decision region, the green area represents the correct decision region, and the circles represent robust $l_2$ balls.
  • Figure 2: An overview of the proposed framework GLEAN. The upper part represents the training process, while the lower part depicts the certification process on the test domain. Here, we showcase two training domains and one test domain with two classes, 0 and 1, where the color of the object is a spurious factor. We define $S=0$ as orange and $S=1$ as green. In training domain 1, the spurious distribution between color and category is $P(Y=0|S=0)=0.9$ and $P(Y=1|S=1)=0.1$. These values change to 0.8 and 0.2, respectively, in training domain 2, and then to 0.1 and 0.9 in the test domain. Thus, there is a correlation shift between the domains of this dataset. The causal encoder is subject to a Lipschitz constraint with Lipschitz constant $L$. $Y_{train}$ and $Y_{test}$ are ground-truth labels, and $\hat{Y}$ is the predicted label. $z$ is the causal latent representation, and each $\eta$ is a Gaussian noise sample. $y_A$ is the most probable class among the sampled predictions $\hat{Y}$, with probability $p_A$. We then use $p_A$ to compute the certified radius in latent space, $CR_z$, and finally map it back to obtain the certified radius $CR$ in input space.
  • Figure 3: Comparison of certified accuracy obtained using different methods across three datasets. The sharp decline at the end of each curve is due to a hard upper bound on the certifiable radius, which the certification process imposes for a given noise parameter $\sigma$ and number of Gaussian samples $n$.
  • Figure 4: Ablation study on CMNIST with different $\sigma$.
  • Figure 5: Performance of GLEAN under different parameters.
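
The hard upper bound mentioned in the Figure 3 caption can be made concrete with a short sketch. It assumes the usual randomized-smoothing certification procedure, which the listing above does not spell out: a one-sided Clopper-Pearson lower bound on $p_A$ at confidence $1-\alpha$, with $\sigma$ taken as the Gaussian standard deviation. Under that assumption, even when all $n$ noisy samples agree, the lower bound on $p_A$ is only $\alpha^{1/n}$, so no radius above $\sigma \, \Phi^{-1}(\alpha^{1/n})$ can be certified. The function name and the default $\alpha$ are hypothetical.

```python
from statistics import NormalDist


def max_certifiable_radius(sigma: float, n: int, alpha: float = 0.001) -> float:
    """Largest l2 radius certifiable from n Gaussian samples at confidence 1 - alpha.

    If all n samples vote for the same class, the one-sided Clopper-Pearson
    lower bound on p_A solves p**n = alpha, giving p = alpha ** (1 / n).
    The radius sigma * Phi^{-1}(p) is therefore capped, however robust the
    underlying model is. (Assumed procedure; not taken from the paper.)
    """
    if n <= 0 or not 0.0 < alpha < 1.0:
        raise ValueError("need n > 0 and 0 < alpha < 1")
    p_lower = alpha ** (1.0 / n)
    if p_lower <= 0.5:
        return 0.0  # too few samples to certify anything
    return sigma * NormalDist().inv_cdf(p_lower)


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(n, max_certifiable_radius(sigma=0.5, n=n))
```

The cap grows only slowly with $n$ and scales linearly with $\sigma$, which is consistent with certified-accuracy curves dropping to zero at a $\sigma$-dependent radius. If the smoothing is done in latent space behind an $L$-Lipschitz encoder, the corresponding input-space cap would be this value divided by $L$.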