Certified Causal Defense with Generalizable Robustness
Yiran Qiao, Yu Yin, Chen Chen, Jing Ma
TL;DR
The paper tackles the challenge of generalizing certified robustness under domain shifts by leveraging causal factors that deterministically govern labels. It introduces GLEAN, a causality-inspired framework that (i) learns latent causal factors with invariant risk minimization, (ii) enforces 1-Lipschitz constraints on the causal encoder to enable certifiable robustness in latent space, and (iii) provides theoretical guarantees (Theorems 1 and 2) for cross-domain robustness by performing randomized smoothing (RS) in the latent space and mapping the certified radius back to the input space. In experiments on CMNIST, CelebA, and DomainNet, GLEAN outperforms strong RS-based baselines in both certified accuracy and average certified radius (ACR) across radii and domains, demonstrating improved robustness under distribution shifts. The work contributes a principled approach to robustness generalization by coupling causal factor discovery with certified defense, which supports deploying reliable models across data environments whose distributions differ.
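Step (iii) relies on a standard Lipschitz argument. Suppose the encoder f is L-Lipschitz in the l2 norm and the smoothed classifier is certified for latent perturbations of norm below R. Then any input perturbation of norm below R / L keeps the latent code within R, so the prediction cannot change. The sketch below illustrates this argument only. It is not the authors' implementation and does not reproduce Theorems 1 and 2. It makes three assumptions:

- The encoder is a hypothetical linear map z = Wx, made 1-Lipschitz by spectral normalization. This is one common way to enforce the constraint and is chosen here for illustration.
- The latent radius uses the randomized smoothing bound of Cohen et al. (2019) with p_B bounded by 1 - p_A.
- The lower bound p_A is given. In practice it would come from a Monte Carlo confidence bound.

All function names are hypothetical.

```python
import math
from statistics import NormalDist


def spectral_norm(W, iters=100):
    """Estimate the largest singular value of W (a list of rows) by power
    iteration on W^T W. Assumes the all-ones start vector is not orthogonal
    to the top right singular vector."""
    n_rows, n_cols = len(W), len(W[0])
    v = [1.0] * n_cols
    for _ in range(iters):
        u = [sum(W[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]  # u = W v
        v = [sum(W[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # v = W^T u
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in v]
    Wv = [sum(W[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    return math.sqrt(sum(x * x for x in Wv))


def normalize_to_1_lipschitz(W):
    """Hypothetical spectral normalization: divide W by its (estimated) largest
    singular value so the linear encoder z = W x is approximately 1-Lipschitz."""
    sigma_max = spectral_norm(W)
    if sigma_max == 0.0:
        return W
    return [[w / sigma_max for w in row] for row in W]


def latent_radius(p_a_lower, noise_sigma):
    """Randomized-smoothing l2 radius in latent space, assuming p_B <= 1 - p_A:
    R = sigma * Phi^{-1}(p_A). Requires p_a_lower < 1; returns None (abstain)
    when p_a_lower <= 0.5."""
    if p_a_lower <= 0.5:
        return None
    return noise_sigma * NormalDist().inv_cdf(p_a_lower)


def input_radius(p_a_lower, noise_sigma, lipschitz):
    """If ||f(x) - f(x')|| <= L * ||x - x'||, an input perturbation with l2 norm
    below R / L moves the latent code by less than R, so the certificate holds."""
    r = latent_radius(p_a_lower, noise_sigma)
    return None if r is None else r / lipschitz


W = [[3.0, 0.0], [0.0, 1.0]]          # hypothetical 2x2 linear encoder
W_hat = normalize_to_1_lipschitz(W)   # largest singular value becomes ~1
L = spectral_norm(W_hat)
r_in = input_radius(p_a_lower=0.975, noise_sigma=0.5, lipschitz=L)
```

With these toy numbers, the normalized encoder has L close to 1, so the input-space radius equals the latent radius, about 0.98. Without normalization (L = 3), the same latent certificate would shrink to a third of that in input space. This is why constraining the encoder's Lipschitz constant matters when certifying in latent space.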
Abstract
While machine learning models have proven effective across various scenarios, it is widely acknowledged that many models are vulnerable to adversarial attacks. Numerous adversarial defenses have recently been proposed. Among them, certified defense is well known for providing theoretical guarantees against arbitrary adversarial perturbations of the input within a certain range (e.g., an $l_2$ ball). However, most existing works in this line struggle to generalize their certified robustness to other data domains with distribution shifts. This issue is rooted in the difficulty of eliminating the negative impact of spurious correlations on robustness across domains. To address this problem, we propose GLEAN, a novel certified defense framework that brings a causal perspective to the generalization problem in certified defense. More specifically, our framework integrates a certifiable causal factor learning component that disentangles the causal relations and spurious correlations between input and label, thereby excluding the negative effect of spurious correlations on defense. On top of that, we design a causally certified defense strategy to handle adversarial attacks on latent causal factors. In this way, our framework is not only robust against malicious noise on data from the training distribution but can also generalize its robustness to domains with distribution shifts. Extensive experiments on benchmark datasets validate the superiority of our framework in certified robustness generalization across data domains. Code is available in the supplementary materials.
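The causal factor learning component separates causal relations from spurious correlations. The TL;DR states that this is done with invariant risk minimization, but neither the TL;DR nor the abstract specifies the exact objective. The sketch below therefore assumes the IRMv1 penalty of Arjovsky et al. (2019). That penalty is the squared gradient of each environment's risk with respect to a dummy classifier scale w, evaluated at w = 1. The example applies it to a scalar logit on a hand-built toy dataset, not the paper's data. It shows why the penalty separates two kinds of features:

- A "causal" feature whose relationship to the label is the same in every environment can be calibrated once and has zero penalty.
- A "spurious" feature whose correlation flips across environments cannot be calibrated for all environments at the same time.

All names and counts are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def make_env(groups):
    """Build one environment as (logit, label) pairs from (logit, n_pos, n_neg) groups."""
    pairs = []
    for logit, n_pos, n_neg in groups:
        pairs += [(logit, 1)] * n_pos + [(logit, 0)] * n_neg
    return pairs


def bce_risk(pairs):
    """Mean binary cross-entropy with logits: log(1 + e^s) - y * s."""
    return sum(math.log1p(math.exp(s)) - y * s for s, y in pairs) / len(pairs)


def grad_at_dummy_scale(pairs):
    """d/dw of the mean BCE of (w * s, y) at w = 1, i.e. mean((sigmoid(s) - y) * s)."""
    return sum((sigmoid(s) - y) * s for s, y in pairs) / len(pairs)


def irmv1_objective(envs, lam):
    """Sum of per-environment risks plus lam times the IRMv1 invariance penalty."""
    risk = sum(bce_risk(pairs) for pairs in envs)
    penalty = sum(grad_at_dummy_scale(pairs) ** 2 for pairs in envs)
    return risk + lam * penalty, penalty


# Toy data: the "causal" representation outputs log(3) * c, where c agrees with y
# 75% of the time in every environment; the "spurious" one outputs log(9) * s,
# where s agrees with y 90% of the time in env 1 but only 20% in env 2.
a, b = math.log(3.0), math.log(9.0)
causal_envs = [make_env([(a, 3, 1), (-a, 1, 3)]), make_env([(a, 6, 2), (-a, 1, 3)])]
spurious_envs = [make_env([(b, 9, 1), (-b, 1, 9)]), make_env([(b, 2, 8), (-b, 8, 2)])]

_, causal_penalty = irmv1_objective(causal_envs, lam=1.0)
_, spurious_penalty = irmv1_objective(spurious_envs, lam=1.0)
```

Here `causal_penalty` is essentially zero, because the scale log(3) is optimal in both environments. `spurious_penalty` equals (0.7 ln 9)^2, about 2.37, because a scale calibrated for env 1 is badly miscalibrated for env 2. Raising `lam` therefore pushes training toward the invariant feature.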
