Where and How to Attack? A Causality-Inspired Recipe for Generating Counterfactual Adversarial Examples

Ruichu Cai; Yuxuan Zhu; Jie Qiao; Zefeng Liang; Furui Liu; Zhifeng Hao

TL;DR

CADE introduces a causality-inspired framework to generate counterfactual adversarial examples by exploiting the data-generating process via causal interventions. It answers where to attack (on observable children and/or co-parents, or latent variables) and how to attack (abduction-action-prediction with white-box, transfer, and latent-image variants). The approach demonstrates competitive white-box and transfer-based performance across Pendulum, CelebA, and SynMeasurement, and provides insights into the role of causal structure in adversarial vulnerability. By leveraging SCMs and deliberate latent-space perturbations, CADE aims to produce more realistic adversarial examples and offers a principled path for future defenses that incorporate causal reasoning.

Abstract

Deep neural networks (DNNs) have been shown to be vulnerable to well-crafted \emph{adversarial examples}, which are generated through either $\mathcal{L}_p$-norm restricted or unrestricted attacks. Nevertheless, most of these approaches assume that adversaries can modify any feature as they wish and neglect the causal generating process of the data, which is unreasonable and impractical. For instance, a modification in income would inevitably affect features such as the debt-to-income ratio within a banking system. By considering the underappreciated causal generating process, we first pinpoint the source of the vulnerability of DNNs through the lens of causality and give theoretical results that answer \emph{where to attack}. Second, to generate more realistic adversarial examples, we account for the consequences of the attack interventions on the current state of the examples and propose CADE, a framework that generates \textbf{C}ounterfactual \textbf{AD}versarial \textbf{E}xamples, to answer \emph{how to attack}. The empirical results demonstrate CADE's effectiveness, as evidenced by its competitive performance across diverse attack scenarios, including white-box, transfer-based, and random intervention attacks.
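The income example can be made concrete with a minimal sketch of abduction-action-prediction on a toy structural causal model. Everything here is an illustrative assumption rather than the paper's implementation: the `Applicant` record, the additive-noise mechanism `dti = debt / income + u_dti`, and the helper names `abduct` and `counterfactual` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    income: float
    debt: float
    dti: float  # observed debt-to-income ratio


def abduct(obs: Applicant) -> dict:
    """Abduction: recover the exogenous noise consistent with the observation.

    Assumed mechanism (hypothetical): dti = debt / income + u_dti.
    """
    return {"u_dti": obs.dti - obs.debt / obs.income}


def counterfactual(obs: Applicant, new_income: float) -> Applicant:
    """Action and prediction: apply do(income = new_income), keep the abducted
    noise fixed, and recompute the downstream variable dti."""
    noise = abduct(obs)
    dti_cf = obs.debt / new_income + noise["u_dti"]
    return Applicant(income=new_income, debt=obs.debt, dti=dti_cf)


observed = Applicant(income=50_000.0, debt=20_000.0, dti=0.42)

# Counterfactual edit: the change in income propagates to dti.
cf = counterfactual(observed, new_income=80_000.0)

# Naive feature edit: income is changed but dti is left untouched,
# which yields a record the assumed mechanism could not have produced.
naive = Applicant(income=80_000.0, debt=observed.debt, dti=observed.dti)

print(cf)
print(naive)
```

In this sketch the counterfactual record stays consistent with the assumed mechanism, while the naive edit keeps the stale ratio. That inconsistency is the kind of unrealistic modification the abstract argues against.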

Paper Structure (55 sections, 3 theorems, 27 equations, 11 figures, 2 tables, 2 algorithms)

Introduction
Background
Methodology
Motivating Example
Children Intervention
Co-parents Intervention
Generating Adversarial Example: Where to Attack?
Observable Variable Intervention
Latent Variable Intervention
Generating Adversarial Example: How to Attack?
Abduction
Action (Intervention) and Prediction
White-Box Attack
Black-Box Attack
Latent Attack for Image
...and 40 more sections

Key Result

Proposition 1

Given the SCM $M$, the discriminative conditional distribution: where $\mathrm{Mb}_y^{M}$ denotes the Markov blanket of $\mathrm{y}$ under SCM $M$, which comprises the parents, children, and co-parents of $\mathrm{y}$.
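The definition of the Markov blanket used in the proposition (parents, children, and co-parents of the target) can be sketched as a small graph routine. The function name `markov_blanket`, the parent-set dictionary encoding, and the five-node example graph are assumptions made for illustration; the graph is not one of the paper's causal graphs.

```python
def markov_blanket(parents: dict, target: str) -> set:
    """Return the Markov blanket of `target` in a DAG.

    `parents` maps every node to the set of its direct parents.
    """
    # Direct causes of the target.
    pa = set(parents.get(target, set()))
    # Nodes that list the target among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Other parents of those children (the co-parents).
    co_parents = set()
    for child in children:
        co_parents |= set(parents[child])
    co_parents.discard(target)
    return pa | children | co_parents


# Hypothetical DAG: a -> y, a -> d, y -> c, b -> c.
graph = {
    "a": set(),
    "b": set(),
    "y": {"a"},
    "c": {"y", "b"},
    "d": {"a"},
}

mb = markov_blanket(graph, "y")
print(sorted(mb))
```

Here `a` is a parent, `c` a child, and `b` a co-parent of `y`, while `d` falls outside the blanket because it is only a sibling of `y`.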

Figures (11)

Figure 1: Discriminative DNN's vulnerability to the interventional data.
Figure 2: Framework of CADE.
Figure 3: The causal graphs for each dataset: (a) Pendulum, (b) CelebA(Attractive), and (c) SynMeasurement.
Figure 4: Visualization of adversarial examples against Res-50 on Pendulum obtained by different approaches. The black dashed line highlights the original projection trajectory, while the red dashed line highlights the intervened projection trajectory.
Figure 5: Visualization of adversarial examples against Res-50 on CelebA obtained by different approaches.
...and 6 more figures

Theorems & Definitions (7)

Proposition 1
Proposition 2
Proposition 3
proof
proof
proof
proof
