CausAdv: A Causal-based Framework for Detecting Adversarial Examples

Hichem Debbi

TL;DR

CausAdv introduces a causal, counterfactual framework for detecting adversarial examples in CNNs by analyzing the Counterfactual Information (CI) of filters in the last convolutional layer. Prototypes are used to define causal versus non-causal filters, and CI is computed via filter ablation. This enables four detection strategies that distinguish clean from adversarial inputs without modifying the input or training additional detectors. Empirical results on ImageNet and CIFAR-10 show strong detection capabilities, particularly against BIM and FGSM-like attacks, and demonstrate superior performance over several existing defenses while providing interpretable causal explanations. The approach highlights the value of causal reasoning for robustness and offers a lightweight, architecture-agnostic, and explainable detection mechanism. CI-based visualizations further support the interpretability of decisions and may aid qualitative adversarial analysis and localization tasks.

Abstract

Deep learning has led to tremendous success in computer vision, largely due to Convolutional Neural Networks (CNNs). However, CNNs have been shown to be vulnerable to crafted adversarial perturbations. This vulnerability to adversarial examples has motivated research into improving model robustness through adversarial detection and defense methods. In this paper, we address the adversarial robustness of CNNs through causal reasoning. We propose CausAdv: a causal framework for detecting adversarial examples based on counterfactual reasoning. CausAdv learns both causal and non-causal features of every input, and quantifies the counterfactual information (CI) of every filter of the last convolutional layer. We then perform a statistical analysis of the filters' CI across clean and adversarial samples to demonstrate that adversarial examples exhibit different CI distributions compared to clean samples. Our results show that causal reasoning enhances the process of adversarial detection without the need to train a separate detector. Moreover, we illustrate the efficiency of causal explanations as a helpful detection tool by visualizing the extracted causal features.

Paper Structure

This paper contains 12 sections, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Causal learning process. Each filter of the last convolutional layer of the CNN is removed in turn, and the effect of its removal on the prediction probability is measured. The difference in prediction probability after a filter is removed is referred to as counterfactual information (CI). Filters are categorized as causal or non-causal based on their effect on the prediction probability.
  • Figure 2: Histograms of CI distributions for some ImageNet samples with their prototypes (2 samples per class) and their adversarial examples. All the attacks are performed with $\epsilon=8$.
  • Figure 3: Causal features visualized as attention maps on clean and adversarial samples.
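
The filter-ablation idea described in the Figure 1 caption can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a toy model in which each last-layer filter is reduced to one pooled activation feeding a linear softmax classifier, "removal" means setting that activation to zero, and a filter counts as causal when its CI is strictly positive. The function names (`predict_proba`, `counterfactual_information`, `split_filters`), the zero threshold, and all numbers are hypothetical choices made for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(pooled, weights, bias):
    """Class probabilities from pooled filter activations (toy linear head).

    pooled  : one pooled activation per filter (length F)
    weights : one row of F weights per class (C rows)
    bias    : one bias per class (length C)
    """
    logits = [
        sum(w * a for w, a in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


def counterfactual_information(pooled, weights, bias):
    """Return the predicted class and the CI of every filter.

    CI_k = p(predicted class | all filters) - p(predicted class | filter k removed).
    Removal is modelled here by zeroing the filter's pooled activation (assumption).
    """
    base = predict_proba(pooled, weights, bias)
    target = max(range(len(base)), key=base.__getitem__)
    ci = []
    for k in range(len(pooled)):
        ablated = list(pooled)
        ablated[k] = 0.0  # ablate filter k only
        ci.append(base[target] - predict_proba(ablated, weights, bias)[target])
    return target, ci


def split_filters(ci, threshold=0.0):
    """Label filters causal if CI > threshold (assumed rule), else non-causal."""
    causal = [k for k, v in enumerate(ci) if v > threshold]
    non_causal = [k for k, v in enumerate(ci) if v <= threshold]
    return causal, non_causal


# Toy example: 4 filters, 2 classes (all values are made up).
pooled = [2.0, 1.0, 0.0, 0.5]
weights = [[1.0, -0.5, 0.3, 0.0], [-1.0, 0.5, 0.2, 0.0]]
bias = [0.0, 0.0]

target, ci = counterfactual_information(pooled, weights, bias)
causal, non_causal = split_filters(ci)
print("predicted class:", target)
print("CI per filter:", [round(v, 4) for v in ci])
print("causal:", causal, "non-causal:", non_causal)
```

In this toy setup, removing filter 0 lowers the probability of the predicted class, so its CI is positive, while removing filter 1 raises that probability, so its CI is negative. With a real CNN the same loop would zero one feature map of the last convolutional layer per forward pass; the per-filter CI vector it produces is the kind of quantity whose distribution Figure 2 compares between clean and adversarial samples.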