Feature Attribution with Necessity and Sufficiency via Dual-stage Perturbation Test for Causal Explanation

Xuexin Chen; Ruichu Cai; Zhengting Huang; Yuxuan Zhu; Julien Horwood; Zhifeng Hao; Zijian Li; Jose Miguel Hernandez-Lobato

TL;DR

This paper addresses the limitations of perturbation-based feature attribution methods by introducing Feature Attribution with Necessity and Sufficiency (FANS), which leverages the probabilistic notion of causation, specifically the Probability of Necessity and Sufficiency ($PNS$), to quantify feature importance in a local neighborhood around the target input. FANS formulates a Structural Causal Model (SCM) for perturbation-based attribution, defines neighborhoods $\tilde{\mathbf{X}}$ around the target, and uses a dual-stage Abduction-Action-Prediction framework to estimate counterfactual probabilities for both Necessity and Sufficiency. To estimate the complex conditional distributions required by these counterfactuals, FANS employs Sampling-Importance-Resampling (SIR) and optimizes over feature subsets using gradient-based methods with continuous relaxation, producing a Necessity and Sufficiency Attribution (NSA) score and selecting the subset with the highest NSA. Empirical results on six benchmarks (image and graph data) show that FANS achieves superior faithfulness, sparsity, and robustness compared with a broad set of baselines, and the authors provide extensive ablations and convergence analyses; code is available at the cited repository.
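For orientation, the quantities named above can be written in the standard form from Pearl's probabilistic account of causation. This is a sketch in generic notation, assuming a binary event $A$ (for example, "the features in $\mathbf{S}$ are perturbed") and a binary outcome $B$ (for example, "the prediction changes"); the symbols $a$, $\bar{a}$, $b$, $\bar{b}$ are illustrative and may not match the paper's Definitions 3.1 to 3.4.

```latex
\begin{align}
\mathrm{PN}  &= P\big(Y_{A=\bar{a}} = \bar{b} \,\big|\, A = a,\ Y = b\big), \\
\mathrm{PS}  &= P\big(Y_{A=a} = b \,\big|\, A = \bar{a},\ Y = \bar{b}\big), \\
\mathrm{PNS} &= P\big(Y_{A=a} = b,\ Y_{A=\bar{a}} = \bar{b}\big) \\
             &= P(a, b)\,\mathrm{PN} + P(\bar{a}, \bar{b})\,\mathrm{PS}.
\end{align}
```

Here $Y_{A=a}$ denotes the counterfactual outcome under the intervention $A = a$. The last line is the general identity that expresses PNS as a weighted combination of PN and PS.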

Abstract

We investigate the problem of explainability for machine learning models, focusing on Feature Attribution Methods (FAMs) that evaluate feature importance through perturbation tests. Despite their utility, FAMs struggle to distinguish the contributions of different features when their prediction changes are similar after perturbation. To enhance FAMs' discriminative power, we introduce Feature Attribution with Necessity and Sufficiency (FANS), which finds a neighborhood of the input such that perturbing samples within this neighborhood has a high Probability of being a Necessary and Sufficient (PNS) cause of the change in predictions, and uses this PNS as the importance of the feature. Specifically, FANS computes this PNS via a heuristic strategy for estimating the neighborhood and a perturbation test involving two stages (factual and interventional) for counterfactual reasoning. To generate counterfactual samples, we use a resampling-based approach on the observed samples to approximate the required conditional distribution. We demonstrate that FANS outperforms existing attribution methods on six benchmarks. The source code is available at https://github.com/DMIRLAB-Group/FANS.
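The resampling idea mentioned above can be illustrated with a generic Sampling-Importance-Resampling (SIR) routine. This is a minimal sketch under stated assumptions, not the FANS implementation: the function name `sir_resample`, the Gaussian proposal, and the Gaussian target are all hypothetical choices made for illustration. In FANS the target would be a conditional distribution over inputs rather than a one-dimensional Gaussian.

```python
import math
import random

def sir_resample(samples, log_weights, n_out, rng):
    """Resample `samples` with probability proportional to exp(log_weights)."""
    top = max(log_weights)  # subtract the max for numerical stability
    weights = [math.exp(lw - top) for lw in log_weights]
    return rng.choices(samples, weights=weights, k=n_out)

rng = random.Random(0)

# Step 1 (sampling): draw from an easy proposal, here N(0, 2^2) (assumed).
proposal = [rng.gauss(0.0, 2.0) for _ in range(10000)]

# Step 2 (importance): log weight = log target - log proposal, up to a constant.
# The target here is N(1, 0.5^2), chosen only for illustration.
log_w = [-0.5 * ((x - 1.0) / 0.5) ** 2 + 0.5 * (x / 2.0) ** 2 for x in proposal]

# Step 3 (resampling): the resampled draws approximately follow the target.
resampled = sir_resample(proposal, log_w, 2000, rng)
mean = sum(resampled) / len(resampled)
std = math.sqrt(sum((x - mean) ** 2 for x in resampled) / len(resampled))
print(f"resampled mean={mean:.3f}, std={std:.3f}")
```

Normalizing constants are dropped because they cancel when the weights are normalized inside `rng.choices`. The quality of the approximation depends on how well the proposal covers the target.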

Paper Structure (37 sections, 22 equations, 8 figures, 4 tables)


Introduction
Causal Model for Feature Attribution
Feature Attribution as a Problem of PNS Measurement
Necessary and Sufficient Attribution via Dual-stage Perturbation Test
Overview
Dual-stage Perturbation Test
Sufficiency Module
Factual Stage
Intervention Stage
Necessity Module
PNS Estimation
Necessary and Sufficient Attribution Estimation
Extracting Feature Subset with the Highest Necessary and Sufficient Attribution
Experiments
Experimental Setup
...and 22 more sections

Figures (8)

Figure 1: Causal diagram of standard perturbation test in feature attribution. $\mathbf{S}$ denotes a subset of dimensions of $\mathbf{X}$ for perturbation. $\tilde{\mathbf{X}}$ represents an input with fixed features on $\mathbf{S}$ that are similar to the target input $\mathbf{x}^t$.
Figure 2: Architecture of FANS, which takes the sample $\mathbf{x}^t$ to be explained and the samples $\mathcal{E}\overset{\text{iid}}{\sim} P(\mathbf{X})$ as inputs, passes them through the necessity and sufficiency modules to output PN and PS, and finally combines PN and PS into PNS. Each module consists of two stages. 1) Factual stage. Generate samples $\mathcal{E}_{\text{NC}}$ and $\mathcal{E}_{\text{SF}}$ conditional on the fact that the model's predictions change or remain unchanged, respectively, after performing perturbations on dimension subsets $\mathbf{s}$ and $\bar{\mathbf{s}}$. 2) Intervention stage. Apply perturbations different from the facts to $\mathcal{E}_{\text{NC}}$ and $\mathcal{E}_{\text{SF}}$, and count the proportion of predictions that change or remain unchanged by comparing the perturbed prediction $y'$ with the original prediction $y$.
Figure 3: Robustness comparison on the image datasets CIFAR10 and MNIST under different noise strengths $r$.
Figure 4: Attribution visualization on the MNIST dataset.
Figure 5: Ablation study of sufficiency (SF) module, necessity (NC) module, and SIR-based Sampling (SR) on graph datasets Citeseer and Pubmed.
...and 3 more figures
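The two stages described in the Figure 2 caption can be sketched on a toy problem. This is an illustrative sketch only, under several assumptions not taken from the paper: the classifier `model`, the Gaussian `perturb` function, the neighborhood `radius`, and the function name `dual_stage_test` are all hypothetical; the factual stage uses plain rejection filtering instead of SIR; each neighbor's own unperturbed prediction plays the role of $y$; and the final combination of PN and PS into PNS is omitted because its weighting is not given in this excerpt.

```python
import random

def model(x):
    """Toy classifier (hypothetical): only features 0 and 1 affect the output."""
    return int(x[0] + x[1] > 1.0)

def perturb(x, dims, rng, scale):
    """Add Gaussian noise to the chosen dimensions; leave the rest fixed."""
    out = list(x)
    for d in dims:
        out[d] += rng.gauss(0.0, scale)
    return out

def dual_stage_test(model, x_t, s, rng, n=5000, radius=0.05, scale=2.0):
    """Estimate (PN, PS) for dimension subset `s` around the target `x_t`."""
    s_bar = [d for d in range(len(x_t)) if d not in s]
    # Neighborhood samples: small jitter of the target on all dimensions.
    neighbours = [perturb(x_t, range(len(x_t)), rng, radius) for _ in range(n)]

    # Necessity module.
    # Factual stage: keep neighbors whose prediction changes when s is perturbed.
    e_nc = [x for x in neighbours if model(perturb(x, s, rng, scale)) != model(x)]
    # Intervention stage: perturb s_bar instead and count unchanged predictions.
    pn = (sum(model(perturb(x, s_bar, rng, scale)) == model(x) for x in e_nc)
          / len(e_nc)) if e_nc else 0.0

    # Sufficiency module.
    # Factual stage: keep neighbors whose prediction is unchanged when s_bar is perturbed.
    e_sf = [x for x in neighbours if model(perturb(x, s_bar, rng, scale)) == model(x)]
    # Intervention stage: perturb s instead and count changed predictions.
    ps = (sum(model(perturb(x, s, rng, scale)) != model(x) for x in e_sf)
          / len(e_sf)) if e_sf else 0.0
    return pn, ps

rng = random.Random(0)
x_t = [1.0, 1.0, 0.0, 0.0]
pn_rel, ps_rel = dual_stage_test(model, x_t, [0, 1], rng)  # features the model uses
pn_irr, ps_irr = dual_stage_test(model, x_t, [2, 3], rng)  # features the model ignores
print(pn_rel, ps_rel, pn_irr, ps_irr)
```

In this toy setting the subset the model actually uses scores higher on both quantities than the ignored subset. When the factual stage accepts no samples, the sketch returns 0.0 for that estimate, which is a convention chosen here for simplicity.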

Theorems & Definitions (4)

Definition 3.1
Definition 3.2
Definition 3.3
Definition 3.4
