ATEX-CF: Attack-Informed Counterfactual Explanations for Graph Neural Networks

Yu Zhang, Sean Bin Yang, Arijit Khan, Cuneyt Gurcan Akcora

TL;DR

This work tackles the explainability of Graph Neural Networks by bridging counterfactual explanations with adversarial attacks. It introduces ATEX-CF, a hybrid framework that simultaneously considers edge deletions and attack-informed edge additions to generate concise, plausible counterfactuals within a perturbation budget $\kappa$. Through a joint optimization of impact, sparsity, and plausibility, ATEX-CF achieves higher fidelity and more realistic explanations than deletion-only or attack baselines across diverse datasets and architectures. The approach is reinforced by a theoretical link between attack perturbations and counterfactual reasoning (Hypothesis H1) and validated with extensive experiments, ablations, and analyses of asymmetric perturbation costs. The results highlight the potential of leveraging adversarial insights to enhance interpretability and robustness in graph-based decision systems, with broad applicability to healthcare, finance, and science domains.

Abstract

Counterfactual explanations offer an intuitive way to interpret graph neural networks (GNNs) by identifying minimal changes that alter a model's prediction, thereby answering "what must differ for a different outcome?". In this work, we propose a novel framework, ATEX-CF, that unifies adversarial attack techniques with counterfactual explanation generation. This connection is feasible because both share the goal of flipping a node's prediction, yet they differ in perturbation strategy: adversarial attacks often rely on edge additions, while counterfactual methods typically use deletions. Unlike traditional approaches that treat explanation and attack separately, our method efficiently integrates both edge additions and deletions in a theoretically grounded way, leveraging adversarial insights to explore impactful counterfactuals. In addition, by jointly optimizing fidelity, sparsity, and plausibility under a constrained perturbation budget, our method produces instance-level explanations that are both informative and realistic. Experiments on synthetic and real-world node classification benchmarks demonstrate that ATEX-CF generates faithful, concise, and plausible explanations, highlighting the effectiveness of integrating adversarial insights into counterfactual reasoning for GNNs.
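
To make the idea of a budgeted counterfactual search over both edge additions and deletions concrete, the following is a minimal sketch on a toy graph. It is not the ATEX-CF algorithm: the one-layer mean-aggregation classifier, the greedy selection rule, and the names `predict`, `greedy_counterfactual`, `allow_add`, `x`, and `adj` are all assumptions made for illustration, and plausibility is not modeled. Sparsity is represented only through the budget `kappa`.

```python
# Toy data (hypothetical): each node has a scalar score; positive favors class 1.
x = {0: 1.0, 1: 1.0, 2: 0.5, 3: -4.0, 4: -0.5}
adj = {0: {1, 2}}  # only the target node's neighborhood is needed here


def predict(v, neighbors, x):
    """Toy one-layer 'GNN': class 1 if the mean score over v and its neighbors is > 0."""
    group = [v] + sorted(neighbors)
    score = sum(x[u] for u in group) / len(group)
    return (1 if score > 0 else 0), score


def greedy_counterfactual(v, adj, x, kappa, allow_add=True):
    """Greedily flip edges incident to v until the prediction changes.

    Returns a list of ("add" | "delete", node) edits, or None if no
    counterfactual is found within kappa edits.
    """
    neighbors = set(adj[v])
    original, _ = predict(v, neighbors, x)
    edits = []
    for _ in range(kappa):
        best = None
        for u in x:
            if u == v:
                continue
            is_neighbor = u in neighbors
            if not is_neighbor and not allow_add:
                continue  # deletion-only mode: skip candidate additions
            trial = neighbors ^ {u}  # toggle the edge (v, u)
            label, score = predict(v, trial, x)
            # Margin toward the original class; smaller means closer to flipping.
            margin = score if original == 1 else -score
            if best is None or margin < best[0]:
                best = (margin, "delete" if is_neighbor else "add", u, trial, label)
        if best is None:
            return None  # no candidate edits left
        _, op, u, neighbors, label = best
        edits.append((op, u))
        if label != original:
            return edits
    return None


print(greedy_counterfactual(0, adj, x, kappa=1))
print(greedy_counterfactual(0, adj, x, kappa=2, allow_add=False))
```

In this toy setting, deleting either or both of node 0's supportive neighbors never changes its prediction, whereas a single added edge to the strongly opposing node 3 does, which mirrors the motivation for considering additions alongside deletions.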

Paper Structure

This paper contains 38 sections, 4 theorems, 19 equations, 10 figures, 22 tables, 2 algorithms.

Key Result

Proposition A.1

Let $G'=G\setminus S$ for some strict subset $S\subsetneq\{(v,u):u\in\mathcal{N}(v)\}$. If where $\mathrm{bias}_v$ is a bias term for node $v$’s own features, $r_u$ is the contribution from neighbor $u$’s features, aligned with class $y$, and $w_{vu} \ge 0$ is the scalar weight that measures how strongly neighbor $u$ influences $v$’s score. Then $f_{G'}(v)=y$. In words, as long as at least one inc

Figures (10)

  • Figure 1: Illustration of counterfactual limitations in the Loan Decision dataset.
  • Figure 1: ATEX-CF: Counterfactual Generator
  • Figure 2: End-to-end workflow of the ATEX-CF framework for counterfactual edge generation.
  • Figure 3: Counterfactual explanations on Cora and GCN under varying perturbation budgets $\kappa$
  • Figure 4: Performance of counterfactual explanations vs. the number of GNN layers: The results demonstrate sensitivity w.r.t. the number of hops for the local structure surrounding the target node.
  • ...and 5 more figures

Theorems & Definitions (4)

  • Proposition A.1: Deletion Infeasibility
  • Proposition A.2: Addition Sufficiency
  • Corollary A.3: Budgeted reachability and strict advantage of additions
  • Corollary A.4: Edit cost and latent stability
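
The contrast between Deletion Infeasibility and Addition Sufficiency can be illustrated with a small numeric check. The exact condition of Proposition A.1 is not reproduced in the text above, so this sketch assumes a hypothetical linear score, $\mathrm{bias}_v + \sum_u w_{vu} r_u$, with the prediction equal to $y$ whenever the score is positive, and it assumes every existing neighbor's contribution alone outweighs a negative bias. All numbers and names (`bias_v`, `support`, `opposing`, `score`) are invented for illustration.

```python
from itertools import combinations


def score(bias, contributions, kept):
    """Assumed linear score: bias plus the summed contributions of kept neighbors."""
    return bias + sum(contributions[u] for u in kept)


bias_v = -0.5                 # hypothetical bias of node v (slightly against class y)
support = {1: 0.8, 2: 0.6}    # hypothetical w_vu * r_u per neighbor, each aligned with y
nodes = sorted(support)

# Every strict subset S of v's incident edges (at least one edge always remains).
strict_subsets = [set(c) for k in range(len(nodes)) for c in combinations(nodes, k)]

# Deletion sets that would flip the prediction (score <= 0) under the assumed model.
deletion_flips = [
    S for S in strict_subsets
    if score(bias_v, support, set(nodes) - S) <= 0
]

# One added edge to a hypothetical neighbor whose contribution opposes class y.
opposing = {3: -1.5}
with_addition = {**support, **opposing}
addition_score = score(bias_v, with_addition, with_addition.keys())

print("deletion sets that flip:", deletion_flips)
print("score after one addition:", addition_score)
```

Under these assumptions no strict subset of deletions changes the prediction, since any remaining supportive edge keeps the score positive, while a single added edge drives the score below zero.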