Enhancing Fairness in Unsupervised Graph Anomaly Detection through Disentanglement

Wenjing Chang, Kay Liu, Philip S. Yu, Jianjun Yu

TL;DR

This work tackles fairness in unsupervised graph anomaly detection by introducing DEFEND, a framework that learns disentangled representations separating sensitive-relevant and sensitive-irrelevant information. It uses a reconstruction-based anomaly score derived from the sensitive-irrelevant latent space and imposes a correlation constraint to reduce dependence on sensitive attributes, while adversarial training enforces independence between the two latent subspaces. Empirical results on Reddit, Twitter, and Credit demonstrate DEFEND’s ability to improve fairness metrics ($\Delta_{DP}$, $\Delta_{EO}$) with competitive anomaly detection performance, supported by ablations and parameter analyses that illuminate the fairness-utility trade-offs. The approach advances fair GAD with a principled, reproducible pipeline that can be extended to multiple attributes and various sensitive settings, enabling more ethical deployment of graph analytics in high-stakes domains.
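
The two fairness metrics named above can be made concrete with a small sketch. It assumes a binary sensitive attribute and the commonly used definitions: $\Delta_{DP}$ as the absolute gap in flagged-as-anomalous rates between the two groups, and $\Delta_{EO}$ as the same gap restricted to true anomalies. The function names and the toy data are illustrative and are not taken from the DEFEND code base.

```python
from typing import Sequence


def positive_rate(preds: Sequence[int], mask: Sequence[bool]) -> float:
    """Fraction of the selected nodes that are flagged as anomalous (pred == 1)."""
    selected = [p for p, m in zip(preds, mask) if m]
    if not selected:
        raise ValueError("selected group is empty")
    return sum(selected) / len(selected)


def delta_dp(preds: Sequence[int], sens: Sequence[int]) -> float:
    """Demographic parity gap: |P(pred=1 | s=0) - P(pred=1 | s=1)|."""
    rate0 = positive_rate(preds, [s == 0 for s in sens])
    rate1 = positive_rate(preds, [s == 1 for s in sens])
    return abs(rate0 - rate1)


def delta_eo(preds: Sequence[int], labels: Sequence[int], sens: Sequence[int]) -> float:
    """Equal opportunity gap: the same difference, restricted to true anomalies (label == 1)."""
    rate0 = positive_rate(preds, [s == 0 and y == 1 for s, y in zip(sens, labels)])
    rate1 = positive_rate(preds, [s == 1 and y == 1 for s, y in zip(sens, labels)])
    return abs(rate0 - rate1)


# Toy data (hypothetical): eight nodes, four per sensitive group.
preds = [1, 0, 1, 1, 0, 0, 1, 0]   # thresholded anomaly predictions
labels = [1, 0, 1, 0, 1, 0, 1, 1]  # ground-truth anomaly labels
sens = [0, 0, 0, 0, 1, 1, 1, 1]    # binary sensitive attribute

print(delta_dp(preds, sens))
print(delta_eo(preds, labels, sens))
```

Both values are zero for a detector that treats the two groups identically, so lower is better for each metric.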

Abstract

Graph anomaly detection (GAD) is increasingly crucial in various applications, ranging from financial fraud detection to fake news detection. However, current GAD methods largely overlook the fairness problem, which might result in discriminatory decisions skewed toward certain demographic groups defined on sensitive attributes (e.g., gender, religion, ethnicity). This greatly limits the applicability of these methods in real-world scenarios in light of societal and ethical restrictions. To address this critical gap, we make the first attempt to integrate fairness with utility in GAD decision-making. Specifically, we devise a novel DisEntangle-based FairnEss-aware aNomaly Detection framework on the attributed graph, named DEFEND. DEFEND first introduces disentanglement in GNNs to capture informative yet sensitive-irrelevant node representations, effectively reducing societal bias inherent in graph representation learning. In addition, to alleviate discriminatory bias in evaluating anomalous nodes, DEFEND adopts reconstruction-based anomaly detection, which concentrates solely on node attributes without incorporating any graph structure. Furthermore, given the inherent association between input and sensitive attributes, DEFEND constrains the correlation between the reconstruction error and the predicted sensitive attributes. Our empirical evaluations on real-world datasets reveal that DEFEND performs effectively in GAD and significantly enhances fairness compared to state-of-the-art baselines. To foster reproducibility, our code is available at https://github.com/AhaChang/DEFEND.
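
As a rough illustration of the last two ideas in the abstract, the sketch below scores nodes by attribute reconstruction error and adds a penalty on the correlation between that error and a predicted sensitive attribute. It rests on several assumptions: the error is taken as a per-node Euclidean distance, the correlation is measured as an absolute Pearson coefficient, and the weight is called `beta` after the correlation constraint weight $\beta$. The abstract does not specify these forms, so the helper names and the exact formula are hypothetical and may differ from the paper's loss.

```python
import math
from typing import List, Sequence


def reconstruction_errors(x: Sequence[Sequence[float]],
                          x_rec: Sequence[Sequence[float]]) -> List[float]:
    """Per-node Euclidean distance between original and reconstructed attributes."""
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(row, rec)))
        for row, rec in zip(x, x_rec)
    ]


def abs_pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return abs(cov) / math.sqrt(var_u * var_v)


def constrained_loss(x, x_rec, s_pred: Sequence[float], beta: float) -> float:
    """Mean reconstruction error plus beta times the correlation penalty (assumed form)."""
    errors = reconstruction_errors(x, x_rec)
    return sum(errors) / len(errors) + beta * abs_pearson(errors, s_pred)


# Toy data (hypothetical): four nodes with two attributes each.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
x_rec = [[1.0, 0.0], [0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
s_pred = [0.1, 0.9, 0.2, 0.8]  # predicted probability of the sensitive attribute

scores = reconstruction_errors(x, x_rec)  # larger error means more anomalous
print(scores)
print(constrained_loss(x, x_rec, s_pred, beta=0.5))
```

In this toy setup the per-node error doubles as the anomaly score, and a larger `beta` trades reconstruction quality for a weaker dependence of the score on the sensitive attribute.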

Paper Structure

This paper contains 33 sections, 19 equations, 4 figures, 5 tables, 1 algorithm.

Figures (4)

  • Figure 1: An overview of the proposed DEFEND framework. (Left) Disentangled fair representation learning. The disentangled graph encoder $f_e$ separates sensitive-irrelevant representations $\mathbf{Z}_x$ from sensitive-relevant representations $\mathbf{Z}_s$ in latent space. (Right) Reconstruction-based graph anomaly detection. The constrained reconstruction error between $\mathbf{X}$ and $\tilde{\mathbf{X}}$ is used to identify anomalies. Modules whose parameters are fixed are marked in the figure.
  • Figure 2: Fairness-utility trade-off curves of different methods on three datasets. The upper-left corner is optimal: high AUC-ROC and low $\Delta_{EO}$.
  • Figure 3: Impact of varying the predictiveness term weight $\alpha$ and the disentanglement term weight $\gamma$ in the total loss on the Reddit dataset, in terms of AUC-ROC and $\Delta_{EO}$.
  • Figure 4: Impact of varying the correlation constraint weight $\beta$ in the anomaly detection loss on the Reddit dataset.