Unbiased GNN Learning via Fairness-Aware Subgraph Diffusion

Abdullah Alchihabi, Yuhong Guo

TL;DR

This work tackles bias in GNN-based node classification that is amplified by graph message passing. It introduces Fairness-Aware Subgraph Diffusion (FASD), a generative approach with five stages: it samples small subgraphs, injects fairness-aware perturbations guided by a sensitive-attribute predictor during forward diffusion, trains score-based models to estimate these perturbations, applies reverse diffusion to debias the subgraphs, and trains standard GNNs on the debiased data. Key contributions are a novel fairness-aware diffusion framework and empirical results on the NBA and Pokec (Pokec-z, Pokec-n) datasets showing state-of-the-art fairness (lower $\Delta_{DP}$ and $\Delta_{EO}$) with minimal accuracy loss. The method offers a principled way to mitigate dataset bias in graph-structured data, with practical relevance for fair node classification in domains such as social networks and recommendation systems.

Abstract

Graph Neural Networks (GNNs) have demonstrated remarkable efficacy in tackling a wide array of graph-related tasks across diverse domains. However, a significant challenge lies in their propensity to generate biased predictions, particularly with respect to sensitive node attributes such as age and gender. These biases, inherent in many machine learning models, are amplified in GNNs due to the message-passing mechanism, which allows nodes to influence each other, rendering the task of making fair predictions notably challenging. This issue is particularly pertinent in critical domains where model fairness holds paramount importance. In this paper, we propose a novel generative Fairness-Aware Subgraph Diffusion (FASD) method for unbiased GNN learning. The method initiates by strategically sampling small subgraphs from the original large input graph, and then proceeds to conduct subgraph debiasing via generative fairness-aware graph diffusion processes based on stochastic differential equations (SDEs). To effectively diffuse unfairness in the input data, we introduce additional adversary bias perturbations to the subgraphs during the forward diffusion process, and train score-based models to predict these applied perturbations, enabling them to learn the underlying dynamics of the biases present in the data. Subsequently, the trained score-based models are utilized to further debias the original subgraph samples through the reverse diffusion process. Finally, FASD induces fair node predictions on the input graph by performing standard GNN learning on the debiased subgraphs. Experimental results demonstrate the superior performance of the proposed method over state-of-the-art Fair GNN baselines across multiple benchmark datasets.
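To make the forward stage concrete, the following is a minimal sketch of one noising step on a single node's feature vector. It is not the authors' formulation: it assumes a variance-preserving SDE discretized with Euler-Maruyama, a hypothetical linear logistic sensitive-attribute predictor with weights `w`, and an extra drift term scaled by a hypothetical weight `lam_x` that follows the gradient of that predictor's loss. The sign and form of the fairness term, and all names below, are illustrative assumptions; the adjacency matrix is not handled here.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fairness_gradient(x, s, w):
    """Gradient w.r.t. x of the logistic loss of a linear predictor.

    The predictor sigmoid(w . x) is a hypothetical stand-in for a
    sensitive-attribute predictor; s is the binary sensitive attribute.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - s) * wi for wi in w]


def forward_step(x, s, w, beta, dt, lam_x, rng):
    """One Euler-Maruyama step of a VP-type SDE plus an assumed fairness drift.

    x <- x - 0.5 * beta * x * dt + sqrt(beta * dt) * z + lam_x * grad * dt
    The last term is an illustrative assumption, not the paper's equation.
    """
    grad = fairness_gradient(x, s, w)
    out = []
    for xi, gi in zip(x, grad):
        drift = -0.5 * beta * xi * dt
        noise = math.sqrt(beta * dt) * rng.gauss(0.0, 1.0)
        out.append(xi + drift + noise + lam_x * gi * dt)
    return out


# Usage: diffuse one feature vector for a few steps (toy values).
rng = random.Random(0)
x = [1.0, -2.0, 0.5]
w = [0.3, -0.1, 0.8]      # hypothetical predictor weights
n_steps = 10
for _ in range(n_steps):
    x = forward_step(x, s=1, w=w, beta=0.5, dt=1.0 / n_steps, lam_x=0.1, rng=rng)
```

In a score-based setup, a model would then be trained to estimate the perturbation applied at each step so that it can be undone during reverse diffusion; that training loop is omitted here.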

Paper Structure (32 sections, 19 equations, 3 figures, 2 tables, 3 algorithms)

Figures (3)

  • Figure 1: Overview of the proposed FASD method. (a) Fairness-Aware Forward Diffusion Process. Subgraphs $\mathcal{G}$ sampled from the input graph $G$ undergo stochastic and fairness-based perturbations within the forward subgraph diffusion process. Score-based models $s_{\theta,t}$ and $s_{\phi,t}$ are trained to approximate these perturbations by minimizing losses $\mathcal{L}_{\theta}$ and $\mathcal{L}_{\phi}$, respectively. (b) Subgraph Debiasing and fair node classification. The sampled subgraphs $\mathcal{G}$ are debiased via reverse diffusion to obtain dense debiased subgraphs $\tilde{\mathcal{G}}$, which are then sparsified during post-processing. Finally, node classification model $f$ is trained on $\tilde{\mathcal{G}}$ to minimize node classification loss $\mathcal{L}$.
  • Figure 2: Sensitivity analysis for our proposed FASD method in terms of accuracy vs. $\Delta_{DP}$ on hyper-parameters $N_{\text{steps}}$ and ($\lambda_X$, $\lambda_A$): (a) $N_{\text{steps}}$, NBA; (b) $N_{\text{steps}}$, Pokec-z; (c) $N_{\text{steps}}$, Pokec-n; (d) ($\lambda_X$, $\lambda_A$), NBA; (e) ($\lambda_X$, $\lambda_A$), Pokec-z; (f) ($\lambda_X$, $\lambda_A$), Pokec-n.
  • Figure 3: Sensitivity analysis for our proposed FASD method in terms of accuracy vs. $\Delta_{EO}$ on hyper-parameters $N_{\text{steps}}$ and ($\lambda_X$, $\lambda_A$): (a) $N_{\text{steps}}$, NBA; (b) $N_{\text{steps}}$, Pokec-z; (c) $N_{\text{steps}}$, Pokec-n; (d) ($\lambda_X$, $\lambda_A$), NBA; (e) ($\lambda_X$, $\lambda_A$), Pokec-z; (f) ($\lambda_X$, $\lambda_A$), Pokec-n.
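
The two fairness metrics plotted in these figures can be computed as in the sketch below. It assumes the standard definitions for binary predictions and a binary sensitive attribute: the demographic parity gap $|P(\hat{y}=1 \mid s=0) - P(\hat{y}=1 \mid s=1)|$ and the equal opportunity gap $|P(\hat{y}=1 \mid y=1, s=0) - P(\hat{y}=1 \mid y=1, s=1)|$. The function names and the toy data are illustrative, not taken from the paper.

```python
from typing import Sequence


def _positive_rate(preds: Sequence[int], mask: Sequence[bool]) -> float:
    """Fraction of positive predictions among the entries selected by mask."""
    selected = [p for p, m in zip(preds, mask) if m]
    if not selected:
        raise ValueError("empty group: metric is undefined")
    return sum(selected) / len(selected)


def delta_dp(preds: Sequence[int], sens: Sequence[int]) -> float:
    """Demographic parity gap between the two sensitive groups."""
    rate0 = _positive_rate(preds, [s == 0 for s in sens])
    rate1 = _positive_rate(preds, [s == 1 for s in sens])
    return abs(rate0 - rate1)


def delta_eo(preds: Sequence[int], labels: Sequence[int], sens: Sequence[int]) -> float:
    """Equal opportunity gap: positive-prediction rates among true positives."""
    rate0 = _positive_rate(preds, [y == 1 and s == 0 for y, s in zip(labels, sens)])
    rate1 = _positive_rate(preds, [y == 1 and s == 1 for y, s in zip(labels, sens)])
    return abs(rate0 - rate1)


# Toy example: eight nodes, four per sensitive group.
preds = [1, 0, 1, 1, 0, 1, 0, 0]
labels = [1, 0, 1, 0, 1, 1, 0, 1]
sens = [0, 0, 0, 0, 1, 1, 1, 1]

print(delta_dp(preds, sens))          # 0.75 vs 0.25 positive rate
print(delta_eo(preds, labels, sens))  # 2/2 vs 1/3 true-positive rate
```

Lower values of both gaps indicate fairer predictions, which is why the sensitivity plots report them against accuracy.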