FairSample: Training Fair and Accurate Graph Convolutional Neural Networks Efficiently

Zicun Cong, Shi Baoxu, Shan Li, Jaewon Yang, Qi He, Jian Pei

TL;DR

This paper presents an in-depth analysis of how graph structure bias, node attribute bias, and model parameters may affect the demographic parity of GCNs, and develops FairSample, a framework that jointly mitigates the three types of bias.

Abstract

Fairness in Graph Convolutional Neural Networks (GCNs) is an increasingly important concern as GCNs are adopted in many crucial applications. Societal biases against sensitive groups may exist in many real-world graphs, and GCNs trained on those graphs may be vulnerable to such biases. In this paper, we adopt the well-known fairness notion of demographic parity and tackle the challenge of training fair and accurate GCNs efficiently. We present an in-depth analysis of how graph structure bias, node attribute bias, and model parameters may affect the demographic parity of GCNs. Our insights lead to FairSample, a framework that jointly mitigates the three types of bias. We employ two intuitive strategies to rectify graph structures. First, we inject edges between nodes that are in different sensitive groups but have similar node features. Second, to enhance model fairness and retain model quality, we develop a learnable neighbor sampling policy using reinforcement learning. To address the bias in node features and model parameters, FairSample is complemented by a regularization objective that optimizes fairness.
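
The first graph-rectification strategy in the abstract (injecting edges between nodes in different sensitive groups with similar features) can be illustrated with a minimal sketch. The abstract does not say how similarity is measured or how candidate pairs are selected, so the cosine similarity, the `threshold` parameter, and the function names below are assumptions made for illustration, not the authors' procedure. The exhaustive pairwise scan is only suitable for toy graphs.

```python
import math
from itertools import combinations


def cosine_similarity(x, y):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)


def inject_inter_group_edges(features, groups, edges, threshold=0.9):
    """Return the edge list plus new edges between similar nodes in different groups.

    Hypothetical helper: `threshold` and the similarity measure are assumptions.
    """
    existing = {frozenset(e) for e in edges}
    augmented = list(edges)
    for u, v in combinations(range(len(features)), 2):
        if groups[u] == groups[v]:
            continue  # only inter-group pairs are candidates
        if frozenset((u, v)) in existing:
            continue  # edge already present
        if cosine_similarity(features[u], features[v]) >= threshold:
            augmented.append((u, v))
    return augmented


# Toy graph: nodes 0 and 1 are in group 0, nodes 2 and 3 are in group 1.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.05]]
groups = [0, 0, 1, 1]
edges = [(0, 1), (2, 3)]
augmented = inject_inter_group_edges(features, groups, edges)
print(augmented)
```

In this toy graph, node 3 has features close to those of nodes 0 and 1 but belongs to the other group, so it gains two inter-group edges, while node 2 gains none.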

Paper Structure (23 sections, 1 theorem, 5 equations, 6 figures, 6 tables)

This paper contains 23 sections, 1 theorem, 5 equations, 6 figures, 6 tables.

Key Result

Theorem 1

Denote by $\boldsymbol{\mu}_{a} = \frac{1}{|\mathcal{V}_{a}|}\sum_{v \in \mathcal{V}_{a}}\mathbf{x}_v$ the mean of the node feature vectors in a group $\mathcal{V}_{a}$ and by $dev(\mathcal{V}_a) = \max_{v \in \mathcal{V}_a}\{\| \mathbf{x}_v - \boldsymbol{\mu}_{a}\|_{\infty}\}$ the deviation. Let $\delta_a=\max\{dev(\mathcal{V}\ldots$
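
The two quantities defined at the start of Theorem 1, the group mean and the deviation, can be computed as in the sketch below. It assumes the norm is taken over each node's feature vector $\mathbf{x}_v$ and that features are given as plain lists of floats. The function names are hypothetical, and the rest of the theorem statement is truncated in the source, so it is not covered here.

```python
def group_mean(features):
    """Coordinate-wise mean of the feature vectors in one group."""
    n = len(features)
    dim = len(features[0])
    return [sum(x[j] for x in features) / n for j in range(dim)]


def group_deviation(features):
    """Largest infinity-norm distance between any feature vector and the group mean."""
    mu = group_mean(features)
    return max(
        max(abs(xj - mj) for xj, mj in zip(x, mu))  # infinity norm of x - mu
        for x in features
    )


# Toy group with three nodes and two-dimensional features.
group_a = [[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]]
mu_a = group_mean(group_a)
dev_a = group_deviation(group_a)
print(mu_a, dev_a)
```

Here the mean is `[2.0, 3.0]`, and the third node is furthest from it in the infinity norm, at a distance of `2.0`.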

Figures (6)

  • Figure 1: (a) An input graph to a 2-layer GCN. (b) The original computation graph of node $v_1$ in the 2-layer GCN. (c) The down-sampled computation graph of node $v_1$ in the 2-layer GCN. The embedding of a node in a layer is plotted on the left of the node. The arrows indicate the directions of embedding aggregation.
  • Figure 2: An example showing the intuition of the FairSample approach. (a) An input graph to FairSample. (b) The augmented graph after injecting an inter-group edge. (c) Jointly train the sampling policy $f_{S}$ and the 2-layer GCN node classifier $f_{G}$ with the computation graph of a node $v_1$.
  • Figure 3: The demographic parity and accuracy tradeoff of the models trained by FairSample and GSR. A point closer to the top-right corner is better.
  • Figure 4: The accuracy and $\Delta DP$ convergence curves of FairSample, FGAT, NIFTY, GSR, and PASSR across training epochs. For display clarity, the results of PASSR and GSR are reported in separate subfigures from the other baselines.
  • Figure 5: The training time (in seconds) and GPU memory usage (in MB) of FairSample and the baselines on the PZG dataset. The bars are sorted in ascending order of value from left to right. The y-axis uses a logarithmic scale.
  • ...and 1 more figure

Theorems & Definitions (7)

  • Example 1: GCN Classifier and Computation Graph
  • Definition 1: Demographic Parity (Agarwal et al., ICML 2018)
  • Theorem 1
  • proof
  • Example 2: Phase 1
  • Example 3: Phase 2
  • Example 4
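
Definition 1 (demographic parity) is only listed by name above. As a rough illustration, the sketch below uses a common formulation of the demographic parity gap $\Delta DP$: the absolute difference in positive-prediction rates between two sensitive groups. It assumes binary predictions and a binary sensitive attribute. The paper's exact definition is not reproduced in this section, so treat this formulation and the function name as assumptions.

```python
def demographic_parity_gap(predictions, sensitive):
    """Absolute difference in positive-prediction rate between groups 0 and 1.

    Assumes `predictions` and `sensitive` are equal-length sequences of 0/1 values.
    """
    rates = {}
    for group in (0, 1):
        preds = [p for p, s in zip(predictions, sensitive) if s == group]
        if not preds:
            raise ValueError(f"no nodes in sensitive group {group}")
        rates[group] = sum(preds) / len(preds)
    return abs(rates[0] - rates[1])


# Toy example: group 0 receives positive predictions far more often than group 1.
predictions = [1, 0, 1, 1, 0, 0, 1, 0]
sensitive = [0, 0, 0, 0, 1, 1, 1, 1]
gap = demographic_parity_gap(predictions, sensitive)
print(gap)
```

A gap of 0 means both groups receive positive predictions at the same rate. In the toy example the rates are 0.75 and 0.25, giving a gap of 0.5.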