Networked Inequality: Preferential Attachment Bias in Graph Neural Network Link Prediction

Arjun Subramonian; Levent Sagun; Yizhou Sun

TL;DR

This paper shows how degree bias in networks affects Graph Convolutional Network (GCN) link prediction, proposes a simple training-time strategy to alleviate within-group unfairness, and demonstrates that the strategy is effective on citation, social, and credit networks.

Abstract

Graph neural network (GNN) link prediction is increasingly deployed in citation, collaboration, and online social networks to recommend academic literature, collaborators, and friends. While prior research has investigated the dyadic fairness of GNN link prediction, the within-group (e.g., queer women) fairness and "rich get richer" dynamics of link prediction remain underexplored. However, these aspects have significant consequences for degree and power imbalances in networks. In this paper, we shed light on how degree bias in networks affects Graph Convolutional Network (GCN) link prediction. In particular, we theoretically uncover that GCNs with a symmetric normalized graph filter have a within-group preferential attachment bias. We validate our theoretical analysis on real-world citation, collaboration, and online social networks. We further bridge GCN's preferential attachment bias with unfairness in link prediction and propose a new within-group fairness metric. This metric quantifies disparities in link prediction scores within social groups, towards combating the amplification of degree and power disparities. Finally, we propose a simple training-time strategy to alleviate within-group unfairness, and we show that it is effective on citation, social, and credit networks.
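
The abstract refers to a symmetric normalized graph filter, and the outline also lists a random walk normalized filter. As a minimal sketch (not the authors' code), the NumPy snippet below builds both filters for a small toy graph, assuming self-loops are added before normalization as in the standard GCN formulation. The function name `graph_filters` and the toy adjacency matrix are hypothetical. It also illustrates a standard linear-algebra fact that gives some intuition for degree dependence: on a connected graph with self-loops, repeated application of the symmetric filter converges to a matrix whose $(i, j)$ entry is proportional to $\sqrt{d_i d_j}$. This is an illustration only and is not the paper's proof.

```python
import numpy as np

def graph_filters(adj):
    """Return (phi_s, phi_r, deg) for a dense adjacency matrix.

    phi_s = D^{-1/2} (A + I) D^{-1/2}   (symmetric normalized filter)
    phi_r = D^{-1} (A + I)              (random walk normalized filter)
    Adding self-loops is an assumption of this sketch.
    """
    a = adj + np.eye(adj.shape[0])          # add self-loops
    deg = a.sum(axis=1)                      # degrees including self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    phi_s = d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
    phi_r = a / deg[:, None]
    return phi_s, phi_r, deg

# Hypothetical toy graph: node 0 is a hub linked to nodes 1, 2, 3; nodes 1 and 2 are linked.
adj = np.array([
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
], dtype=float)

phi_s, phi_r, deg = graph_filters(adj)

# Many propagation steps with the symmetric filter approach sqrt(d_i * d_j) / sum(d).
limit = np.linalg.matrix_power(phi_s, 200)
expected = np.sqrt(np.outer(deg, deg)) / deg.sum()
```

In this toy example the hub (node 0) receives the largest entries in `limit`, which mirrors the "rich get richer" pattern the abstract describes. The random walk filter `phi_r` is row-stochastic and is included only for comparison.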

Paper Structure (45 sections, 5 theorems, 30 equations, 12 figures, 5 tables)

This paper contains 45 sections, 5 theorems, 30 equations, 12 figures, 5 tables.

Introduction
Related Work
Degree Bias in GNNs
Fair Link Prediction
Within-Group Fairness
Bias and Power in Networks
Preliminaries
Theoretical Analysis
Symmetric Normalized Filter
Within-Group Fairness
Random Walk Normalized Filter
Fairness Regularizer
Experiments
Validating Theoretical Analysis
Within-Group Fairness
...and 30 more sections

Key Result

Lemma 4.1

Similarly to Xu2018RepresentationLO, assume that each path from node $i \to j$ in the computation graph of $\Phi_s$ is independently activated with probability $\rho_s (i)$, and similarly, $\rho_r (i)$ for $\Phi_r$ (cf. §sec:taylor-lemma-comments). Furthermore, suppose that $\mathop{\mathbb{E}} \lef

Figures (12)

Figure 1: An academic collaboration network where nodes are Computer Science (CS) and Education (Edu) researchers, solid edges are current or past collaborations, and dashed edges are collaborations recommended by a GCN. Circular nodes are women and square nodes are men.
Figure 2: The plots display the theoretic vs. GCN LP scores for the Cora, CS, and LastFMAsia datasets over 10 random seeds. (We include the plots for the remaining datasets in §\ref{sec:remaining-plots}.) The top row of plots corresponds to $\Phi_s$, the bottom row to $\Phi_r$. In the plots, each circle corresponds to a single pair of test nodes (between which we are predicting a link). The center of each circle represents the mean of the theoretic and GCN scores and its area captures the range of scores. The color of each circle indicates the social group to which the node pair belongs. The plots include: (1) the total number of test node pairs $N$; (2) the number of social groups $B$; (3) the dashed line of equality for easy comparison of the theoretic and GCN scores. For all the datasets, the tables display: (1) the mean/standard deviation of the GCN test AUC on LP; and (2) the mean/standard deviation of the range-normalized root-mean-square deviation (NRMSE) [otto2019rmse] and Pearson correlation coefficient (PCC) [freedman2007statistics] of the theoretic LP scores as predictors of the GCN scores. The left table corresponds to $\Phi_s$, the right to $\Phi_r$.
Figure 3: The plots display $\widehat{\Delta}^{(b)}$ vs. $\Delta^{(b)}$ for $\Phi_s$ for the NBA, German, and DBLP-Fairness datasets over all $b \in [B]$ and 10 random seeds. Each point corresponds to a different random seed, and the color of the point corresponds to the social group $S^{(b)}$. We compute $\widehat{\Delta}^{(b)}$ and $\Delta^{(b)}$ post-sigmoid using only the LP scores over the sampled (positive and negative) test edges. The plots display the NRMSE and PCC of $\widehat{\Delta}^{(b)}$ as a predictor of $\Delta^{(b)}$.
Figure 4: Theoretic vs. GCN LP scores for citation network datasets.
Figure 5: Theoretic vs. GCN LP scores for collaboration network datasets.
...and 7 more figures
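
The captions for Figures 2 and 3 report NRMSE and PCC of theoretic quantities as predictors of GCN quantities. The sketch below shows one common way to compute both, assuming the RMSE is normalized by the range of the target values (here, the GCN scores). The exact normalization used in the paper is not stated in this excerpt, and the function names `nrmse` and `pcc` and the sample arrays are hypothetical.

```python
import numpy as np

def nrmse(pred, target):
    """Root-mean-square deviation divided by the range of the target values."""
    rmse = np.sqrt(np.mean((pred - target) ** 2))
    return rmse / (target.max() - target.min())

def pcc(pred, target):
    """Pearson correlation coefficient between predictions and targets."""
    return np.corrcoef(pred, target)[0, 1]

# Made-up scores for illustration: predictions offset from the targets by a constant 1.0.
gcn_scores = np.array([0.0, 1.0, 2.0, 4.0])
theoretic_scores = gcn_scores + 1.0

error = nrmse(theoretic_scores, gcn_scores)   # RMSE of 1.0 over a range of 4.0
corr = pcc(theoretic_scores, gcn_scores)      # constant offset leaves correlation at 1
```

A constant offset leaves the PCC at 1 while the NRMSE is nonzero, which is why the two metrics are informative together: one captures agreement in trend, the other agreement in magnitude.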

Theorems & Definitions (10)

Lemma 4.1
Lemma 4.2
Theorem 4.3
Theorem 4.4
proof
proof
proof
Lemma 1.1
proof
proof
