Theoretical and Empirical Insights into the Origins of Degree Bias in Graph Neural Networks

Arjun Subramonian, Jian Kang, Yizhou Sun

TL;DR

The paper proves that high-degree test nodes tend to have a lower probability of misclassification regardless of how GNNs are trained. It also shows that, with sufficiently many epochs of training, message-passing GNNs can achieve their maximum possible training accuracy, which is not significantly limited by their expressive power.

Abstract

Graph Neural Networks (GNNs) often perform better for high-degree nodes than low-degree nodes on node classification tasks. This degree bias can reinforce social marginalization by, e.g., privileging celebrities and other high-degree actors in social networks during social and content recommendation. While researchers have proposed numerous hypotheses for why GNN degree bias occurs, we find via a survey of 38 degree bias papers that these hypotheses are often not rigorously validated, and can even be contradictory. Thus, we provide an analysis of the origins of degree bias in message-passing GNNs with different graph filters. We prove that high-degree test nodes tend to have a lower probability of misclassification regardless of how GNNs are trained. Moreover, we show that degree bias arises from a variety of factors that are associated with a node's degree (e.g., homophily of neighbors, diversity of neighbors). Furthermore, we show that during training, some GNNs may adjust their loss on low-degree nodes more slowly than on high-degree nodes; however, with sufficiently many epochs of training, message-passing GNNs can achieve their maximum possible training accuracy, which is not significantly limited by their expressive power. Throughout our analysis, we connect our findings to previously proposed hypotheses for the origins of degree bias, supporting and unifying some while casting doubt on others. We validate our theoretical findings on 8 common real-world networks, and based on our theoretical and empirical insights, describe a roadmap to alleviate degree bias.
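
To make "different graph filters" concrete, the sketch below builds two normalizations that are commonly meant by the RW and SYM labels used in the figure captions: the random-walk filter $D^{-1}A$ and the symmetric filter $D^{-1/2} A D^{-1/2}$. Reading RW and SYM this way is an assumption, as are the toy star graph, the self-loops, and the helper names `rw_filter`, `sym_filter`, and `propagate`; none of them come from the paper's code.

```python
import math

def rw_filter(adj):
    """Random-walk normalization D^{-1} A: divide each row by that node's degree."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / deg[i] for j in range(n)] for i in range(n)]

def sym_filter(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def propagate(filt, feats):
    """One round of linear message passing on scalar node features."""
    n = len(filt)
    return [sum(filt[i][j] * feats[j] for j in range(n)) for i in range(n)]

# Hypothetical toy graph: a star with hub 0 and leaves 1..3, self-loops included.
adj = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
feats = [1.0, 1.0, 1.0, 1.0]  # identical features on every node

rw_out = propagate(rw_filter(adj), feats)
sym_out = propagate(sym_filter(adj), feats)
print("RW :", rw_out)
print("SYM:", sym_out)
```

In this toy graph the RW filter averages over neighbors, so constant features stay constant, while the SYM output is larger at the hub than at the leaves. That degree-dependent scaling is one reason the two filters can treat low- and high-degree nodes differently.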

Paper Structure (37 sections, 6 theorems, 37 equations, 18 figures, 4 tables)

Key Result

Theorem 1

Consider a test node $i \in {\cal V} \setminus S$, with ${\bm{Y}}_i = c$. Furthermore, consider a label $c' \neq c$. Let $\mathbb{P} \left( \ell({\cal M} | i, c) > \ell({\cal M} | i, c') \right)$ be the probability of misclassifying $i$. Then, if $\mathbb{E} \left[{\bm{Z}}_{i, c'}^{(L)} - {\bm{Z}}_{i, c}^{(L)} \right]$ [...], where the squared inverse coefficient of variation $R_{i, c'} = \frac{\left(\mathbb{E} \left[{\bm{Z}}_{i, c'}^{(L)} - {\bm{Z}}_{i, c}^{(L)} \right]\right)^2}{\mathrm{Var} \left[{\bm{Z}}_{i, c'}^{(L)} - {\bm{Z}}_{i, c}^{(L)} \right]}$.
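
The squared inverse coefficient of variation named in the theorem is, by the usual definition, the squared mean of a quantity divided by its variance. The sketch below computes it for synthetic samples of a logit gap and plugs it into Cantelli's one-sided inequality, a standard result stating that a random variable $X$ with negative mean satisfies $\mathbb{P}(X \geq 0) \leq 1 / (1 + R)$. Whether the theorem's bound has exactly this form is an assumption, since the statement above is truncated; the Gaussian samples, their parameters, and the function names are likewise hypothetical.

```python
import random
import statistics

def squared_inverse_cv(samples):
    """Return mean^2 / variance, using the population variance of the samples."""
    mean = statistics.fmean(samples)
    var = statistics.pvariance(samples, mu=mean)
    return mean * mean / var

def cantelli_upper_bound(samples):
    """Cantelli bound on P(X >= 0) for X with negative mean: 1 / (1 + R)."""
    if statistics.fmean(samples) >= 0:
        raise ValueError("the bound requires a negative mean")
    return 1.0 / (1.0 + squared_inverse_cv(samples))

rng = random.Random(0)
# Synthetic logit gaps (wrong-class logit minus true-class logit).
# Both groups share the same mean; only the spread differs (an assumption).
wide_gaps = [rng.gauss(-1.0, 2.0) for _ in range(10_000)]
narrow_gaps = [rng.gauss(-1.0, 0.5) for _ in range(10_000)]

for name, gaps in [("wide spread", wide_gaps), ("narrow spread", narrow_gaps)]:
    empirical = sum(g >= 0 for g in gaps) / len(gaps)  # misclassification rate
    bound = cantelli_upper_bound(gaps)
    print(f"{name}: R={squared_inverse_cv(gaps):.3f} "
          f"empirical={empirical:.3f} bound={bound:.3f}")
```

A larger $R$ (a mean far from zero relative to the spread) gives a smaller bound, so any factor that raises $R$ for a node lowers the ceiling on its misclassification probability.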

Figures (18)

  • Figure 1: Test loss vs. degree of nodes in CiteSeer for RW, SYM, and ATT GNNs. High-degree nodes generally incur a lower test loss than low-degree nodes do. Error bars are reported over 10 random seeds; all error bars are 1-sigma and represent the standard deviation about the mean.
  • Figure 2: Visual summary of the geometry of representations, variance of representations, and training dynamics of RW, SYM, and ATT GNNs on CiteSeer. We consider low-degree nodes to be the 100 nodes with the smallest degrees and high-degree nodes to be the 100 nodes with the largest degrees. Each point in the plots in the left column corresponds to a test node representation and its color represents the node's class. (In this particular dataset, low-degree nodes are more heavily concentrated in a few classes.) The plots in the left column are based on a single random seed, while the plots in the middle and right columns are based on 10 random seeds. RW representations of low-degree nodes often have a larger variance than high-degree node representations, while SYM representations of low-degree nodes often have a smaller variance. Furthermore, SYM generally adjusts its training loss on low-degree nodes less rapidly.
  • Figure 3: Inverse collision probability vs. degree of nodes in CiteSeer for RW, SYM, and ATT GNNs. Node degrees generally have a strong association with inverse collision probabilities.
  • Figure 4: Test loss vs. degree of nodes in citation and collaboration network datasets for RW, SYM, and ATT GNNs. High-degree nodes generally incur a lower test loss than low-degree nodes do. Error bars are reported over 10 random seeds; all error bars are 1-sigma and represent the standard deviation about the mean.
  • Figure 5: Test loss vs. degree of nodes in online product and Wikipedia network datasets for RW, SYM, and ATT GNNs. High-degree nodes generally incur a lower test loss than low-degree nodes do. Error bars are reported over 10 random seeds; all error bars are 1-sigma and represent the standard deviation about the mean.
  • ...and 13 more figures
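
The inverse collision probability in Figure 3 is not defined in this summary. A common reading, sketched below, takes a probability distribution $p$ over nodes and computes $1 / \sum_j p_j^2$, the reciprocal of the chance that two independent draws from $p$ land on the same node. Applying this to the distribution of a short random walk started at each node is an assumption, as are the toy star graph, the self-loops, and the function names.

```python
def random_walk_distribution(adj, start, steps):
    """Distribution over nodes after `steps` uniform random-walk steps from `start`."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = {}
        for node, mass in dist.items():
            share = mass / len(adj[node])  # split mass evenly among neighbors
            for nbr in adj[node]:
                nxt[nbr] = nxt.get(nbr, 0.0) + share
        dist = nxt
    return dist

def inverse_collision_probability(dist):
    """1 / sum_j p_j^2: an effective count of distinct nodes the walk reaches."""
    return 1.0 / sum(p * p for p in dist.values())

# Hypothetical toy graph: hub 0 with leaves 1..4; every node has a self-loop.
adj = {
    0: [0, 1, 2, 3, 4],
    1: [1, 0],
    2: [2, 0],
    3: [3, 0],
    4: [4, 0],
}

for node in adj:
    icp = inverse_collision_probability(random_walk_distribution(adj, node, 1))
    print(f"node {node}: neighborhood size={len(adj[node])} one-step ICP={icp:.3f}")
```

With a single step the walk is uniform over the node's neighborhood, so the value equals the neighborhood size (neighbors plus the self-loop). For longer walks it depends on the wider graph structure, which is why it can track degree closely without being identical to it.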

Theorems & Definitions (12)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 1
  • Theorem 4
  • proof
  • proof
  • proof
  • proof
  • proof
  • ...and 2 more