Unveiling the Impact of Local Homophily on GNN Fairness: In-Depth Analysis and New Benchmarks
Donald Loveland, Danai Koutra
TL;DR
The paper addresses fairness in graph neural networks (GNNs) under local, rather than global, homophily by formalizing an out-of-distribution (OOD) local-homophily problem and linking it to disparate treatment across sensitive attributes. It provides a theoretical framework showing how nearby OOD shifts in local homophily can widen the gap between the logits predicted for different sensitive groups, and it validates these insights with three new real-world fairness benchmarks and a semi-synthetic graph generator that precisely controls local-homophily distributions via optimal transport. Empirically, fairness degradation correlates with OOD distance (the Earth Mover's Distance, EMD, between the train and test local-homophily distributions) and with the presence of heterophilous nodes in homophilous graphs: fairness, measured by statistical parity (SP), drops by up to about 24% on real data and 30% on semi-synthetic data. The contributions provide practical benchmarks and tools for studying and mitigating a previously overlooked fairness risk rooted in a graph's local structure, and they can guide the development of GNNs that are fair across diverse local connectivity patterns.
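The OOD distance described above can be made concrete with a short sketch. It assumes that a node's local homophily is the fraction of its neighbors that share its class label, and it measures OOD distance as the 1-D Earth Mover's (Wasserstein-1) distance between the train and test local-homophily values. The toy graph, the split, and the function names are hypothetical, and the paper's exact definitions may differ.

```python
from collections import defaultdict


def local_homophily(edges, labels):
    """Per-node local homophily (assumed definition): the fraction of a node's
    neighbors that share its label. Isolated nodes are omitted because the
    ratio is undefined for them."""
    neighbors = defaultdict(list)
    for u, v in edges:  # treat every edge as undirected
        neighbors[u].append(v)
        neighbors[v].append(u)
    return {
        node: sum(labels[n] == labels[node] for n in nbrs) / len(nbrs)
        for node, nbrs in neighbors.items()
    }


def emd_1d(xs, ys):
    """Earth Mover's (Wasserstein-1) distance between two 1-D samples,
    computed as the area between their empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    total, i, j = 0.0, 0, 0
    for left, right in zip(points, points[1:]):
        # i and j count the samples <= left, i.e. the CDF values on [left, right)
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        total += abs(i / len(xs) - j / len(ys)) * (right - left)
    return total


# Hypothetical toy graph: a homophilous triangle plus a heterophilous path.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5)]
labels = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 0}
h = local_homophily(edges, labels)

# Hypothetical split: mostly homophilous train nodes, heterophilous test nodes.
train_nodes, test_nodes = [0, 1, 2, 3], [4, 5]
ood_distance = emd_1d([h[n] for n in train_nodes], [h[n] for n in test_nodes])
print(h)             # {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.0, 4: 0.0, 5: 0.0}
print(ood_distance)  # 0.75
```

In this toy split, three quarters of the training mass sits at local homophily 1.0 while all test mass sits at 0.0, so moving 3/4 of the mass a distance of 1.0 gives an EMD of 0.75. If SciPy is available, `scipy.stats.wasserstein_distance` computes the same 1-D quantity.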
Abstract
Graph Neural Networks (GNNs) often struggle to generalize when graphs exhibit both homophily (same-class connections) and heterophily (different-class connections). Specifically, GNNs tend to underperform for nodes whose local homophily levels differ significantly from the global homophily level. This issue poses a risk in user-centric applications where underrepresented homophily levels are present. Concurrently, fairness within GNNs has received substantial attention due to the potential amplification of biases via message passing. However, the connection between local homophily and fairness in GNNs remains underexplored. In this work, we move beyond global homophily and explore how local homophily levels can lead to unfair predictions. We begin by formalizing the challenge of fair predictions for underrepresented homophily levels as an out-of-distribution (OOD) problem. We then conduct a theoretical analysis that demonstrates how local homophily levels can alter predictions for differing sensitive attributes. We additionally introduce three new GNN fairness benchmarks, as well as a novel semi-synthetic graph generator, to empirically study the OOD problem. Across extensive analyses, we find that two factors can promote unfairness: (a) OOD distance, and (b) heterophilous nodes situated in homophilous graphs. In cases where these two conditions are met, fairness drops by up to 24% on real-world datasets and by up to 30% on semi-synthetic datasets. Together, our theoretical insights, empirical analysis, and algorithmic contributions unveil a previously overlooked source of unfairness rooted in the graph's homophily information.
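The TL;DR attributes the reported fairness drops to statistical parity (SP). As a hedged sketch, the standard SP gap for binary predictions and a binary sensitive attribute s is |P(ŷ=1 | s=0) - P(ŷ=1 | s=1)|, where larger values mean less fair predictions. The function name and the predictions below are hypothetical, and this is not necessarily how the paper aggregates SP into the quoted percentages.

```python
def statistical_parity_gap(preds, sensitive):
    """|P(y_hat=1 | s=0) - P(y_hat=1 | s=1)| for binary predictions (0/1)
    and a binary sensitive attribute (0/1)."""
    rates = []
    for group in (0, 1):
        group_preds = [p for p, s in zip(preds, sensitive) if s == group]
        if not group_preds:
            raise ValueError(f"no nodes with sensitive attribute {group}")
        rates.append(sum(group_preds) / len(group_preds))  # positive-prediction rate
    return abs(rates[0] - rates[1])


sensitive = [0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical predictions on in-distribution vs. OOD test nodes.
preds_id = [1, 1, 0, 0, 1, 0, 1, 0]
preds_ood = [1, 1, 1, 0, 0, 0, 1, 0]
print(statistical_parity_gap(preds_id, sensitive))   # 0.0
print(statistical_parity_gap(preds_ood, sensitive))  # 0.5
```

Here both groups receive positive predictions at the same rate (0.5) on the in-distribution nodes, while on the OOD nodes the rates diverge to 0.75 and 0.25, so the SP gap grows from 0.0 to 0.5.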
