Revisiting Graph Homophily Measures
Mikhail Mironov, Liudmila Prokhorenkova
TL;DR
This work tackles the challenge of comparing graph homophily across datasets with varying class distributions by proposing unbiased homophily, a scale-invariant, edge-wise measure that satisfies a comprehensive set of desirable properties. It formalizes the measure via the normalized class adjacency matrix $C_G$, provides a theoretically grounded definition $h_{unb}$ (and a parametric version $h_{unb}^{\beta}$), and proves continuity and baseline properties, with explicit limits for $R_{max}$, $R_{base}$, and $R_{min}$. The authors give an equivalent formulation that is convenient for computation, extend the framework to weighted graphs, and empirically demonstrate that $h_{unb}$ behaves consistently on synthetic and real datasets where existing measures fail. They further show that for directed graphs some of the desirable properties contradict each other, which suggests that the property set needs to be revised for directed settings. Overall, unbiased homophily provides a robust tool for comparing homophily across diverse graph datasets and lays the groundwork for future work on directed and higher-order networks.
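The normalized class adjacency matrix $C_G$ can be illustrated with a minimal sketch. It assumes that each undirected edge is counted in both directions and that the matrix is divided by $2|E|$, so its entries sum to 1; the paper's exact normalization and the formula for $h_{unb}$ are not given in this summary and are not reproduced here. The helper names `class_adjacency` and `edge_homophily` are hypothetical. Under this normalization, edge homophily (the fraction of edges joining same-class nodes) is simply the trace of the matrix.

```python
from fractions import Fraction


def class_adjacency(edges, labels, num_classes):
    """Normalized class adjacency matrix of an undirected graph.

    Assumption (hypothetical normalization): each edge (u, v) is counted
    as both (y_u, y_v) and (y_v, y_u), and entries are divided by 2|E|,
    so the matrix is symmetric and its entries sum to 1.
    """
    counts = [[0] * num_classes for _ in range(num_classes)]
    for u, v in edges:
        a, b = labels[u], labels[v]
        counts[a][b] += 1
        counts[b][a] += 1
    total = 2 * len(edges)
    return [[Fraction(x, total) for x in row] for row in counts]


def edge_homophily(c):
    # Share of edges whose endpoints have the same class: the trace of C.
    return sum(c[i][i] for i in range(len(c)))


# Path 0-1-2-3 with classes [0, 0, 1, 1]: two intra-class edges, one inter-class.
C = class_adjacency([(0, 1), (1, 2), (2, 3)], [0, 0, 1, 1], num_classes=2)
print(C)                  # [[Fraction(1, 3), Fraction(1, 6)], [Fraction(1, 6), Fraction(1, 3)]]
print(edge_homophily(C))  # 2/3
```

Exact fractions are used only to keep the toy output readable; floating-point arithmetic would work the same way on real graphs.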
Abstract
Homophily is a graph property describing the tendency of edges to connect similar nodes. Several measures are used for assessing homophily, but all are known to have drawbacks: in particular, they cannot be reliably used to compare datasets with different numbers of classes and different class size balance. To make this precise, previous works on graph homophily proposed several properties that a good homophily measure should satisfy and noted that no existing homophily measure has all of them. Our paper addresses this issue by introducing a new homophily measure, unbiased homophily, that has all the desirable properties and thus can be reliably used across datasets with different label distributions. The proposed measure is suitable for undirected (and possibly weighted) graphs. We show both theoretically and through empirical examples that the existing homophily measures have serious drawbacks, while unbiased homophily behaves as desired in the considered scenarios. Finally, for directed graphs, we prove that some of the desirable properties contradict each other, so a measure satisfying all of them cannot exist.
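The class-balance problem can be seen on a toy example. The sketch below, with hypothetical helper names, uses a complete graph, where edges carry no preference for any labels, and compares edge homophily with adjusted homophily (Platonov et al., 2022), an existing measure that corrects edge homophily for the value expected from degree-weighted class proportions. Neither is the unbiased homophily measure proposed here, whose definition is not given in this abstract.

```python
from fractions import Fraction
from itertools import combinations


def complete_graph_edges(n):
    # Every pair of nodes is connected, so edges carry no label preference.
    return list(combinations(range(n), 2))


def edge_homophily(edges, labels):
    # Fraction of edges whose endpoints share a class.
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return Fraction(same, len(edges))


def adjusted_homophily(edges, labels):
    # Adjusted homophily (Platonov et al., 2022):
    # (h_edge - sum_k p_k^2) / (1 - sum_k p_k^2), where p_k = D_k / (2|E|)
    # and D_k is the total degree of the nodes in class k.
    degree_sum = {}
    for u, v in edges:
        degree_sum[labels[u]] = degree_sum.get(labels[u], 0) + 1
        degree_sum[labels[v]] = degree_sum.get(labels[v], 0) + 1
    two_m = 2 * len(edges)
    baseline = sum(Fraction(d, two_m) ** 2 for d in degree_sum.values())
    if baseline == 1:
        raise ValueError("undefined when all edge endpoints share one class")
    return (edge_homophily(edges, labels) - baseline) / (1 - baseline)


edges = complete_graph_edges(10)
imbalanced = [0] * 9 + [1]
balanced = [0] * 5 + [1] * 5

print(edge_homophily(edges, imbalanced))      # 4/5
print(edge_homophily(edges, balanced))        # 4/9
print(adjusted_homophily(edges, imbalanced))  # -1/9
print(adjusted_homophily(edges, balanced))    # -1/9
```

On the same label-agnostic structure, edge homophily moves from 4/9 to 4/5 purely because of class balance, while adjusted homophily stays at -1/9 in this particular case. As the abstract notes, however, no existing measure (adjusted homophily included) has all the desirable properties, which is what motivates unbiased homophily.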
