Perplexity-Homophily Index: Homophily through Diversity in Hypergraphs
Gaurav Kumar, Akrati Saxena, Chandrakala Meena
TL;DR
The paper addresses measuring homophily in higher-order networks modeled as hypergraphs by introducing an edge-centric framework based on interaction perplexity $D(e)$ and a degree-aware baseline $B_{|e|}$. Homophily is quantified as a normalized diversity gap $\phi(e)=\frac{B_{|e|}-D(e)}{B_{|e|}-1}$ and aggregated into the Perplexity-Homophily Index $\Phi(H)=\frac{1}{|E|}\sum_{e\in E}\phi(e)$, with a $k$-uniform extension $\Phi(H_k)$ that connects to Newman's assortativity for $k=2$. The method is validated on synthetic and real-world hypergraphs, showing that $\Phi(H)$ captures the full distribution of homophily and reveals how homophilic and heterophilic tendencies vary with interaction size across domains such as shopping, politics, and education. This framework offers a flexible, interpretable, and comparable measure for higher-order homophily and lays the groundwork for temporal, multilayer, and model-based extensions in complex systems.
Abstract
Real-world complex systems are often better modeled as hypergraphs, where edges represent group interactions involving multiple entities. Understanding and quantifying homophily (similarity-driven association) in such networks is essential for analyzing community formation and information flow. We propose a hyperedge-centric framework to quantify homophily in hypergraphs. Each interaction is represented as a hyperedge, and its interaction perplexity measures the effective number of distinct attributes it contains. Comparing this observed perplexity with a degree-preserving random baseline defines the diversity gap, which quantifies how much more or less diverse an interaction is than expected by chance. The global homophily score for a network, called the Perplexity-Homophily Index, is computed by averaging the normalized diversity gap across all hyperedges. Experiments on synthetic and real-world datasets show that the proposed index captures the full distribution of homophily and reveals how homophilic and heterophilic tendencies vary with interaction size in hypergraphs.
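To make the pipeline concrete, the following is a minimal sketch of how $\phi(e)$ and $\Phi(H)$ could be computed. It rests on assumptions that this summary does not confirm: the perplexity $D(e)$ is taken to be the exponential of the Shannon entropy of the attribute distribution inside a hyperedge, and the baseline $B_{|e|}$ is estimated by Monte Carlo, drawing $|e|$ distinct nodes with probability proportional to their hyperdegree. The function names (`perplexity`, `baseline`, `phi_index`), the toy hypergraph, and the choice to score a hyperedge as 0 when $B_{|e|}=1$ are all hypothetical and used only for illustration.

```python
import math
import random
from collections import Counter


def perplexity(labels):
    """Effective number of distinct labels: exp(Shannon entropy).

    Assumption: this is the form of D(e); the summary only says that
    perplexity measures the effective number of distinct attributes.
    """
    counts = Counter(labels)
    n = len(labels)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return math.exp(entropy)


def baseline(size, nodes, weights, attr, rng, samples=2000):
    """Monte Carlo estimate of B_size (assumed baseline construction).

    Draws `size` distinct nodes with probability proportional to
    hyperdegree and averages the perplexity of the sampled groups.
    """
    total = 0.0
    for _ in range(samples):
        chosen = set()
        while len(chosen) < size:  # rejection step keeps nodes distinct
            chosen.add(rng.choices(nodes, weights=weights, k=1)[0])
        total += perplexity([attr[v] for v in chosen])
    return total / samples


def phi_index(hyperedges, attr, seed=0):
    """Return (Phi(H), per-edge phi(e)) for a list of hyperedges."""
    rng = random.Random(seed)
    degree = Counter(v for e in hyperedges for v in e)
    nodes = list(degree)
    weights = [degree[v] for v in nodes]
    cache = {}  # one baseline per hyperedge size
    scores = []
    for e in hyperedges:
        k = len(e)
        if k not in cache:
            cache[k] = baseline(k, nodes, weights, attr, rng)
        b = cache[k]
        d = perplexity([attr[v] for v in e])
        # phi(e) = (B - D) / (B - 1); undefined when B == 1, scored 0 here
        # (an illustrative convention, not taken from the paper).
        scores.append((b - d) / (b - 1) if b - 1 > 1e-12 else 0.0)
    return sum(scores) / len(scores), scores


# Toy hypergraph: two attribute classes, "x" and "y" (made-up data).
attr = {"a": "x", "b": "x", "c": "x", "d": "y", "e": "y", "f": "y"}
hyperedges = [("a", "b", "c"), ("d", "e", "f"), ("a", "d"), ("b", "c", "e")]
phi_H, per_edge = phi_index(hyperedges, attr)
print(f"Phi(H) = {phi_H:.3f}")
for e, s in zip(hyperedges, per_edge):
    print(e, f"{s:.3f}")
```

Under the normalization in $\phi(e)$, a hyperedge whose members all share one attribute has $D(e)=1$ and scores $\phi(e)=1$, a hyperedge as diverse as the baseline scores 0, and a hyperedge more diverse than the baseline scores below 0. In the toy data, the two single-class hyperedges therefore score 1, while the mixed pair `("a", "d")` cannot score above 0.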
