Homophily Within and Across Groups

Abbas K. Rizi, Riccardo Michielan, Clara Stegehuis, Mikko Kivelä

TL;DR

The paper addresses the problem that conventional models collapse homophily to a single parameter, obscuring scale-specific mixing patterns. It introduces a maximum-entropy framework that represents multiscale homophily via layered clique subnetworks and yields an exponential-family form for the clique-type distributions, enabling principled inference of homophily at each group size. Empirical validation across diverse networks shows the approach faithfully captures clique-level mixing and reveals how within- and across-group homophily differently impact percolation and epidemic thresholds. The framework provides a robust null model for analyzing diffusion and intervention strategies, with extensions to multidimensional homophily and degree heterogeneity for broader applicability.

Abstract

Homophily -- the tendency of individuals to interact with similar others -- shapes how networks form and function. Yet existing approaches typically collapse homophily to a single scale, either one parameter for the whole network or one per community, thereby detaching it from other structural features. Here, we introduce a maximum-entropy random graph model that moves beyond these limits, capturing homophily across all social scales in the network, with parameters for each group size. The framework decomposes homophily into within- and across-group contributions, recovering the stochastic block model as a special case. As an exponential-family model, it fits empirical data and enables inference of group-level variation of homophily that aggregate metrics miss. The group-dependence of homophily substantially impacts network percolation thresholds, altering predictions for epidemic spread, information diffusion, and the effectiveness of interventions. Ignoring such heterogeneity risks systematically misjudging connectivity and dynamics in complex systems.

Paper Structure

This paper contains 17 sections, 28 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Constructing a Homophilic Clique Network with Eight Red and Seven Blue Nodes. With two colors, each $c$-clique can appear in $c+1$ different compositions, distributed according to ${F}_c$. (a-c) The network $G$ is constructed by merging multiple clique layers (specifically, 2-, 3-, and 4-cliques), where each layer $G_c$ contains cliques of size $c$ but shares the same set of nodes. While the nodes remain fixed, each layer differs in clique size $c$ and composition distribution ${F}_c$. The accompanying histograms show the frequency of each $c$-clique type in the network $G_c$. In each layer, $M_c$ groups of $c$ nodes are sampled with replacement from the available colored nodes and converted into $c$-cliques. For example, $G_3$ consists of seven sampled 3-cliques, while $G_2$ is constructed similarly, but with its own parameters $M_2$ and ${F}_2$. (d) All layers are then merged to form the final network: $G = \bigcup G_c$. (e) The maximum-entropy clique composition distribution ${F}_5$, derived from Eq. \ref{eq:max_ent_f}, is illustrated for cliques of size $c = 5$ at homophily levels $h_5 = 0.0$, $h_5 = 0.2$, and $h_5 = 0.6$, across several values of the red-node fraction $n_\mathrm{r}$. For clarity, we omit the corresponding networks generated from these distributions. More details in Sec. \ref{sec:net_model}.
  • Figure 2: Empirical and Theoretical Distributions of Clique Types and Homophily in the Real-world Networks mentioned in Sec. \ref{sec:data}. (a) Each column compares empirical clique-type distributions (histograms) with the corresponding maximum-entropy distributions (dashed lines, computed using Eq. \ref{eq:max_ent_f}), showing strong agreement and validating the model’s accuracy in capturing observed clique compositions across real-world datasets. Red represents males and blue represents females. (b) Variation in homophily values $h_c$ across clique sizes (up to $c = 10$) for several empirical networks, highlighting distinct group interaction patterns based on sex attributes. Error bars represent 95% confidence intervals obtained via bootstrap resampling. (c–e) Average homophily trends across clique sizes for Facebook friendship networks from 100 U.S. institutions, with attributes grouped by: (c) Sex (two categories), (d) Student status (two categories), and (e) Class year (12 categories). Gray background curves represent individual institutions, while colored lines indicate the average trends. Lighter colors reflect a greater number of contributing institutions, emphasizing both inter-institutional variability and how homophily depends on the attribute under consideration.
  • Figure 3: Within- and Across-group Homophily Influences Network Connectivity, Percolation Properties and Epidemic Thresholds in Non-Uniform Ways. Panels (a–c) present heatmaps of the critical percolation value with respect to $h=(h_2 + h_4)/2$ and $\Delta h = h_2 - h_4$ for: (a) $\pi_\mathrm{rr} = \pi_\mathrm{bb}$, $\pi_\mathrm{rb} = 0.1$, and $\alpha_2=0.5$; (b) $\pi_\mathrm{rr} = \pi_\mathrm{bb}$, $\pi_\mathrm{rb} = 0.5$, and $\alpha_2=0.5$; (c) $\pi_\mathrm{rr} = 0.1\pi_\mathrm{bb}$, $\pi_\mathrm{rb} = 0.1$, and $\alpha_2=0.2$. These panels illustrate how redistributing homophily configurations can either raise or lower the percolation threshold depending on the extent of across-group connectivity $\pi_\mathrm{rb}$. Panel (d) compares the effect of redistributing homophily across small and large groups in networks with $h = 0.5$, average degree 2, and $\alpha_2=0.5$. Green regions indicate that emphasizing small-group homophily ($h_4 = 0.1$, $h_2 = 0.9$) leads to a higher percolation threshold than emphasizing large-group homophily ($h_2 = 0.1$, $h_4 = 0.9$). No percolation transition occurs in the top-right white region. Panels (e–g) explore the interaction between homophily and vaccination efficacy with respect to $h=(h_2 + h_4)/2$ and $\Delta h = h_2 - h_4$. The impact of homophily depends on both within-group efficacy ($f_v$) and cross-group efficacy ($f_I$): (e) $f_I = 1$, $f_v = 0.1$; (f) $f_I = 0.1$, $f_v = 0.5$; (g) $f_I = 0.1$, $f_v = 1$.
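
The layered construction described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made here for illustration: the function names `sample_clique_layer` and `merge_layers`, the convention that `F_c[k]` is the probability of a clique containing exactly `k` red nodes, the choice to draw distinct nodes within a single clique, and all numerical values of $M_c$ and ${F}_c$ other than the seven 3-cliques mentioned in the caption. In particular, the ${F}_c$ values below are arbitrary placeholders, not the maximum-entropy distribution of the paper.

```python
import random
from itertools import combinations


def sample_clique_layer(red, blue, c, M_c, F_c, rng):
    """Sample one layer G_c: M_c cliques of size c.

    Assumption: F_c[k] is the probability that a clique has exactly
    k red nodes (k = 0..c), so F_c has c + 1 entries.
    """
    if len(F_c) != c + 1:
        raise ValueError("F_c must have c + 1 entries")
    cliques = []
    for _ in range(M_c):
        # Draw the clique composition (number of red nodes) from F_c.
        k = rng.choices(range(c + 1), weights=F_c)[0]
        # Assumption: nodes inside one clique are distinct; the same
        # node (or the same group) may still recur across cliques.
        members = rng.sample(red, k) + rng.sample(blue, c - k)
        cliques.append(tuple(members))
    return cliques


def merge_layers(layers):
    """Merge clique layers into one edge set, G = union of all G_c."""
    edges = set()
    for cliques in layers:
        for clique in cliques:
            # A c-clique contributes every pair of its members as an edge.
            for u, v in combinations(sorted(clique), 2):
                edges.add((u, v))
    return edges


rng = random.Random(0)
red = [f"r{i}" for i in range(8)]   # eight red nodes, as in Figure 1
blue = [f"b{i}" for i in range(7)]  # seven blue nodes, as in Figure 1

# Hypothetical (M_c, F_c) per clique size; only M_3 = 7 comes from the caption.
params = {
    2: (10, [0.4, 0.2, 0.4]),
    3: (7, [0.3, 0.2, 0.2, 0.3]),
    4: (3, [0.3, 0.1, 0.2, 0.1, 0.3]),
}
layers = {
    c: sample_clique_layer(red, blue, c, M_c, F_c, rng)
    for c, (M_c, F_c) in params.items()
}
edges = merge_layers(layers.values())
print(f"{len(edges)} distinct edges after merging {len(layers)} layers")
```

Storing the merged network as a set of sorted node pairs means that an edge produced by several overlapping cliques is counted once, which mirrors taking the union of the layers rather than their multiset sum.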