Higher-order null models as a lens for social systems

Giulia Preti, Adriano Fazzone, Giovanni Petri, Gianmarco De Francisci Morales

TL;DR

The paper develops two micro-canonical null models for directed hypergraphs, DHCM and DHJM, which preserve essential higher-order structural properties and enable principled hypothesis testing on complex systems. It provides two scalable Metropolis-Hastings samplers, NuDHy-Degs and NuDHy-JOINT, that sample uniformly from the respective ensembles via edge-swap operations, guaranteeing unbiased null models. Through three interdisciplinary case studies in sociology, epidemiology, and economics, the authors show that preserving joint degree information (JOINT) captures higher-order effects missed by degree-only models, and that JOINT is particularly crucial for nonlinear contagion and for reproducing economic rankings. The work demonstrates the practical impact of directed-hypergraph null models for analyzing social systems, indicates where local versus meso-/global-scale information matters, and provides open-source tools to enable broader adoption in diverse domains.

Abstract

Despite the widespread adoption of higher-order mathematical structures such as hypergraphs, methodological tools for their analysis lag behind those for traditional graphs. This work addresses a critical gap in this context by proposing two micro-canonical random null models for directed hypergraphs: the Directed Hypergraph Configuration Model (DHCM) and the Directed Hypergraph JOINT Model (DHJM). These models preserve essential structural properties of directed hypergraphs such as node in- and out-degree sequences and hyperedge head and tail size sequences, or their joint tensor. We also describe two efficient MCMC algorithms, NuDHy-Degs and NuDHy-JOINT, to sample random hypergraphs from these ensembles. To showcase the interdisciplinary applicability of the proposed null models, we present three distinct use cases in sociology, epidemiology, and economics. First, we reveal the oscillatory behavior of increased homophily in opposition parties in the US Congress over a 40-year span, emphasizing the role of higher-order structures in quantifying political group homophily. Second, we investigate non-linear contagion in contact hyper-networks, demonstrating that disparities between simulations and theoretical predictions can be explained by considering higher-order joint degree distributions. Last, we examine the economic complexity of countries in the global trade network, showing that local network properties preserved by NuDHy explain the main structural economic complexity indexes. This work advances the development of null models for directed hypergraphs, addressing the intricate challenges posed by their complex entity relations, and providing a versatile suite of tools for researchers across various domains.

Paper Structure (27 sections, 8 theorems, 62 equations, 19 figures, 8 tables, 6 algorithms)

This paper contains 27 sections, 8 theorems, 62 equations, 19 figures, 8 tables, 6 algorithms.

Key Result

Lemma 1

Let $G \doteq (L, R, D)$ be a directed bipartite graph and $u \neq v \in L$, $\alpha \neq \beta \in R$ such that $\exists d \in \{+1,-1\}$ for which $e_1 \doteq (u,\alpha,d), e_2 \doteq (v,\beta,d) \in D$ and $e_3 \doteq (u,\beta,d), e_4 \doteq (v,\alpha,d) \notin D$. Swapping $e_1$, $e_2$ with $e_3
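The preconditions of Lemma 1 can be illustrated with a minimal Python sketch. It is not the authors' NuDHy implementation: the function names, the representation of $D$ as a set of `(left, right, direction)` triples, and the toy edge set are assumptions made for illustration. The lemma statement is truncated above, so the sketch only checks the property stated in the caption of Figure 5, namely that a PSO preserves the number of in-going and out-going edges of each node.

```python
from collections import Counter


def can_parity_swap(D, u, v, alpha, beta, d):
    """Check the preconditions of Lemma 1 for the pair of edges
    e1 = (u, alpha, d) and e2 = (v, beta, d)."""
    if u == v or alpha == beta or d not in (+1, -1):
        return False
    e1, e2 = (u, alpha, d), (v, beta, d)
    e3, e4 = (u, beta, d), (v, alpha, d)
    return e1 in D and e2 in D and e3 not in D and e4 not in D


def parity_swap(D, u, v, alpha, beta, d):
    """Return a new edge set in which e1, e2 are replaced by e3, e4."""
    if not can_parity_swap(D, u, v, alpha, beta, d):
        raise ValueError("parity swap preconditions are not satisfied")
    removed = {(u, alpha, d), (v, beta, d)}
    added = {(u, beta, d), (v, alpha, d)}
    return (set(D) - removed) | added


def degree_profile(D):
    """Count edges per (vertex, direction) on the left and right sides."""
    left = Counter((u, d) for u, _, d in D)
    right = Counter((a, d) for _, a, d in D)
    return left, right


# Hypothetical toy edge set: left vertices 1, 2; right vertices "a", "b".
D = {(1, "a", +1), (2, "b", +1), (2, "a", -1)}
D_swapped = parity_swap(D, 1, 2, "a", "b", +1)

# Every vertex keeps the same number of edges in each direction.
print(degree_profile(D) == degree_profile(D_swapped))
```

Because both swapped edges share the same direction $d$, each endpoint loses one edge of direction $d$ and gains one of the same direction, which is why the per-vertex counts are unchanged.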

Figures (19)

  • Figure 1: Construction of directed hypergraph configuration models. a) A directed hypergraph (top) and its representation as a bipartite graph (bottom). The left vertices (circles) correspond to hypergraph nodes, while the right vertices (hexagons) correspond to hyperedges. Dotted lines in the directed hypergraph separate the head and tail of each hyperedge, with arrows pointing towards the tail. b) The characteristics of the observed hypergraph preserved by DHCM and DHJM: left and right in- and out-degree sequences (top), and JOINT (bottom). The right in-degree sequence corresponds to the head-size sequence, while the right out-degree sequence corresponds to the tail-size sequence.
  • Figure 2: Mean affinity ratios in the US Congress co-sponsored bills. We show results for \ref{eq:affHone} divided by the mean values in $33$ samples for NuDHy-Degs and NuDHy-JOINT for the US Senate (S-bills, panel (a)) and House (H-bills, panel (c)). For comparison, we show the values of \ref{eq:affHone} divided by \ref{eq:baseHone} for Veldt et al., again for the US Senate (panel (b)) and House (panel (d)). The colors indicate Democrats (blue) and Republicans (red). We report the average ratios over $k = 2, \cdots, 14$.
  • Figure 3: Density of infected nodes in contact networks. We show the values of $\rho^*$ in the stationary state of contagion dynamics on the observed hypergraph, and on $33$ samples generated by NuDHy-Degs and NuDHy-JOINT, varying infection rate $\lambda$ and non-linearity parameter $\nu$, for lyon, high, email-Eu, and email-Enron. We also report the output of the AMEs as defined in \cite{st2022influential}. The infection rate is rescaled with the invasion threshold $\lambda_c$. Error bars correspond to one standard deviation.
  • Figure 4: Relative competitiveness in hs2019. Panel a): ranking distributions based on ECI, Fitness, and GENEPY across $33$ samples for NuDHy-Degs (top) and NuDHy-JOINT (bottom) compared to the observed rankings, with annotated top-$4$ diverging ranks. Panel b): density plots of the KDE of the observed biadjacency matrix $\mathsf{M}$ and of the aggregated matrices across $33$ samples of NuDHy-Degs and NuDHy-JOINT. Countries are sorted by ECI/Fitness and products by PCI/Quality (descending). The lighter the color, the higher the density of edges.
  • Figure 5: a) Bipartite graphs obtained from \ref{fig:toy}a after the application of the PSO $(1, , +1), (6, , +1) \xrightarrow{\mathrm{PSO}} (1, , +1), (6, , +1)$ and of the RPSO $(2, , -1), (5, , -1) \xrightarrow{\mathrm{RPSO}} (2, , -1), (5, , -1)$. The edges involved in the swap operations are highlighted in red. Left nodes with the same in- and out-degree are outlined with the same color. Right nodes with the same in- and out-degree are outlined with the same pattern. b) Changes in the neighborhood of a left node after the application of a sequence of PSOs and of RPSOs. PSOs preserve the number of in-going and out-going edges of each node. RPSOs also preserve the in- and out-degree of the nodes connected to each node.
  • ...and 14 more figures
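
The bipartite representation and the degree sequences described in the caption of Figure 1 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the `(head, tail)` input format, the toy hypergraph, and the encoding of direction as `+1` for node-to-hyperedge edges (head membership) and `-1` for hyperedge-to-node edges (tail membership) are all hypothetical choices. They are picked so that, as the caption states, the right in-degree equals the head size and the right out-degree equals the tail size.

```python
from collections import Counter


def to_bipartite(hyperedges):
    """Map a list of (head, tail) pairs to a set of (node, hyperedge, d) edges.

    Assumed encoding: d = +1 for node -> hyperedge (node is in the head),
    d = -1 for hyperedge -> node (node is in the tail).
    """
    D = set()
    for idx, (head, tail) in enumerate(hyperedges):
        D.update((v, idx, +1) for v in head)
        D.update((v, idx, -1) for v in tail)
    return D


def degree_sequences(D):
    """Return the four degree maps of the directed bipartite graph."""
    return {
        "left_out": Counter(u for u, _, d in D if d == +1),
        "left_in": Counter(u for u, _, d in D if d == -1),
        "right_in": Counter(a for _, a, d in D if d == +1),   # head sizes
        "right_out": Counter(a for _, a, d in D if d == -1),  # tail sizes
    }


# Hypothetical toy hypergraph with two directed hyperedges.
hyperedges = [({1, 2}, {3}), ({3}, {1, 4})]
seqs = degree_sequences(to_bipartite(hyperedges))

for name, counts in seqs.items():
    print(name, dict(sorted(counts.items())))
```

A null model in the spirit of DHCM would keep these four maps fixed while randomizing which nodes appear in which hyperedges; the joint information preserved by DHJM is not reproduced here.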

Theorems & Definitions (15)

  • Definition 1: JOINT
  • Lemma 1: Parity Swap Operation, PSO
  • Lemma 2: Restricted Parity Swap Operation, RPSO
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • Lemma 5: Lemma 4 czabarka2015realizations
  • proof
  • Corollary 1: Corollary 5 czabarka2015realizations
  • ...and 5 more