A Stochastic Block Hypergraph model
Alexis Pister, Marc Barthelemy
TL;DR
The paper addresses the need for simple generative models of hypergraphs with community structure. It proposes a hypergraph generalization of the stochastic block model (SBM) with an explicit, modulable hyperedge-formation process, focusing on $P_{ij}=p\delta_{ij}+q(1-\delta_{ij})$ and four node-joining strategies (weighted, max, min, majority). Through simulations, it shows that degree and hyperedge-size distributions are well approximated by binomials with effective parameters $N^*$ and $E^*$ that depend linearly on $q/p$, and that hyperedge composition, measured by a normalized Gini coefficient, is strongly controlled by the chosen strategy and by $q/p$. The results provide a simple, flexible null model to study community detection, dynamics, and visualization in hypergraphs.
Abstract
The stochastic block model is widely used to generate graphs with a community structure, but no simple alternative currently exists for hypergraphs, in which more than two nodes can be connected together through a hyperedge. We discuss here such a hypergraph generalization, based on the clustering connection probability $P_{ij}$ between nodes of communities $i$ and $j$, and that uses an explicit and modulable hyperedge formation process. We focus on the standard case where $P_{ij}=p\delta_{ij}+q(1-\delta_{ij})$ with $0\leq q\leq p$. We propose a model that satisfies three criteria: it should be as simple as possible; when $p = q$ it should be equivalent to the standard random hypergraph model; and it should use an explicit and modulable hyperedge formation process, so that the model is intuitive and can easily express different real-world formation processes. We first show that for such a model the degree distribution and hyperedge size distribution can be approximated by binomial distributions with effective parameters that depend on the number of communities and on $q/p$. Also, the composition of hyperedges ranges from "pure" hyperedges (comprising nodes that all belong to the same community) at $q=0$ to "mixed" hyperedges (comprising nodes from different communities) at $q=p$. We test various formation processes, and our results suggest that when they depend on the composition of the hyperedge, they tend to favor the dominant community and lead to hyperedges with a smaller diversity. In contrast, for formation processes that are independent of the hyperedge structure, we obtain hyperedges comprising a larger diversity of communities. The advantages of the model proposed here are its simplicity and flexibility, which make it a good candidate for testing community-related problems, such as their detection, their impact on various dynamics, and their visualization.
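
As an illustration of the connection probability $P_{ij}=p\delta_{ij}+q(1-\delta_{ij})$ used above, the following minimal sketch builds the corresponding community-by-community matrix with NumPy. The function name `block_probability_matrix` and the example values of `p` and `q` are assumptions made for illustration; the sketch covers only the matrix $P_{ij}$, not the hyperedge formation process itself.

```python
import numpy as np


def block_probability_matrix(n_communities: int, p: float, q: float) -> np.ndarray:
    """Return P with P[i, j] = p if i == j, else q (hypothetical helper)."""
    if not 0.0 <= q <= p <= 1.0:
        raise ValueError("expected 0 <= q <= p <= 1")
    delta = np.eye(n_communities)          # Kronecker delta as an identity matrix
    return p * delta + q * (1.0 - delta)   # p on the diagonal, q elsewhere


# Example values chosen arbitrarily for illustration.
P = block_probability_matrix(3, p=0.3, q=0.1)

# With p == q every entry is identical, so community labels no longer matter.
P_uniform = block_probability_matrix(3, p=0.2, q=0.2)
```

Setting `q = 0` leaves only the diagonal, which corresponds to the "pure" hyperedge limit described in the abstract, while `q = p` gives a constant matrix in which communities play no role.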
