A Stochastic Block Hypergraph model

Alexis Pister, Marc Barthelemy

TL;DR

The paper addresses the need for simple generative models of hypergraphs with community structure. It proposes a hypergraph generalization of the stochastic block model (SBM) with an explicit, modulable hyperedge-formation process, focusing on $P_{ij}=p\delta_{ij}+q(1-\delta_{ij})$ and four node-joining strategies (weighted, max, min, majority). Through simulations, it shows that degree and hyperedge-size distributions are well approximated by binomials with effective parameters $N^*$ and $E^*$ that depend linearly on $q/p$, and that hyperedge composition, measured by a normalized Gini coefficient, is strongly controlled by the chosen strategy and by $q/p$. The results provide a simple, flexible null model to study community detection, dynamics, and visualization in hypergraphs.

Abstract

The stochastic block model is widely used to generate graphs with a community structure, but no simple alternative currently exists for hypergraphs, in which more than two nodes can be connected together through a hyperedge. We discuss here such a hypergraph generalization, based on the connection probability $P_{ij}$ between nodes of communities $i$ and $j$, that uses an explicit and modulable hyperedge formation process. We focus on the standard case where $P_{ij}=p\delta_{ij}+q(1-\delta_{ij})$ with $0\leq q\leq p$. We propose a simple model that satisfies three criteria: it should be as simple as possible; when $p = q$ it should be equivalent to the standard random hypergraph model; and it should use an explicit and modulable hyperedge formation process, so that the model is intuitive and can easily express different real-world formation processes. We first show that for such a model the degree distribution and hyperedge size distribution can be approximated by binomial distributions with effective parameters that depend on the number of communities and $q/p$. The composition of hyperedges goes from 'pure' hyperedges (comprising nodes belonging to the same community) for $q=0$ to 'mixed' hyperedges (comprising nodes from different communities) for $q=p$. We test various formation processes, and our results suggest that when they depend on the composition of the hyperedge, they tend to favor the dominant community and lead to hyperedges with a smaller diversity. In contrast, for formation processes that are independent of the hyperedge structure, we obtain hyperedges comprising a larger diversity of communities. The advantages of the model proposed here are its simplicity and flexibility, which make it a good candidate for testing community-related problems, such as their detection, impact on various dynamics, and visualization.
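
As an illustration of the connection probability $P_{ij}=p\delta_{ij}+q(1-\delta_{ij})$ used in the abstract, the following minimal sketch builds the corresponding $K \times K$ matrix with NumPy. The function name `block_probability_matrix` and the example values ($K=4$, $p=0.03$, $q/p=0.3$, taken from the figure captions) are illustrative choices, not the authors' code, and the sketch does not implement the hyperedge formation strategies themselves.

```python
import numpy as np


def block_probability_matrix(K, p, q):
    """Return the K x K matrix P with P[i, j] = p if i == j, else q.

    Hypothetical helper for illustration; assumes 0 <= q <= p <= 1,
    the regime considered in the abstract.
    """
    if not (0 <= q <= p <= 1):
        raise ValueError("expected 0 <= q <= p <= 1")
    delta = np.eye(K)  # Kronecker delta as an identity matrix
    return p * delta + q * (1 - delta)


# Example values borrowed from the figure captions: K = 4, p = 0.03, q/p = 0.3.
P = block_probability_matrix(K=4, p=0.03, q=0.3 * 0.03)
print(P)
```

For $q=p$ every entry equals $p$, so community membership no longer affects the connection probability; for $q=0$ the matrix is diagonal and only same-community connections remain.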

Paper Structure (8 sections, 11 equations, 12 figures)

Figures (12)

  • Figure 1: Bipartite representation of a hypergraph. If a node belongs to a hyperedge, it is indicated by a link in this representation.
  • Figure 2: Benchmark of the time (in seconds) needed to run the algorithm (a) for different values of $N$, with $N=E$, $K=4$, $p=\frac{100}{N}$, $q=0.4p$, using the weighted strategy (each community is of the same size), and a quadratic fit (b) that shows the quadratic complexity of the algorithm ($f(N)=aN^2+bN+c$ with $a=1.01 \times 10^{-6}$, $b=-9.8 \times 10^{-4}$, $c=0.34$).
  • Figure 3: Degree (a) and hyperedge size (b) distributions computed from $100$ hypergraphs with $N=1000$, $E=200$, $K=4$, $p = \frac{30}{N} = 0.03$, and $q/p = 0, 0.3, 0.7, 1$. The strategy used is the weighted probability (Eq. \ref{eq:weighted}). The distributions are fitted with binomial distributions of parameters $(p, N^*)$.
  • Figure 4: Effective normalized parameters (top: $N^*/N$, bottom: $E^*/E$) obtained by fitting the degree and size distributions with binomials. The distributions are computed using $100$ hypergraphs generated with $N=1000$, $E=200$, $K=4$, $p = \frac{30}{N}=0.03$ and the weighted formation process. The lines represent linear fits of the form $a(q/p)+b$ where $a=0.73$ and $b=0.27$ for both plots, in agreement with our argument (Eq. \ref{eq:argu}).
  • Figure 5: Hyperedge size distribution computed from $100$ hypergraphs with $N=1000$, $E=200$, $K=4$, $p = 0.1$, and $q/p = 0.1, 0.01$, with the min strategy.
  • ...and 7 more figures
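
The two fits quoted in the captions of Figures 2 and 4 can be evaluated directly. The sketch below only plugs the reported coefficients into $f(N)=aN^2+bN+c$ and $a(q/p)+b$; the function names are hypothetical, and applying the quadratic fit outside the benchmarked range of $N$ (which this section does not state) is an extrapolation.

```python
def runtime_fit(N, a=1.01e-6, b=-9.8e-4, c=0.34):
    """Quadratic fit f(N) = a*N^2 + b*N + c from the Figure 2 caption (seconds)."""
    return a * N**2 + b * N + c


def effective_ratio_fit(q_over_p, a=0.73, b=0.27):
    """Linear fit a*(q/p) + b from the Figure 4 caption (N*/N or E*/E)."""
    return a * q_over_p + b


# Estimated running time for a few system sizes (with N = E, as in Figure 2).
for N in (1000, 5000, 10000):
    print(f"N = {N}: about {runtime_fit(N):.2f} s")

# Effective normalized parameter for the q/p values used in Figure 3.
for r in (0.0, 0.3, 0.7, 1.0):
    print(f"q/p = {r}: N*/N ~ {effective_ratio_fit(r):.3f}")
```

With these coefficients the linear fit gives $N^*/N = 1$ at $q/p = 1$, where the community structure plays no role, and an intercept of $0.27$ at $q/p = 0$, close to $1/K = 0.25$ for $K = 4$.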