The Erdős-Rényi Random Graph Conditioned on Every Component Being a Clique

Martijn Gösgens, Lukas Lüchtrath, Elena Magnanini, Marc Noy, Élie de Panafieu

TL;DR

This paper analyzes CG_{n,p}, the Erdős-Rényi (ER) random graph conditioned on every connected component being a clique, which induces a prior over vertex partitions (communities). Using analytic combinatorics, generating functions, and saddle-point methods, it establishes a phase transition at p=1/2: for p>1/2 the graph is a single clique with high probability, while for p≤1/2 it decomposes into many smaller cliques, with limit theorems for the number of clusters, the number of edges, and the degree distribution. It also derives exact generating-function expressions and limit theorems in the critical, subcritical, near-critical, and sparse regimes, and connects these results to Bayesian community detection by showing that modularity maximization corresponds to posterior inference under CG-based priors. The near-critical and sparse-regime analyses show how small changes in p can dramatically alter the partition structure, which matters when selecting a prior for Bayesian community detection. Overall, the work provides a rigorous probabilistic and combinatorial framework for understanding ER graphs conditioned to be cluster graphs and their use in modularity-based inference.

Abstract

Motivated by an application in community detection, we consider an Erdős-Rényi random graph conditioned on the rare event that all connected components are fully connected. Such graphs can be considered as partitions of vertices into cliques. Hence, this conditional distribution defines a distribution over partitions. We show that a popular community detection method is equivalent to Bayesian inference with this distribution as prior over the community partitions. Using tools from analytic combinatorics, we prove limit theorems for several graph observables in this conditional distribution: the number of cliques; the number of edges; and the degree distribution. We consider several regimes of the connection probability $p$ as the number of vertices $n$ diverges. For $p=\tfrac{1}{2}$, the conditioning yields the uniform distribution over set partitions, which is well-studied, but has not been studied as a graph distribution before. For $p<\tfrac{1}{2}$, we show that the number of cliques is of the order $n/\sqrt{\log n}$, while for $p>\tfrac{1}{2}$, we prove that the graph consists of a single clique with high probability. This shows that there is a phase transition at $p=\tfrac{1}{2}$. We additionally study the near-critical regime $p_n\downarrow\tfrac{1}{2}$, as well as the sparse regime $p_n\downarrow0$. Finally, we discuss the implications of these results for community detection.
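
The claim that $p=\tfrac{1}{2}$ yields the uniform distribution over set partitions follows from a short computation, sketched here. The symbols $\pi$, $e(\pi)$, $w$ and $Z_n(w)$ are notation introduced for this sketch and need not match the paper's. Let $\pi$ be a partition of the $n$ vertices into blocks of sizes $s_1,\dots,s_k$, and let $G_\pi$ be the graph whose components are cliques on these blocks.

$$
e(\pi)=\sum_{i=1}^{k}\binom{s_i}{2},
\qquad
\mathbb{P}\big(\mathrm{ER}_{n,p}=G_\pi\big)=p^{\,e(\pi)}(1-p)^{\binom{n}{2}-e(\pi)}
=(1-p)^{\binom{n}{2}}\,w^{\,e(\pi)},
\qquad
w=\frac{p}{1-p}.
$$

$$
\mathbb{P}\big(\mathbf{CG}_{n,p}=G_\pi\big)=\frac{w^{\,e(\pi)}}{Z_n(w)},
\qquad
Z_n(w)=\sum_{\pi'}w^{\,e(\pi')}.
$$

At $p=\tfrac{1}{2}$ we have $w=1$, so every partition has weight $1$ and $Z_n(1)$ is the number of set partitions of $n$ elements (the Bell number). For $w>1$ each additional edge is rewarded, which favours large cliques, while for $w<1$ each edge is penalised.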

Paper Structure (55 sections, 41 theorems, 379 equations, 6 figures, 2 tables)


Key Result

Theorem 1.1

Consider the random cluster graph $\mathbf{CG}_{n,p}$ on $n\in\mathbb{N}$ vertices with ER edge probability $p\in(0,1)$, and let $\mathbf{C}_{n,p}$ denote the number of its clusters.
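
For small $n$, the law of $\mathbf{C}_{n,p}$ can be computed exactly, which gives a way to check the phase transition numerically. The following is a minimal sketch and not the authors' method. It assumes only that a partition with $e$ within-block edges has weight $(p/(1-p))^{e}$ under the conditioning, and it builds the weights by choosing the size of the block that contains the last vertex. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def cluster_count_weights(n, p):
    """Return a list whose entry c is the total weight of all partitions
    of n labelled vertices into exactly c cliques, where a partition with
    e within-clique edges has weight (p / (1 - p)) ** e."""
    p = Fraction(p)
    w = p / (1 - p)
    # table[m][c]: total weight of partitions of m vertices into c cliques
    table = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    table[0][0] = Fraction(1)
    for m in range(1, n + 1):
        for k in range(1, m + 1):  # size of the clique containing vertex m
            # choose the other k - 1 members, pay w per edge of the clique
            block_weight = comb(m - 1, k - 1) * w ** comb(k, 2)
            for c in range(1, m + 1):
                table[m][c] += block_weight * table[m - k][c - 1]
    return table[n]


def cluster_count_distribution(n, p):
    """Return a list whose entry c is P(C_{n,p} = c), as exact fractions."""
    weights = cluster_count_weights(n, p)
    total = sum(weights)
    return [x / total for x in weights]
```

Entry 1 of `cluster_count_distribution(n, p)` is the probability that the graph is a single clique. Passing `p` as a `Fraction` (for example `Fraction(1, 4)`) keeps the arithmetic exact and the numbers small. At `p = Fraction(1, 2)` the weights reduce to the Stirling numbers of the second kind.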

Figures (6)

  • Figure 1: Samples of three random cluster graphs on $50$ vertices. The one on the left is sampled with $p=0.25$, the middle one with $p=0.51$ and the right one with $p=0.53$, resulting in the complete graph.
  • Figure 2: We plot the critical sequence $p_n(1/2)$ together with the upper and lower bounds from Lemmas \ref{lem:critical-lower} and \ref{lem:critical-upper}. The background's colour is based on $\mathbb{P}(\mathbf{C}_{n,p}=1)$, underlining the narrowness of the critical window.
  • Figure 3: For $n\in\{3,30\}$, we show the distribution of $\mathbf{S}_{n,1/n}$ to demonstrate the convergence proven in Theorem \ref{thm:mainDegreeSparse}. Here $\rho=\frac{1+\sqrt{5}}{2}$ is the golden ratio.
  • Figure 4: We generate 20 PPMs with $n=1000$ vertices divided into 5 communities of size $200$ each. The parameters $p_{\text{in}}$ and $p_{\text{out}}$ are chosen such that each vertex has (in expectation) 10 neighbors inside its community and 10 neighbors outside its community. We detect communities by maximising ERM with resolution parameter $\gamma(p_{\text{in}},p_{\text{out}},p)$ as given in \eqref{eq:bayesian-modularity}, for various values of $p$. Figure \ref{fig:community-detection-granularity} shows the number of edges in the detected cluster graph, while Figure \ref{fig:community-detection-performance} shows the correlation coefficient \cite{gosgens2021systematic} between the detected and true partitions, which is a measure of similarity between these partitions.
  • Figure 5: For $n\in\{100,500\}$, we show the expected number of cliques of each size at criticality ($p=p_n(1/2)$).
  • ...and 1 more figure
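
Samples like those described for Figure 1 can be drawn exactly for moderate $n$. The sketch below is one possible sampler and is not claimed to be the procedure the authors used. It assumes that a partition with $e$ within-block edges has weight $(p/(1-p))^{e}$, and it draws the block of the lowest remaining vertex first. The function names are hypothetical, and because floating-point arithmetic is used, very large $n$ or $p$ close to $1$ may overflow.

```python
import random
from math import comb


def partition_weights(n, p):
    """Return z with z[m] = total weight of all partitions of m vertices,
    where a partition with e within-block edges has weight (p/(1-p))**e."""
    w = p / (1.0 - p)
    z = [1.0] + [0.0] * n
    for m in range(1, n + 1):
        # k is the size of the block containing one distinguished vertex
        z[m] = sum(comb(m - 1, k - 1) * w ** comb(k, 2) * z[m - k]
                   for k in range(1, m + 1))
    return z


def sample_cluster_graph(n, p, rng=None):
    """Draw the clusters (as sorted vertex lists) of a random cluster graph."""
    if rng is None:
        rng = random.Random()
    w = p / (1.0 - p)
    z = partition_weights(n, p)
    remaining = list(range(n))
    clusters = []
    while remaining:
        m = len(remaining)
        sizes = range(1, m + 1)
        # weight of giving the lowest remaining vertex a block of size k
        weights = [comb(m - 1, k - 1) * w ** comb(k, 2) * z[m - k]
                   for k in sizes]
        k = rng.choices(sizes, weights=weights)[0]
        # the other k - 1 members are a uniform subset of the rest
        cluster = [remaining[0]] + rng.sample(remaining[1:], k - 1)
        clusters.append(sorted(cluster))
        chosen = set(cluster)
        remaining = [v for v in remaining if v not in chosen]
    return clusters
```

Calling `sample_cluster_graph(50, 0.25, random.Random(0))` returns a list of clusters that together cover the vertices `0` to `49`. The edges of the graph are all pairs of vertices within the same cluster.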

Theorems & Definitions (79)

  • Theorem 1.1: Number of clusters in the RCG
  • Theorem 1.2: Number of edges in the RCG
  • Theorem 1.3: Degree distribution of the RCG
  • Corollary 1.4
  • Theorem 1.5: Near-critical regime
  • Theorem 1.6: Degrees in the sparse regime
  • Theorem 1.7: Fixed degrees in sparse regimes
  • Theorem 1.8
  • proof
  • Lemma 2.1
  • ...and 69 more