Bayesian nonparametric modeling of heterogeneous populations of networks

Francesco Barile, Simón Lunagómez, Bernardo Nipoti

TL;DR

This work proposes a novel Bayesian nonparametric model that identifies clusters of networks characterized by similar connectivity patterns and demonstrates that this model has full support in the Kullback--Leibler sense and is strongly consistent.

Abstract

The increasing availability of multiple network data has highlighted the need for statistical models for heterogeneous populations of networks. A convenient framework makes use of metrics to measure similarity between networks. In this context, we propose a novel Bayesian nonparametric model that identifies clusters of networks characterized by similar connectivity patterns. Our approach relies on a location-scale Dirichlet process mixture of centered Erdős--Rényi kernels, with components parametrized by a unique network representative, or mode, and a univariate measure of dispersion around the mode. We demonstrate that this model has full support in the Kullback--Leibler sense and is strongly consistent. An efficient Markov chain Monte Carlo scheme is proposed for posterior inference and clustering of multiple network data. The performance of the model is validated through extensive simulation studies, showing improvements over state-of-the-art methods. Additionally, we present an effective strategy to extend the application of the proposed model to datasets with a large number of nodes. We illustrate our approach with the analysis of human brain network data.
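To make the kernel concrete, here is a minimal sketch of a single centered Erdős--Rényi (CER) component. It assumes undirected networks without self-loops, and it assumes the usual reading of a CER kernel: each dyad of the mode is flipped independently with probability $\alpha \in (0,1/2)$, so the probability of a network depends only on its Hamming distance from the mode. The function names `sample_cer` and `cer_log_pmf` are hypothetical and are not taken from the paper, whose exact parametrization may differ.

```python
import numpy as np


def sample_cer(mode_adj, alpha, rng):
    """Draw one network from a CER kernel (assumed form, see lead-in).

    Each dyad of the mode is flipped independently with probability alpha.
    mode_adj is a symmetric 0/1 integer adjacency matrix with zero diagonal.
    """
    n = mode_adj.shape[0]
    iu = np.triu_indices(n, k=1)              # one entry per undirected dyad
    flips = rng.random(iu[0].size) < alpha    # which dyads get flipped
    upper = np.where(flips, 1 - mode_adj[iu], mode_adj[iu])
    adj = np.zeros_like(mode_adj)
    adj[iu] = upper
    return adj + adj.T                        # symmetrize, diagonal stays zero


def cer_log_pmf(adj, mode_adj, alpha):
    """Log-probability of adj under the assumed CER kernel.

    With d the Hamming distance to the mode and m the number of dyads,
    log p = d * log(alpha) + (m - d) * log(1 - alpha).
    """
    n = mode_adj.shape[0]
    iu = np.triu_indices(n, k=1)
    d = int(np.sum(adj[iu] != mode_adj[iu]))  # Hamming distance to the mode
    m = iu[0].size
    return d * np.log(alpha) + (m - d) * np.log1p(-alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mode = np.array([[0, 1, 0],
                     [1, 0, 1],
                     [0, 1, 0]])
    draw = sample_cer(mode, alpha=0.2, rng=rng)
    print(draw)
    print(cer_log_pmf(draw, mode, alpha=0.2))
```

Under this form, restricting $\alpha$ to $(0,1/2)$ makes the mode the most probable network, and smaller values of $\alpha$ concentrate the kernel more tightly around it. A mixture model would combine several such components, each with its own mode and dispersion.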

Paper Structure

This paper contains 30 sections, 2 theorems, 53 equations, 18 figures, 6 tables, 1 algorithm.

Key Result

Theorem 2.1

The prior $\Pi$ induced by a location-scale DP mixture of CER kernels with base measure as in \ref{eq:base_meas} has the Kullback--Leibler property. That is, for any $p_* \in \mathcal{P}_{\mathscr{G}_{\mathcal{V}}}$ and any $\varepsilon>0$, $\Pi\left( \mathbb{B}_{\varepsilon}(p_*)\right) > 0$.
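
For readers who want the statement unpacked: the definition of $\mathbb{B}_{\varepsilon}(p_*)$ is not reproduced on this page, so the following is the standard Kullback--Leibler neighborhood, given here as an assumption about the paper's notation. It takes $\mathscr{G}_{\mathcal{V}}$ to be the (finite) space of networks on the node set $\mathcal{V}$ and $\mathcal{P}_{\mathscr{G}_{\mathcal{V}}}$ to be the set of probability mass functions on it.

$$
\mathbb{B}_{\varepsilon}(p_*) = \left\{ p \in \mathcal{P}_{\mathscr{G}_{\mathcal{V}}} : \mathrm{KL}(p_*; p) < \varepsilon \right\},
\qquad
\mathrm{KL}(p_*; p) = \sum_{\mathcal{G} \in \mathscr{G}_{\mathcal{V}}} p_*(\mathcal{G}) \log \frac{p_*(\mathcal{G})}{p(\mathcal{G})}.
$$

Read this way, the theorem says that the prior assigns positive mass to every Kullback--Leibler neighborhood of any data-generating distribution $p_*$ on the network space.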

Figures (18)

  • Figure 1: Top-down projection of a sample of six network observations extracted from the Human Brain Networks dataset (see \ref{sec:section5} for details). The nodes of each network are colored according to the network cluster assignments, as inferred by the proposed method.
  • Figure 2: Left panel: probability $p_{lij}$ in \ref{eq:prob_edge1}, with $A_{\mathcal{G}_0[ij]}+A_{\mathcal{G}_l[ij]}\in\{0,1,2\}$ (blue for 0, gray for 1, and yellow for 2) and for $\alpha_l$ ranging in $(0,1/2)$. Right panel: probability $p_{kij}^*$ in \ref{eq:prob_edgek}, with $n_{ij}^{(k)}\in\{0,1,\ldots,n_k+1\}$, $n_k+1=10$ (blue for low, yellow for high), and for $\alpha_k^*$ ranging in $(0,1/2)$.
  • Figure 3: Top row: centroids with Scale-free ($\mathcal{C}_{01}$), Small-world ($\mathcal{C}_{02}$), Stochastic Block Model ($\mathcal{C}_{03}$), and Erdős--Rényi ($\mathcal{C}_{04}$) structures (from left to right). Bottom row: posterior Fréchet means for the four clusters estimated based on a dataset generated from the mixed level of variability scenario, with sample size $n=40$. See \ref{sec:section4}.
  • Figure 4: Adjusted Rand index, entropy, and purity for our method (yellow violins) and the methods of durante2017 (blue violins), anastasia (cyan violins), Signorelli (green violins), and josephs2025 (violet violins). Columns refer to the scenarios of \ref{tab:table_var}. Distributions are estimated based on the analysis of 100 datasets. See \ref{sec:study1}.
  • Figure 5: Importance-sampling approximate distributions of $\text{KL}(p_*;\hat{f})$ for our method (yellow violins) and the methods of durante2017 (blue violins), anastasia (cyan violins), and Signorelli (green violins). Distributions are estimated based on the analysis of 100 datasets. See \ref{subsec:sim_cons}.
  • ...and 13 more figures

Theorems & Definitions (6)

  • Definition 2.1: Location-scale DP mixture of CER kernels
  • Remark 1
  • Theorem 2.1
  • Corollary 2.1
  • Proof of Theorem 2.1
  • Proof of Corollary 2.1