A Zero-Inflated Poisson Latent Position Cluster Model

Chaoyi Lu, Riccardo Rastelli, Nial Friel

TL;DR

This work introduces the Zero-Inflated Poisson Latent Position Cluster Model (ZIP-LPCM), an extension of the latent position cluster model to non-negative discrete weighted networks in which missing data are treated as "unusual" zeros. It combines ZIP data augmentation, a mixture-of-finite-mixtures prior that infers the number of clusters automatically, and a partially collapsed Bayesian inference framework, and it visualizes the results in 3-dimensional latent spaces. A novel truncated absorb-eject move improves exploration of the clustering space. The method is demonstrated on three simulation studies and four real networks, revealing nuanced structures and unusual-zero patterns. The approach provides scalable inference, interpretable latent representations, and a flexible framework that accommodates supervision via node attributes when available.

Abstract

The latent position network model (LPM) is a popular approach for the statistical analysis of network data. A central aspect of this model is that it assigns nodes to random positions in a latent space, such that the probability of an interaction between each pair of individuals or nodes is determined by their distance in this latent space. A key feature of this model is that it allows one to visualize nuanced structures via the latent space representation. The LPM can be further extended to the Latent Position Cluster Model (LPCM), to accommodate the clustering of nodes by assuming that the latent positions are distributed following a finite mixture distribution. In this paper, we extend the LPCM to accommodate missing network data and apply this to non-negative discrete weighted social networks. By treating missing data as "unusual" zero interactions, we propose a combination of the LPCM with the zero-inflated Poisson distribution. Statistical inference is based on a novel partially collapsed Markov chain Monte Carlo algorithm, where a Mixture-of-Finite-Mixtures (MFM) model is adopted to automatically determine the number of clusters and optimal group partitioning. Our algorithm features a truncated absorb-eject move, which is a novel adaptation of an idea commonly used in collapsed samplers, within the context of MFMs. Another aspect of our work is that we illustrate our results on 3-dimensional latent spaces, maintaining clear visualizations while achieving more flexibility than 2-dimensional models. The performance of this approach is illustrated via three carefully designed simulation studies, as well as four different publicly available real networks, where some interesting new perspectives are uncovered.
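
To make the generative idea in the abstract concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes a directed network, a Gaussian mixture for the latent positions, a Poisson rate of the form exp(beta - distance), and a single constant unusual-zero probability; the abstract specifies none of these choices, and the names `simulate_zip_lpcm`, `prob_unusual_given_zero`, `beta` and `p_unusual` are hypothetical. The second function evaluates the standard zero-inflated Poisson identity for the probability that an observed zero is an unusual zero.

```python
import numpy as np


def simulate_zip_lpcm(n=60, k=3, d=3, beta=2.0, p_unusual=0.2, seed=0):
    """Simulate a weighted network from an illustrative ZIP latent position cluster model.

    All parameter values and the link function are assumptions made for this sketch.
    """
    rng = np.random.default_rng(seed)
    # Cluster labels and cluster centres in a d-dimensional latent space.
    z = rng.integers(0, k, size=n)
    centres = rng.normal(scale=2.0, size=(k, d))
    # Latent positions follow a finite Gaussian mixture around the centres.
    u = centres[z] + rng.normal(scale=0.5, size=(n, d))
    # Pairwise Euclidean distances between latent positions.
    dist = np.linalg.norm(u[:, None, :] - u[None, :, :], axis=-1)
    # Assumed link: the Poisson rate decays with latent distance.
    lam = np.exp(beta - dist)
    # Unusual-zero indicators: True forces the interaction weight to zero.
    nu = rng.random((n, n)) < p_unusual
    y = np.where(nu, 0, rng.poisson(lam))
    # No self-interactions.
    np.fill_diagonal(y, 0)
    np.fill_diagonal(nu, False)
    return y, nu, lam, z, u


def prob_unusual_given_zero(lam, p_unusual):
    """P(unusual zero | observed zero) under a zero-inflated Poisson with rate lam."""
    return p_unusual / (p_unusual + (1.0 - p_unusual) * np.exp(-lam))


if __name__ == "__main__":
    y, nu, lam, z, u = simulate_zip_lpcm()
    post = prob_unusual_given_zero(lam, 0.2)
    print("share of zero entries:", np.mean(y == 0))
    print("mean P(unusual | zero) over observed zeros:", post[y == 0].mean())
```

In this sketch, pairs that are close in the latent space have a large Poisson rate, so an observed zero between them is more likely to be an unusual zero than a zero between distant pairs.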

Paper Structure

This paper contains 24 sections, 39 equations, 16 figures, 3 tables, 1 algorithm.

Figures (16)

  • Figure 1: Simulation study 1 synthetic networks. The 1st row plots correspond to the scenario 1 network, while the 2nd row plots correspond to the scenario 2 network. The 1st column plots show the 3-dimensional plots of the latent positions with different node colors denoting the corresponding true clustering. Node sizes are proportional to node betweenness, whereas edge widths and colors are proportional to edge weights. The 2nd column plots show the 1st column latent position plots rotated by $90^\circ$ clockwise about the vertical axis.
  • Figure 2: Simulation study 1. Synthetic networks' adjacency matrix heatmap plots. Darker entries correspond to higher edge weights. The side-bars indicate the reference clustering $\boldsymbol{z}^*$. Left plot: scenario 1 network, generated from a ZIP-LPCM. Right plot: scenario 2 network, generated from a Pois-LPCM.
  • Figure 3: Simulation study 1 scenario 1. Performance of the posterior mean $\hat{\boldsymbol{\nu}}$, which approximates the conditional probability in Eq. \ref{P_m0}. The top-left plot is the heatmap of the reference values of Eq. \ref{P_m0}, obtained by leveraging the reference model parameters used for simulating the network, whereby darker entry colors correspond to higher values. The other four heatmap plots describe $\hat{\boldsymbol{\nu}}$ as inferred by the corresponding priors indicated on top of the heatmap. The rows and columns of the matrices are rearranged and separated according to $\hat{\boldsymbol{z}}$ while the side-bars indicate the true clustering of each individual. The last plot shows the Receiver Operating Characteristic (ROC) curves for all the supervised ZIP-LPCM cases, where the reference $\boldsymbol{\nu}^*$ is the response variable.
  • Figure 4: Simulation study 2. Synthetic networks' adjacency matrix heatmap plots. Darker entries correspond to higher edge weights. The side-bars indicate the reference clustering $\boldsymbol{z}^*$. Left plot: scenario 1 network, generated from a ZIP-SBM without hubs. Right plot: scenario 2 network, generated from a ZIP-SBM with hubs.
  • Figure 5: Simulation study 2. The 1st and the 2nd rows illustrate the inferred point estimate $\hat{\boldsymbol{U}}$ obtained by ZIP-LPCM Sup Beta(1,9) implementations for Scenario 1 and Scenario 2, respectively. The 2nd column plots are rotated versions of the 1st column plots, where each inferred latent position is rotated by $90^\circ$ clockwise about the vertical axis. Different node colors correspond to different inferred groups according to the corresponding $\hat{\boldsymbol{z}}$. Node sizes are proportional to node betweenness while edge widths and colors are proportional to edge weights.
  • ...and 11 more figures