Amortized Probabilistic Detection of Communities in Graphs

Yueqi Wang, Yoonho Lee, Pallab Basu, Juho Lee, Yee Whye Teh, Liam Paninski, Ari Pakman

TL;DR

A simple framework for amortized community detection is proposed. It combines the expressive power of GNNs with recent methods for amortized clustering, which removes the need to know the number of communities in advance and provides a probabilistic formulation that handles uncertainty.

Abstract

Learning community structures in graphs has broad applications across scientific domains. While graph neural networks (GNNs) have been successful in encoding graph structures, existing GNN-based methods for community detection are limited by requiring knowledge of the number of communities in advance, in addition to lacking a proper probabilistic formulation to handle uncertainty. We propose a simple framework for amortized community detection, which addresses both of these issues by combining the expressive power of GNNs with recent methods for amortized clustering. Our models consist of a graph representation backbone that extracts structural information and an amortized clustering network that naturally handles variable numbers of clusters. Both components combine into well-defined models of the posterior distribution of graph communities and are jointly optimized given labeled graphs. At inference time, the models yield parallel samples from the posterior of community labels, quantifying uncertainty in a principled way. We evaluate several models from our framework on synthetic and real datasets, and demonstrate improved performance compared to previous methods. As a separate contribution, we extend recent amortized probabilistic clustering architectures by adding attention modules, which yield further improvements on community detection tasks.

Paper Structure

This paper contains 36 sections, 33 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Amortized Community Detection.
  • Figure 2: CCP and CCP-Attn. Left: Architecture of the CCP model pakman2020 for clusterwise amortized clustering. Right: Our proposed modification, CCP-Attn, where the mean aggregations m used by CCP (see equation (\ref{eq:ccp_U_G})) are replaced by Set Transformer attention modules from lee2018set. See Appendix \ref{app:ccp-attn-arch} for details.
  • Figure 3: Node-wise sampling (NCP Model). 1a-1b: Two nodes have already been assigned to the same community ($c_1=c_2=1$, red triangles), and a randomly selected new node is sampled from $p(c_3 \mid c_{1:2}, {\bf x})$ to choose whether it joins them (1a, $c_3=1$) or creates a new community (1b, $c_3=2$, green star). 2a-2c: The previous node started its own community, and the next random node is sampled from $p(c_4 \mid c_{1:3}, {\bf x})$ to choose whether it joins one of the existing communities (2a, $c_4=1$; 2b, $c_4=2$) or creates a new community (2c, $c_4=3$, blue square). The procedure is repeated until all nodes have been sampled.
  • Figure 4: Community-wise sampling (CCP and DAC Models). (1) The first element of community ${\bf s}_1$ (black triangle) is sampled uniformly, and the available points (grey dots) are queried to join. (2) The first community ${\bf s}_1$ is formed (red triangles). (3) The first element of ${\bf s}_2$ (black square) is sampled uniformly from unassigned points. (4) The second community ${\bf s}_2$ is formed (blue squares). (5)-(6) We repeat this procedure until no unassigned points are left. In CCP, the binary queries are correlated, but become independent conditioned on a latent vector, thus allowing parallel sampling (eq. (\ref{eq:marginal_z})).
  • Figure 5: General SBM. Left: Observations ($N=222$). Right: Exact community recovery by CCP-Attn.
  • ...and 5 more figures
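
The node-wise sampling loop described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the paper's model: the learned conditional $p(c_n \mid c_{1:n-1}, {\bf x})$ is replaced here by a hypothetical stand-in, `crp_weights`, which ignores the graph features and simply weights each existing community by its size (a Chinese-restaurant-process-style prior with an assumed concentration `alpha`). The function names and parameters are assumptions made for this example. What the sketch does preserve is the control flow from the caption: nodes are visited in random order, and each one either joins an existing community or opens a new one, so the number of communities is never fixed in advance.

```python
import random
from typing import Callable, List, Optional, Sequence


def crp_weights(assigned: Sequence[int], num_communities: int,
                alpha: float = 1.0) -> List[float]:
    """Hypothetical stand-in for the learned conditional p(c_n | c_{1:n-1}, x).

    Returns unnormalized weights: one per existing community (its current
    size), followed by one weight (alpha) for opening a new community.
    A real model would compute these from GNN node embeddings instead.
    """
    counts = [0] * num_communities
    for c in assigned:
        counts[c - 1] += 1  # community labels are 1-based
    return [float(k) for k in counts] + [alpha]


def sample_nodewise(num_nodes: int,
                    weights_fn: Callable[[Sequence[int], int], List[float]] = crp_weights,
                    rng: Optional[random.Random] = None) -> List[int]:
    """Draw one set of community labels by assigning nodes one at a time."""
    rng = rng or random.Random()
    order = list(range(num_nodes))
    rng.shuffle(order)            # nodes are visited in random order
    labels = [0] * num_nodes      # 0 marks a node that is not yet assigned
    assigned: List[int] = []      # labels chosen so far, in visiting order
    num_communities = 0
    for node in order:
        weights = weights_fn(assigned, num_communities)
        # Options 1..K join an existing community; option K+1 opens a new one.
        choice = rng.choices(range(1, num_communities + 2), weights=weights)[0]
        if choice == num_communities + 1:
            num_communities += 1
        labels[node] = choice
        assigned.append(choice)
    return labels


if __name__ == "__main__":
    # Each call yields an independent sample; repeated calls give a crude
    # picture of the spread over partitions under the stand-in conditional.
    for seed in range(3):
        print(sample_nodewise(8, rng=random.Random(seed)))
```

Because every call is independent, several samples can be drawn side by side, which is the sense in which sampled labelings can be used to express uncertainty; with the stand-in conditional above, that spread reflects only the prior, not any graph structure.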