PieClam: A Universal Graph Autoencoder Based on Overlapping Inclusive and Exclusive Communities

Daniel Zilberg, Ron Levie

TL;DR

PieClam tackles universal graph representation by embedding nodes into a latent space of inclusive and exclusive communities and learning a prior over that space. It introduces IeClam, whose Lorentz inner product decoder yields a generative model that can represent diverse graph topologies, including bipartite structures, and it proves universality with respect to the log cut distance. The prior-based extension (PieClam) and its node-feature variant extend the approach to attributed graphs, with sampling enabled by normalizing flows. The theoretical guarantees are complemented by experiments on synthetic priors, stochastic block models (SBMs), and anomaly detection, where PieClam and IeClam achieve competitive results.

Abstract

We propose PieClam (Prior Inclusive Exclusive Cluster Affiliation Model): a probabilistic graph model for representing any graph as overlapping generalized communities. Our method can be interpreted as a graph autoencoder: nodes are embedded into a code space by an algorithm that maximizes the log-likelihood of the decoded graph, given the input graph. PieClam is a community affiliation model that extends well-known methods like BigClam in two main ways. First, rather than defining the decoder only via pairwise interactions between the nodes in the code space, we also incorporate a learned prior on the distribution of nodes in the code space, turning our method into a graph generative model. Second, we generalize the notion of communities by allowing not only sets of nodes with strong connectivity, which we call inclusive communities, but also sets of nodes with strong disconnection, which we call exclusive communities. To model both types of communities, we propose a new type of decoder based on the Lorentz inner product, which we prove to be much more expressive than standard decoders based on standard inner products or norm distances. By introducing a new graph similarity measure, which we call the log cut distance, we show that PieClam is a universal autoencoder, able to uniformly approximately reconstruct any graph. Our method is shown to obtain competitive performance on graph anomaly detection benchmarks.
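
To make the decoder idea concrete, here is a minimal sketch in Python. It assumes the decoder keeps the BigClam form, P(n ~ m) = 1 - exp(-<f_n, f_m>), with the Euclidean inner product replaced by a Lorentz inner product t_n·t_m - s_n·s_m, where t holds the inclusive affiliations and s the exclusive ones. The function names, the nonnegativity check, the choice to sum the likelihood over unordered pairs without self-loops, and the toy features are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lorentz_edge_probs(T, S):
    """Edge probabilities 1 - exp(-(t_n . t_m - s_n . s_m)) for all node pairs.

    T: (N, K_incl) inclusive affiliations, S: (N, K_excl) exclusive affiliations.
    Assumption: features are constrained so every Lorentz product is >= 0,
    which keeps the output a valid probability.
    """
    lorentz = T @ T.T - S @ S.T
    if np.any(lorentz < -1e-12):
        raise ValueError("Lorentz product must be nonnegative for all pairs")
    return 1.0 - np.exp(-np.clip(lorentz, 0.0, None))

def log_likelihood(A, P, eps=1e-12):
    """Bernoulli log-likelihood of adjacency A under P, over pairs n < m."""
    iu = np.triu_indices_from(A, k=1)
    a, p = A[iu], P[iu]
    return float(np.sum(a * np.log(p + eps) + (1.0 - a) * np.log(1.0 - p + eps)))

# Toy bipartite graph: nodes 0, 1 on one side and nodes 2, 3 on the other.
# One inclusive community contains every node; each side has its own
# exclusive community that cancels the within-side affinity.
T = np.ones((4, 1))
S = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
P = lorentz_edge_probs(T, S)

A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [1, 1, 0, 0],
              [1, 1, 0, 0]], dtype=float)
ll = log_likelihood(A, P)
```

In this toy example, same-side pairs get probability exactly 0 while cross-side pairs get 1 - exp(-1). A decoder built on a standard inner product of nonnegative features cannot produce this pattern: two nodes on the same side would need identical affiliations to share all their cross-side links, and identical nonzero affiliations force a positive within-side affinity.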

Paper Structure

This paper contains 51 sections, 4 theorems, 84 equations, 9 figures, 3 tables.

Key Result

Theorem 10

For every $\epsilon>0$, every $N\in\mathbb{N}$, and every adjacency matrix $\mathbf{A}\in[0,1]^{N\times N}$, there are $N$ affiliation features $\mathbf{F}\in \mathbb{R}^{2K}$ of dimension $K=-9\log(\epsilon/2)^2/\epsilon^2$ such that the corresponding IeClam model $\mathbf{P}=\{P(n\sim m\mid\dots$ [...] Here, the log cut distance is from Definition def:KL2. As a result, IeClam and PieClam are universal [...]

Figures (9)

  • Figure 1: Left to right: Synthetic prior in an affiliation space of two inclusive communities. Reconstructed prior by PClam with normalizing flow. Reconstructed affiliation features by PClam.
  • Figure 2: Left to right: Adjacency matrix sampled from SBM with three classes and 9 blocks. Adjacency matrix of the fitted PieClam graph, with two inclusive and two exclusive communities. Affiliation features of the PieClam matrix in $\mathcal{T}$ projected to $(t^1,s^1)$ and $(t^2,s^2)$.
  • Figure 3: Left to right: Adjacency matrix sampled from SBM. Fitted PClam adjacency matrix based on two inclusive communities. Affiliation features of the PClam matrix.
  • Figure 4: Left to right: Target SBM. Fitted BigClam graph with two communities. Error as a function of optimization iteration, where the error is, left to right, log cut distance, cut distance, $\ell_2$ distance. After convergence, the log cut distance between the SBM and BigClam is 0.0776.
  • Figure 5: Left to right: Target SBM. Fitted BigClam graph with four communities. Error as a function of optimization iteration, where the error is, left to right, log cut distance, cut distance, $\ell_2$ distance. After convergence, the log cut distance between the SBM and BigClam is 0.0775.
  • ...and 4 more figures

Theorems & Definitions (20)

  • Definition 1
  • Definition 2
  • Remark 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7
  • Definition 8
  • Claim 9
  • Theorem 10
  • ...and 10 more