Phylogenetic latent space models for network data

Federico Pavone, Daniele Durante, Robin J. Ryder

Abstract

Latent space models for network data characterize each node through a vector of latent features whose pairwise similarities define the edge probabilities among the pairs of nodes. Although this formulation has led to successful implementations, the overarching focus has been on directly inferring node embeddings through the latent features, rather than learning the generative process underlying these embeddings. This focus prevents borrowing information across the node features and limits the ability to infer higher-level architectures governing network formation. For example, routinely-studied networks often exhibit multiscale structures informing on nested modular hierarchies among nodes, which could be learned via tree-based representations of dependencies among the latent features. We pursue this direction by bridging latent variable representations of network data with concepts from phylogenetic inference to design a novel latent space model that explicitly characterizes the generative process of the node feature vectors through a branching Brownian motion, with branching structure parametrized by a tree. This tree constitutes the main object of interest and is learned under a Bayesian perspective leveraging priors inherited from phylogenetic literature to infer tree-based modular hierarchies across nodes, which explain heterogeneous multiscale patterns in the network. Identifiability results are derived along with posterior consistency theory. The inference potentials of our model are illustrated in simulations and two real-data applications from criminology and neuroscience, where our formulation learns core structures hidden to state-of-the-art alternatives.

Paper Structure

This paper contains 16 sections, 2 theorems, 33 equations, 8 figures, 1 table.

Key Result

Lemma S1.1

Consider prior eq:bbm_row_multi for the features ${\bf Z}^{(m) \intercal}_{[k]}$, $k=1, \ldots, K$ and $m=1, \ldots, M$. Then, conditionally on $\sigma^2$ and $\Upsilon$, it holds independently over $m = 1,\dots, M$, where $(d^{(m)}_{vu})^2=([{\bf D}]_{vu}^{(m)})^2=\lVert {\bf z}^{(m)}_{v} - {\bf z}^{(m)}_{u} \rVert^2$ is the squared Euclidean distance between the $K$-dimensional feature v

Figures (8)

  • Figure 1: Graphical representation of the generative process underlying the proposed phylogenetic latent space model. From left to right: Tree defining the branching structure regulating the feature formation process, modeled via a branching Brownian motion (the leaves of the tree correspond to nodes of the network); Node-specific latent features obtained as a realization of the branching Brownian motion; Matrix of pairwise edge probabilities defined as the logit mapping of the negative Euclidean distance among the pairs of node-specific latent features; Adjacency matrix representation of the network with edges sampled from independent Bernoullis conditioned on the corresponding edge probability. Although phylnet allows for $K$-dimensional feature vectors, here we consider $K=2$ to facilitate visualization.
  • Figure 2: Radius of the $90\%$ credible sets centered at $\Upsilon_0$. This radius is computed under different tree distances and for varying settings of $M =1, 5, 10, 15, 40, 80$ and $V=20,40,80$.
  • Figure 3: First simulation scenario: community-type structure. From left to right: matrix of edge probabilities; four examples of simulated networks; inferred trees under the phylnet model and alternative competitors. Colors indicate community membership. For the phylnet model, the consensus tree is based on a 0.8 threshold proportion.
  • Figure 4: Second simulation scenario: tree-type multiscale structure. From left to right: True tree $\Upsilon_0$ and expectation of the $M$ edge-probability matrices; four examples of simulated networks; inferred trees under the proposed phylnet model and alternative competitors. For the phylnet model, the consensus tree is based on a 0.8 threshold proportion.
  • Figure 5: Example of one network in the criminology application. Left: graphical representation of the network, where the positions of the different criminals (nodes) are obtained via force-directed placement (Fruchterman and Reingold, 1991), whereas colors and shapes denote locali membership and role, respectively. In 'Ndrangheta, locali correspond to subgroups in the criminal organization that administer crime in specific territories. Right: Adjacency matrix representation of the network. White and black entries of the matrix denote non-edges and edges, respectively.
  • ...and 3 more figures
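
The generative process summarized in the caption of Figure 1 (tree, branching Brownian motion, latent features, edge probabilities, adjacency matrix) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the toy tree, the root placed at the origin, the `intercept` parameter, and all function names are hypothetical, and the "logit mapping of the negative Euclidean distance" is read here as the logistic function applied to an intercept minus the distance. It simulates a single network, whereas the paper considers $M$ networks.

```python
import numpy as np

# Hypothetical toy tree: child -> (parent, branch_length).
# Leaves 0..3 are the network nodes; "r", "a", "b" are internal nodes.
TREE = {
    "a": ("r", 1.0),
    "b": ("r", 1.0),
    0: ("a", 0.2),
    1: ("a", 0.2),
    2: ("b", 0.2),
    3: ("b", 0.2),
}
LEAVES = [0, 1, 2, 3]


def simulate_features(tree, leaves, root="r", K=2, sigma2=1.0, rng=None):
    """Run a Brownian motion down the tree and return the leaf positions (V x K)."""
    rng = np.random.default_rng() if rng is None else rng
    pos = {root: np.zeros(K)}  # assumption: the process starts at the origin

    def position(node):
        # Each node equals its parent's position plus a Gaussian increment
        # whose variance is sigma2 times the length of the connecting branch.
        if node not in pos:
            parent, length = tree[node]
            step = rng.normal(0.0, np.sqrt(sigma2 * length), size=K)
            pos[node] = position(parent) + step
        return pos[node]

    return np.stack([position(v) for v in leaves])


def edge_probabilities(Z, intercept=0.0):
    """Logistic function of (intercept - Euclidean distance) for each node pair."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    P = 1.0 / (1.0 + np.exp(-(intercept - D)))
    np.fill_diagonal(P, 0.0)  # no self-loops
    return P


def sample_network(P, rng=None):
    """Sample a symmetric binary adjacency matrix with independent Bernoulli edges."""
    rng = np.random.default_rng() if rng is None else rng
    V = P.shape[0]
    upper = np.triu(rng.random((V, V)) < P, k=1)  # one draw per unordered pair
    return (upper | upper.T).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = simulate_features(TREE, LEAVES, rng=rng)
    P = edge_probabilities(Z)
    A = sample_network(P, rng=rng)
    print(A)
```

In this sketch, leaves 0 and 1 (and likewise 2 and 3) share most of their path from the root, so their features tend to be close and their edge probability tends to be higher than for pairs split at the root, which is the kind of nested modular pattern the tree is meant to encode.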

Theorems & Definitions (9)

  • Lemma S1.1
  • Proof: Proof of Lemma \ref{lemma:d}
  • Lemma S1.2
  • Proof: Proof of Lemma \ref{lemma:split_root}
  • Proof: Proof of Lemma \ref{prop:constant}
  • Proof: Proof of Lemma \ref{thm:ident_lat}
  • Proof: Proof of Theorem \ref{thm:s2_g}
  • Proof: Proof of Proposition \ref{prop:matrix_normal}
  • Proof: Proof of Theorem \ref{thm:consist}