Community Detection Guarantees Using Embeddings Learned by Node2Vec

Andrew Davison, S. Carlyle Morgan, Owen G. Ward

TL;DR

This work shows that the use of $k$-means clustering on the embedding vectors produced by node2vec gives weakly consistent community recovery for the nodes in (degree corrected) stochastic block models.

Abstract

Embedding the nodes of a large network into a Euclidean space is a common objective in modern machine learning, with a variety of tools available. These embeddings can then be used as features for tasks such as community detection/node clustering or link prediction, where they achieve state-of-the-art performance. With the exception of spectral clustering methods, there is little theoretical understanding of commonly used approaches to learning embeddings. In this work we examine the theoretical properties of the embeddings learned by node2vec. Our main result shows that the use of $k$-means clustering on the embedding vectors produced by node2vec gives weakly consistent community recovery for the nodes in (degree corrected) stochastic block models. We also discuss the use of these embeddings for node and link prediction tasks. We demonstrate this result empirically, and examine how this relates to other embedding tools for network data.
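
For reference, weak consistency is usually formalized in the stochastic block model literature as the misclassification rate, minimized over relabelings of the communities, vanishing in probability. The notation below ($\widehat{c}$ for the estimated labels, $\sigma$ for a permutation of the $K$ community labels, $L_n$ for the error rate) is assumed here for illustration and may differ from the paper's own definition:

$$
L_n(\widehat{c}, c) = \min_{\sigma \in \mathrm{Sym}(K)} \frac{1}{n} \sum_{u=1}^{n} \mathbf{1}\{\widehat{c}(u) \neq \sigma(c(u))\}, \qquad L_n(\widehat{c}, c) \xrightarrow{\;p\;} 0 \quad \text{as } n \to \infty.
$$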

Paper Structure (42 sections, 39 theorems, 204 equations, 12 figures)

This paper contains 42 sections, 39 theorems, 204 equations, 12 figures.

Key Result

Theorem 1

(Informal) Suppose we observe a sequence of graphs $\mathcal{G}_n$ on $n$ vertices arising from a two-dimensional stochastic block model: for each vertex $u \in [n]$ we assign a community label $c(u) \in \{0, 1\}$ with equal probability, and then we form edges in the graph independently with probability ... where $\tilde{p} \neq \tilde{q}$. Suppose that $(\widehat{\omega}_u)$ are two-dimensional embedding ...
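
The setting of the theorem can be illustrated with a small simulation. The sketch below is not the paper's procedure: the edge probabilities `p` (within a community) and `q` (between communities) are hypothetical values chosen for illustration, since the exact expression is cut off above, and a simple spectral embedding of the adjacency matrix is swapped in for node2vec so that the example needs only NumPy. It samples a two-community graph, clusters the two-dimensional embedding with a basic $k$-means, and reports the proportion of nodes correctly recovered up to a relabeling of the communities.

```python
import numpy as np


def sample_sbm(n, p, q, rng):
    """Sample a two-community SBM: labels uniform on {0, 1}, edges independent."""
    labels = rng.integers(0, 2, size=n)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p, q)  # p within a community, q between (assumed values)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # strict upper triangle, no self-loops
    adj = (upper | upper.T).astype(float)  # symmetrize to get an undirected graph
    return adj, labels


def stand_in_embedding(adj, dim=2):
    """Stand-in for node2vec: scaled eigenvectors of the adjacency matrix."""
    vals, vecs = np.linalg.eigh(adj)
    top = np.argsort(np.abs(vals))[-dim:]  # eigenvalues of largest magnitude
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))


def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = points[assign == j].mean(axis=0)
    return assign


def recovery_accuracy(pred, truth):
    """Proportion of correct labels, maximized over the two possible relabelings."""
    agree = float(np.mean(pred == truth))
    return max(agree, 1.0 - agree)


rng = np.random.default_rng(0)
adj, labels = sample_sbm(n=400, p=0.3, q=0.05, rng=rng)
pred = kmeans(stand_in_embedding(adj), k=2)
accuracy = recovery_accuracy(pred, labels)
print(f"proportion of nodes correctly recovered: {accuracy:.3f}")
```

To study node2vec itself, `stand_in_embedding` would be replaced by embeddings learned with a node2vec implementation; the sampling, clustering, and scoring steps would stay the same.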

Figures (12)

  • Figure 1: Proportion of nodes correctly recovered for both the regular and degree corrected relatively sparse SBM.
  • Figure 2: Proportion of nodes correctly recovered as we vary the negative sampling parameter in node2vec with mean and one standard error for each setting. We see similar performance for each choice of $\alpha$.
  • Figure 3: Node2vec with k-means clustering can recover the communities in the political blog data while spectral clustering fails.
  • Figure S1: Classification accuracy using 10% of the node embeddings to learn a multinomial logistic regression classifier. Mean and one standard error shown.
  • Figure S2: NMI for relatively sparse SBM. Mean and one standard error shown.
  • ...and 7 more figures

Theorems & Definitions (69)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Corollary 5
  • Theorem S1
  • Theorem S2
  • Theorem S3
  • proof
  • Proposition S4
  • ...and 59 more