Community Detection Guarantees Using Embeddings Learned by Node2Vec

Andrew Davison; S. Carlyle Morgan; Owen G. Ward

TL;DR

This work shows that the use of $k$-means clustering on the embedding vectors produced by node2vec gives weakly consistent community recovery for the nodes in (degree corrected) stochastic block models.

Abstract

Embedding the nodes of a large network into a Euclidean space is a common objective in modern machine learning, with a variety of tools available. These embeddings can then be used as features for tasks such as community detection (node clustering) or link prediction, where they achieve state-of-the-art performance. With the exception of spectral clustering methods, there is little theoretical understanding of commonly used approaches to learning embeddings. In this work we examine the theoretical properties of the embeddings learned by node2vec. Our main result shows that the use of $k$-means clustering on the embedding vectors produced by node2vec gives weakly consistent community recovery for the nodes in (degree corrected) stochastic block models. We also discuss the use of these embeddings for node and link prediction tasks. We demonstrate this result empirically, and examine how it relates to other embedding tools for network data.
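
For readers unfamiliar with the term, "weakly consistent" recovery is usually read in the stochastic block model literature as the misclassification rate vanishing as the graph grows. The display below is a sketch of that standard notion and not the paper's exact statement: the symbols $\hat{c}$ (estimated labels), $K$ (number of communities), $S_K$ (permutations of the $K$ labels) and $L_n$ are notation assumed here for illustration.

$$
L_n(c, \hat{c}) \;=\; \min_{\sigma \in S_K} \frac{1}{n} \sum_{u=1}^{n} \mathbb{1}\left[\sigma(\hat{c}(u)) \neq c(u)\right],
\qquad
L_n(c, \hat{c}) \to 0 \ \text{in probability as } n \to \infty.
$$

The minimum over $\sigma$ reflects that community labels are only identifiable up to relabelling, so a clustering that swaps the names of two communities still counts as a perfect recovery.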

Paper Structure (42 sections, 39 theorems, 204 equations, 12 figures)

Introduction
Summary of main results
Related Works
Framework
Probabilistic models for community detection
Obtaining embeddings from node2vec
Using embeddings for community detection
Results
Asymptotic distribution of the embeddings
Guarantees for community detection
Experiments
Conclusion and Future Work
Additional Experimental Results
Additional Simulation, Node Classification
Additional Results, Community Detection
...and 27 more sections

Key Result

Theorem 1

(Informal) Suppose we observe a sequence of graphs $\mathcal{G}_n$ on $n$ vertices arising from a two-dimensional stochastic block model: for each vertex $u \in [n]$ we assign a community label $c(u) \in \{0, 1\}$ with equal probability, and then we form edges in the graph independently with probability [...], where $\tilde{p} \neq \tilde{q}$. Suppose that $(\widehat{\omega}_u)$ are two-dimensional embedding [...]
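
To make the setting of the theorem concrete, the following is a minimal sketch, using only the Python standard library, of sampling a two-community stochastic block model with equally likely labels and scoring a candidate labelling by the proportion of nodes recovered up to relabelling. The edge probabilities `p` (within community) and `q` (between communities) are hypothetical stand-ins, since the exact form in the theorem is not shown here, and the function names `sample_sbm` and `recovery_proportion` are illustrative. The sketch does not run node2vec or $k$-means; it only covers the generative model and the evaluation metric.

```python
import itertools
import random


def sample_sbm(n, p, q, seed=0):
    """Sample a two-community SBM (p and q are assumed, illustrative values).

    Each vertex gets a label in {0, 1} with equal probability; each pair
    (u, v) with u < v is joined independently with probability p if the
    labels match and q otherwise.
    """
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n)]
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            prob = p if labels[u] == labels[v] else q
            if rng.random() < prob:
                edges.append((u, v))
    return labels, edges


def recovery_proportion(true_labels, predicted_labels, k=2):
    """Proportion of nodes correctly recovered, maximised over relabellings.

    Cluster names are arbitrary, so every permutation of the k predicted
    labels is tried and the best agreement with the truth is reported.
    """
    n = len(true_labels)
    best = 0
    for perm in itertools.permutations(range(k)):
        hits = sum(
            1 for t, c in zip(true_labels, predicted_labels) if perm[c] == t
        )
        best = max(best, hits)
    return best / n


if __name__ == "__main__":
    labels, edges = sample_sbm(n=200, p=0.10, q=0.02, seed=1)
    # A labelling that swaps the two community names is still a full recovery.
    swapped = [1 - c for c in labels]
    print(len(edges), recovery_proportion(labels, swapped))
```

In an actual experiment, `predicted_labels` would come from clustering the learned embedding vectors; one minus the returned proportion is the misclassification rate that weak consistency requires to vanish as $n$ grows.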

Figures (12)

Figure 1: Proportion of nodes correctly recovered for both the regular and degree corrected relatively sparse SBM.
Figure 2: Proportion of nodes correctly recovered as we vary the negative sampling parameter in node2vec with mean and one standard error for each setting. We see similar performance for each choice of $\alpha$.
Figure 3: Node2vec with $k$-means clustering can recover the communities in the political blog data while spectral clustering fails.
Figure S1: Classification accuracy using 10% of the node embeddings to learn a multinomial logistic regression classifier. Mean and one standard error shown.
Figure S2: NMI for relatively sparse SBM. Mean and one standard error shown.
...and 7 more figures

Theorems & Definitions (69)

Theorem 1
Theorem 2
Theorem 3
Theorem 4
Corollary 5
Theorem S1
Theorem S2
Theorem S3
proof
Proposition S4
...and 59 more
