
Leveraging Non-linear Dimension Reduction and Random Walk Co-occurrence for Node Embedding

Ryan DeWolfe

TL;DR

The paper proposes COVE, an explainable high-dimensional node embedding that slightly improves performance on clustering and link prediction tasks when reduced to low dimension with UMAP.

Abstract

Leveraging non-linear dimension reduction techniques, we remove the low-dimension constraint from node embedding and propose COVE, an explainable high-dimensional embedding that, when reduced to low dimension with UMAP, slightly improves performance on clustering and link prediction tasks. The embedding is inspired by neural embedding methods that use co-occurrence on a random walk as an indication of similarity, and it is closely related to a diffusion process. Extending recent community detection benchmarks, we find that a COVE+UMAP+HDBSCAN pipeline performs similarly to the popular Louvain algorithm.
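
The abstract describes COVE as a high-dimensional embedding built from random-walk co-occurrence and closely related to a diffusion process, but this page does not give the exact construction. The following is only a minimal sketch under an assumption: a node's vector is taken to be the expected number of visits to every other node within `window` steps of a random walk started at it, i.e. row u of P + P^2 + ... + P^window with P = D^{-1}A. The names `transition_matrix`, `cooccurrence_embedding`, and `window` are hypothetical and are not taken from the paper. Note that each vector has one coordinate per node, which is what makes such an embedding high-dimensional before a reducer such as UMAP is applied.

```python
from typing import Dict, List

Matrix = List[List[float]]


def transition_matrix(adj: Dict[int, List[int]], n: int) -> Matrix:
    """Row-stochastic random-walk matrix P = D^{-1} A of an unweighted graph."""
    P = [[0.0] * n for _ in range(n)]
    for u, nbrs in adj.items():
        if nbrs:  # isolated nodes keep an all-zero row
            step = 1.0 / len(nbrs)
            for v in nbrs:
                P[u][v] += step
    return P


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Dense matrix product (adequate for toy graphs only)."""
    cols = list(zip(*B))  # materialize columns so every row can reuse them
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def cooccurrence_embedding(adj: Dict[int, List[int]], n: int, window: int = 3) -> Matrix:
    """Hypothetical COVE-like embedding: row u of P + P^2 + ... + P^window.

    Entry [u][v] is the expected number of visits to v during the first
    `window` steps of a random walk started at u, i.e. how often u and v
    co-occur within that window. Each node gets an n-dimensional vector.
    """
    P = transition_matrix(adj, n)
    power = P
    X = [row[:] for row in P]
    for _ in range(window - 1):
        power = matmul(power, P)  # k-step diffusion probabilities
        X = [[x + p for x, p in zip(xr, pr)] for xr, pr in zip(X, power)]
    return X


def cosine(x: List[float], y: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = sum(a * a for a in x) ** 0.5
    ny = sum(b * b for b in y) ** 0.5
    return dot / (nx * ny)


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
X = cooccurrence_embedding(adj, n=6, window=3)
print(cosine(X[0], X[1]), cosine(X[0], X[4]))
```

On this toy graph, nodes in the same triangle receive far more similar vectors than nodes on opposite sides of the bridge. In the pipeline the abstract describes, rows like these would then be passed to a non-linear reducer such as UMAP and clustered, for example with HDBSCAN.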

Paper Structure

This paper contains 13 sections, 14 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: Example embeddings produced by node2vec, node2vec+UMAP, and the proposed COVE+UMAPLE algorithm. The graph is the worldwide airport graph, whose nodes are airports connected by direct flights, with ground-truth communities (colored) corresponding to continents. Embedding directly into 2 dimensions with node2vec does not separate any clusters, whereas node2vec followed by UMAP, or COVE followed by UMAPLE, does (with similar visual quality).
  • Figure 2: Evaluating the quality of each embedding method on real networks using the unsupervised CGE framework. Each bar represents the average of 10 independent embeddings, with the small bar covering the full range of values; both the global (top) and local (bottom) divergences are reported. In both plots, a lower score indicates better performance.
  • Figure 3: Comparing the performance of embedding algorithms for community detection on synthetic ABCD graphs. The embeddings are clustered in 2 (left), 16 (center), and 128 (right) dimensions using K-Means (top) and HDBSCAN (bottom). The COVE+UMAP and COVE+UMAPLE scores are nearly identical and hard to distinguish visually.
  • Figure 4: Comparing embedding methods for community detection on real graphs with known ground-truth communities. The embeddings are clustered using HDBSCAN, optimized over the minimum community size parameter. Each bar represents the average score over 10 independent embeddings, with the smaller bar spanning the full range of scores. COVE+UMAP, COVE+UMAPLE, node2vec+UMAP, Louvain, and ECG perform similarly, and better than COVE+SVD or node2vec on most of the graphs.
  • Figure 5: Performance of a classifier for link prediction on real graphs, using edge vectors formed by the Hadamard product of the end-node embeddings. In each of 10 samples, $5\%$ of edges were removed at random and combined with an equal number of non-edges to form the test set. Each bar represents the average AUC score, with the smaller black bar spanning the full range of values.
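
The Figure 5 caption outlines a link-prediction protocol: edge vectors are Hadamard products of end-node embeddings, 5% of edges are held out and paired with an equal number of non-edges as the test set, and performance is reported as AUC. The plain-Python sketch below illustrates those steps under stated assumptions. The helper names (`split_edges`, `hadamard`, `auc`, `score`) are hypothetical, and because the caption does not name the classifier, `score` substitutes the sum of the Hadamard vector (the dot product of the two embeddings) for a trained model. In a typical evaluation the embeddings would be computed on the training graph only, and a classifier would be fit on Hadamard vectors of training edges and sampled non-edges.

```python
import random
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def hadamard(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Edge vector: element-wise (Hadamard) product of the end-node embeddings."""
    return [a * b for a, b in zip(x, y)]


def split_edges(edges: List[Edge], n: int, frac: float = 0.05, seed: int = 0):
    """Hold out `frac` of the edges and pair them with as many non-edges.

    Returns (train_edges, test_pos, test_neg). Assumes `edges` has no
    duplicates and the graph is sparse enough that rejection sampling of
    non-edges terminates quickly.
    """
    rng = random.Random(seed)
    k = max(1, round(frac * len(edges)))
    test_pos = rng.sample(edges, k)
    held = set(test_pos)
    train = [e for e in edges if e not in held]
    existing = {frozenset(e) for e in edges}  # undirected edge lookup
    test_neg = set()
    while len(test_neg) < k:
        u, v = rng.sample(range(n), 2)  # two distinct nodes
        if frozenset((u, v)) not in existing:
            test_neg.add((min(u, v), max(u, v)))
    return train, test_pos, sorted(test_neg)


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """ROC AUC as the chance a random positive outscores a random negative."""
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0  # ties count 1/2
    return wins / (len(pos) * len(neg))


def score(emb: Dict[int, List[float]], edge: Edge) -> float:
    """Stand-in for a trained classifier: sum of the Hadamard edge vector."""
    u, v = edge
    return sum(hadamard(emb[u], emb[v]))


# Ring of 40 nodes: holding out 5% of its 40 edges gives 2 positives and 2 negatives.
ring = [(i, (i + 1) % 40) for i in range(40)]
train, test_pos, test_neg = split_edges(ring, n=40)
```

Repeating `split_edges` with 10 different seeds would mirror the caption's 10 samples. The pairwise `auc` is quadratic in the test-set size; for large test sets a rank-based computation would be the more practical choice.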