One-Hot Graph Encoder Embedding

Cencheng Shen, Qizhe Wang, Carey E. Priebe

TL;DR

A lightning-fast graph embedding method called one-hot graph encoder embedding is proposed. It has linear computational complexity and can process billions of edges within minutes on a standard PC, making it an ideal candidate for huge graph processing.

Abstract

In this paper we propose a lightning-fast graph embedding method called one-hot graph encoder embedding. It has linear computational complexity and can process billions of edges within minutes on a standard PC, making it an ideal candidate for huge graph processing. It is applicable to either the adjacency matrix or the graph Laplacian, and can be viewed as a transformation of the spectral embedding. Under random graph models, the graph encoder embedding is approximately normally distributed per vertex, and asymptotically converges to its mean. We showcase three applications: vertex classification, vertex clustering, and graph bootstrap. In every case, the graph encoder embedding exhibits unrivalled computational advantages.
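
The linear-time claim can be illustrated with a minimal sketch. It assumes, since this summary does not spell out the algorithm, that the embedding is the product of the adjacency matrix with a one-hot label matrix whose columns are scaled by class size, so that entry $(i, k)$ is the total edge weight from vertex $i$ to class $k$ divided by the size of class $k$. The function name `encoder_embedding`, its parameters, and the toy graph are hypothetical and chosen only for illustration.

```python
from collections import Counter


def encoder_embedding(edges, labels, num_classes):
    """Sketch of a one-hot encoder embedding from an undirected edge list.

    Assumption (not stated in this summary): Z = A W, where W is the
    one-hot label matrix with each column divided by its class size.

    edges: iterable of (i, j, weight) with 0-based vertex indices.
    labels: list of class labels in range(num_classes), one per vertex.
    Returns an n x num_classes list of lists.
    """
    n = len(labels)
    class_sizes = Counter(labels)
    # Each vertex contributes 1 / (size of its class) to its class column.
    scale = [1.0 / class_sizes[y] for y in labels]
    Z = [[0.0] * num_classes for _ in range(n)]
    for i, j, w in edges:  # a single pass over the edges
        Z[i][labels[j]] += w * scale[j]
        if i != j:  # mirror the edge, since the graph is undirected
            Z[j][labels[i]] += w * scale[i]
    return Z


# Toy graph: vertices 0 and 1 are in class 0; vertices 2 and 3 are in class 1.
edges = [(0, 1, 1.0), (2, 3, 1.0), (1, 2, 1.0)]
labels = [0, 0, 1, 1]
Z = encoder_embedding(edges, labels, num_classes=2)
for row in Z:
    print(row)
```

Under this assumption the work is one pass over the edge list plus one pass over the vertices, and no dense adjacency matrix or eigendecomposition is needed, which is consistent with the linear complexity described above.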

Paper Structure

This paper contains 17 sections, 4 theorems, 48 equations, 7 figures, 2 tables, 3 algorithms.

Key Result

Theorem 1

The graph encoder embedding is asymptotically normally distributed under SBM, DC-SBM, or RDPG. Specifically, as $n$ increases, for a given $i$th vertex of class $y$ it holds that The expectation and covariance are:

Figures (7)

  • Figure 1: We report the average running time of graph encoder embedding using $50$ Monte Carlo replicates, on a random graph with $K=10$, average degree $100$, and increasing graph size. The number of edges increases from one thousand to one billion. At $1$ billion edges with $10$ million vertices, the encoder embedding requires only $20$ GB of memory and finishes in $10$ minutes. All other methods exceed maximum memory capacity at $10$ million edges. More details on the methods compared can be found in Section \ref{main2}.
  • Figure 2: Visualizing the vertex embedding: the top row is the graph adjacency heatmap (the indices are ordered by class label), the middle row is the graph encoder embedding, and the bottom row is the adjacency spectral embedding at $d=2$. The graphs are generated by SBM, DC-SBM, and RDPG from the left column to the right column at $n=2000$, with parameter details presented in the Appendix. In each panel, the red dots denote the vertex embedding of class $1$, and the blue dots denote the vertex embedding of class $2$.
  • Figure 3: Visualizing the vertex embedding for the Political Blogs and Gene Network: the top row plots the graph connectivity via the MATLAB graph plot function, and the bottom row is the graph encoder embedding. Red denotes class 1 vertices and blue denotes class 2 vertices.
  • Figure 4: Comparing the classification error (top row) and running time (bottom row, in log scale) for SBM, DC-SBM, and RDPG graphs with increasing $n$. Parameter details can be found in the Appendix.
  • Figure 5: The top row visualizes unsupervised AEE and ASE for an SBM graph, while the bottom row compares AEE and ASE for an RDPG graph. These graphs are generated by the same three-class SBM and RDPG as in Figure \ref{fig0} at $n=10000$. Blue, red, and green dots denote vertices of different classes. Note that the embedding dimension is $3$, while only the first two dimensions are visualized.
  • ...and 2 more figures

Theorems & Definitions (6)

  • Theorem 1
  • Corollary 1
  • Theorem 1
  • Corollary 1
  • proof
  • proof