Optimal Time Complexity Algorithms for Computing General Random Walk Graph Kernels on Sparse Graphs

Krzysztof Choromanski, Isaac Reid, Arijit Sehanobish, Avinava Dubey

TL;DR

This work presents the first linear time complexity randomized algorithms for unbiased approximation of the celebrated family of general random walk kernels (RWKs) for sparse graphs, and shows that the ability to approximate general RWKs (rather than just special cases) unlocks efficient implicit graph kernel learning.

Abstract

We present the first linear time complexity randomized algorithms for unbiased approximation of the celebrated family of general random walk kernels (RWKs) for sparse graphs. This includes both labelled and unlabelled instances. The previous fastest methods for general RWKs were of cubic time complexity and not applicable to labelled graphs. Our method samples dependent random walks to compute novel graph embeddings in $\mathbb{R}^d$ whose dot product is equal to the true RWK in expectation. It does so without instantiating the direct product graph in memory, meaning we can scale to massive datasets that cannot be stored on a single machine. We derive exponential concentration bounds to prove that our estimator is sharp, and show that the ability to approximate general RWKs (rather than just special cases) unlocks efficient implicit graph kernel learning. Our method is up to $\mathbf{27\times}$ faster than its counterparts for efficient computation on large graphs and scales to graphs $\mathbf{128 \times}$ bigger than the largest examples amenable to brute-force computation.
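To make concrete what "instantiating the direct product graph" costs, here is a minimal brute-force sketch. It assumes one common truncated form of an unlabelled RWK with uniform start and stop weights, $\sum_k f_k \, \mathbf{1}^\top (\mathbf{A}_1 \otimes \mathbf{A}_2)^k \mathbf{1}$; the function name `rwk_brute_force`, the coefficient list `coeffs`, and this exact kernel form are illustrative assumptions, not the paper's definition or algorithm.

```python
import numpy as np


def rwk_brute_force(A1, A2, coeffs):
    """Truncated random walk kernel via the explicit direct product graph.

    Assumed form (illustrative, not taken from the paper):
        sum_k coeffs[k] * 1^T (A1 kron A2)^k 1
    """
    # Adjacency matrix of the direct product graph: (N1*N2) x (N1*N2).
    Ax = np.kron(A1, A2)
    v = np.ones(Ax.shape[0])  # v holds Ax^k @ 1, starting at k = 0
    total = 0.0
    for c in coeffs:
        total += c * v.sum()  # v.sum() equals 1^T Ax^k 1
        v = Ax @ v
    return total


# Two copies of a single-edge graph on 2 nodes.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
value = rwk_brute_force(A, A, [1.0, 0.5, 0.25])
```

The dense product adjacency matrix has $(N_1 N_2)^2$ entries, so two graphs with 1,000 nodes each would already require $10^{12}$ entries. This is the memory cost that an embedding-based estimator sidesteps.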

Paper Structure

This paper contains 28 sections, 3 theorems, 41 equations, 6 figures, 2 tables, 1 algorithm.

Key Result

Theorem 4.1

Suppose the matrices $\mathbf{C}_{1,2}$, $\mathbf{D}_{1,2} \in \mathbb{R}^{N_{1,2} \times r_{1,2}}$ are sampled according to Alg. 1. Then the estimator $\widehat{\mathrm{K}}_{\mathrm{RWK}}$ is an unbiased estimate of the RWK, so that $\mathrm{K}_{\mathrm{RWK}}(\mathrm{G}_{1},\mathrm{G}_{2}) = \mathbb{E}(\widehat{\mathrm{K}}_{\mathrm{RWK}}(\mathrm{G}_{1},\mathrm{G}_{2}))$.

Figures (6)

  • Figure 1: Left: an example of the direct product of two graphs, a core concept of RWKs. Right: a schematic view of our approach. The graphs are embedded in $d_{\mathrm{G}}$-dimensional Euclidean space (here, $d_\mathrm{G}=3$), such that the dot product of embeddings equals RWKs in expectation. The embeddings are randomized; shadow arrows represent different possible realizations. Computing the embeddings means the direct product graph need not be instantiated in memory.
  • Figure 2: Schematic of the role of the random $g$-variables. Left: two graphs $\mathrm{G}_{1},\mathrm{G}_{2}$ with random walks. The numbers show the hop order. At each timestep $i$, the walkers leave a 'deposit' at the corresponding graph node, modulated by a draw of a random Rademacher variable $g(i)$. This variable is represented by a coloured square: white, green, dark blue, pink, grey, light blue then red. Lower right: total loads deposited at each supervertex in the direct product graph $\mathrm{G}_1 \times \mathrm{G}_2$, of which there are $4 \times 4 = 16$ in total. Since the Rademacher draws are independent at different timesteps, $\mathbb{E}(g(i_1)g(i_2)) = \mathbb{I}(i_1 = i_2)$ so we only get a contribution (in expectation) when the colours of the squares match. Averaging, we filter deposits in each supervertex for $i_1 = i_2$. Upper right: retaining only these non-vanishing contributions where $i_1=i_2$, we emulate the corresponding walk on the product graph, but without needing to instantiate it in memory explicitly.
  • Figure 3: GVoy average kernel approximation error on graphs from TUDataset (Morris et al., 2020), plotted as a function of the number of sampled random walks $m$. The pair of tuples next to each dataset name gives the number of vertices and edges of the respective graphs. Shaded regions represent standard errors over 10 runs.
  • Figure 4: Comparison of GVoys' runtime with the brute-force baseline ('BruteForce') and previous efficient methods, with increasing numbers of vertices $N$. We generate datasets of $N_\mathrm{G}=10$ Erdős-Rényi graphs with $N$ vertices each and edge probability $p=0.1$. OOM and ORT mean 'out of memory' (32 GB) and 'out of runtime' (24 hours), respectively.
  • Figure 5: Approximation error of GVoys on sampled pairs of graphs from various TUDataset (Morris et al., 2020) datasets, as a function of the number of random walks. The pair of tuples next to each dataset name gives the number of vertices and edges of the respective graphs. Shaded regions represent standard deviations over 10 runs.
  • ...and 1 more figure
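The filtering effect described in the Figure 2 caption can be checked exactly on a toy case. The sketch below assumes two hypothetical fixed walks (`walk1`, `walk2`) on 4-node graphs and a shared Rademacher sequence $g$; it enumerates all $2^T$ sign patterns and confirms that the expected outer product of the two deposit vectors keeps only the matched-timestep terms $i_1 = i_2$. The walks and the helper `loads` are made up for illustration and are not the paper's algorithm.

```python
import itertools

import numpy as np

# Hypothetical fixed walks: the node visited at each timestep.
walk1 = [0, 1, 3, 1, 2]
walk2 = [2, 0, 1, 3, 3]
n1, n2, T = 4, 4, len(walk1)


def loads(walk, n, g):
    """Signed deposits left by one walker: sum_i g[i] * e_{walk[i]}."""
    out = np.zeros(n)
    for i, v in enumerate(walk):
        out[v] += g[i]
    return out


# Exact expectation over all 2^T equally likely Rademacher draws.
expected = np.zeros((n1, n2))
for g in itertools.product([-1.0, 1.0], repeat=T):
    expected += np.outer(loads(walk1, n1, g), loads(walk2, n2, g))
expected /= 2 ** T

# Target: one unit of load per supervertex visit at a matched timestep.
matched = np.zeros((n1, n2))
for v1, v2 in zip(walk1, walk2):
    matched[v1, v2] += 1.0

print(np.allclose(expected, matched))
```

Cross terms with $i_1 \neq i_2$ cancel because $\mathbb{E}(g(i_1)g(i_2)) = 0$ for independent draws, so `expected` is the visit count of the paired walk on the $4 \times 4 = 16$ supervertices even though the product graph is never built.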

Theorems & Definitions (3)

  • Theorem 4.1: GVoys are unbiased
  • Theorem 4.2: GVoys give sharp kernel estimates
  • Theorem 4.3: Rademacher is optimal