General Graph Random Features

Isaac Reid; Krzysztof Choromanski; Eli Berger; Adrian Weller

TL;DR

This work addresses the cubic-time barrier of exact graph kernels by introducing general graph random features (g-GRFs), which give unbiased estimates of arbitrary functions of the weighted adjacency matrix $\mathbf{W}$ via random walks. The key ingredient is a modulation function $f:(\mathbb{N}\cup\{0\})\to\mathbb{R}$ that weights walks by their length. The estimate is unbiased when $\boldsymbol{\alpha}=f_1*f_2$, i.e., $\alpha_k=\sum_{p=0}^k f_1(k-p) f_2(p)$, in which case $K_{\boldsymbol{\alpha}}(\mathbf{W})=K_{f_1}(\mathbf{W})K_{f_2}(\mathbf{W})$. The framework supports neural modulation functions $f^{(N)}$ for kernel learning and provides generalisation bounds via empirical Rademacher complexity. Experiments cover unbiased estimation of standard graph kernels, efficient solution of ODEs on graphs, kernelised clustering, and implicit kernel learning for node attribute prediction; learned modulation functions achieve lower MSE, and the method scales to graphs with thousands of nodes.
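
The convolution identity above can be checked numerically. The sketch below assumes that $K_f(\mathbf{W})$ denotes the power series $\sum_k f(k)\mathbf{W}^k$, which is how "Taylor expansion" is read here, and it truncates both modulation functions to finitely many terms so that the identity holds exactly. The helper name `power_series` and the random test matrix are illustrative choices, not taken from the paper.

```python
import numpy as np

def power_series(W, coeffs):
    """Return sum_k coeffs[k] * W^k for a finite list of coefficients."""
    result = np.zeros_like(W, dtype=float)
    power = np.eye(W.shape[0])  # W^0
    for c in coeffs:
        result += c * power
        power = power @ W
    return result

rng = np.random.default_rng(0)
A = rng.random((5, 5))
W = 0.1 * (A + A.T)            # small symmetric weighted adjacency (assumed example)

f1 = rng.normal(size=6)        # f1(0), ..., f1(5)
f2 = rng.normal(size=6)        # f2(0), ..., f2(5)
alpha = np.convolve(f1, f2)    # alpha_k = sum_p f1(k - p) * f2(p), k = 0..10

lhs = power_series(W, alpha)                      # K_alpha(W)
rhs = power_series(W, f1) @ power_series(W, f2)   # K_f1(W) K_f2(W)
print(np.allclose(lhs, rhs))
```

Because both sides are polynomials in the same matrix, the agreement is exact up to floating-point error; for infinite series the same rearrangement additionally needs absolute convergence.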

Abstract

We propose a novel random walk-based algorithm for unbiased estimation of arbitrary functions of a weighted adjacency matrix, coined general graph random features (g-GRFs). This includes many of the most popular examples of kernels defined on the nodes of a graph. Our algorithm enjoys subquadratic time complexity with respect to the number of nodes, overcoming the notoriously prohibitive cubic scaling of exact graph kernel evaluation. It can also be trivially distributed across machines, permitting learning on much larger networks. At the heart of the algorithm is a modulation function which upweights or downweights the contribution from different random walks depending on their lengths. We show that by parameterising it with a neural network we can obtain g-GRFs that give higher-quality kernel estimates or perform efficient, scalable kernel learning. We provide robust theoretical analysis and support our findings with experiments including pointwise estimation of fixed graph kernels, solving non-homogeneous graph ordinary differential equations, node clustering and kernel regression on triangular meshes.
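
To make the role of the modulation function concrete, here is a minimal sketch of a random-walk feature estimator in the spirit of the abstract. It is not a transcription of the paper's Alg. 1. It assumes a standard importance-sampling construction: each walker halts with probability `p_halt` per step, moves to a uniformly chosen neighbour otherwise, and carries a "load" that divides out the sampling probability and multiplies in the edge weights. The names `grf_features`, `n_walks` and `p_halt` are hypothetical. Under these assumptions, row $i$ of the output is an unbiased estimate of row $i$ of $\sum_k f(k)\mathbf{W}^k$.

```python
import numpy as np

def grf_features(W, f, n_walks, p_halt, rng):
    """Estimate sum_k f(k) W^k row by row with halting random walks (sketch)."""
    n = W.shape[0]
    neighbours = [np.flatnonzero(W[i]) for i in range(n)]
    phi = np.zeros((n, n))
    for i in range(n):
        for _ in range(n_walks):
            node, load, length = i, 1.0, 0
            while True:
                # Deposit the current load, modulated by the walk length so far.
                phi[i, node] += load * f(length)
                degree = len(neighbours[node])
                if degree == 0 or rng.random() < p_halt:
                    break
                nxt = neighbours[node][rng.integers(degree)]
                # Importance weight: edge weight divided by the probability
                # (1 - p_halt) / degree of having taken this step.
                load *= degree * W[node, nxt] / (1.0 - p_halt)
                node, length = nxt, length + 1
    return phi / n_walks

# Assumed toy example: a 4-cycle with edge weight 0.2 and f(k) = 1,
# for which sum_k W^k = (I - W)^{-1} since the spectral radius is 0.4.
A = np.zeros((4, 4))
for i in range(4):
    A[i, (i + 1) % 4] = A[(i + 1) % 4, i] = 1.0
W = 0.2 * A
exact = np.linalg.inv(np.eye(4) - W)
phi = grf_features(W, lambda k: 1.0, n_walks=20000, p_halt=0.5,
                   rng=np.random.default_rng(0))
print(np.max(np.abs(phi - exact)))
```

Each starting node is processed independently of the others, which is consistent with the abstract's remark that the computation distributes trivially across machines. A kernel estimate would then be formed from inner products of two such feature sets, one per modulation function.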

Paper Structure (10 sections, 3 theorems, 16 equations, 5 figures, 4 tables, 1 algorithm)

Introduction and related work
General graph random features
Generating functions:
Neural modulation functions, kernel learning and generalisation
Experiments
Unbiased pointwise estimation of fixed kernels
Solving differential equations on graphs
Efficient kernelised graph node clustering
Learning $f^{(N)}$ for better kernel approximation
Implicit kernel learning for node attribute prediction

Key Result

Theorem 2.1

For two modulation functions $f_1,f_2: (\mathbb{N} \cup \{0\}) \rightarrow \mathbb{R}$, the g-GRFs $(\phi_{f_1}(i))_{i=1}^N$ and $(\phi_{f_2}(i))_{i=1}^{N}$ constructed according to Alg. 1 give an unbiased approximation of $\mathbf{K}_{\boldsymbol{\alpha}}$, for kernels with an arbitrary Taylor expansion $\boldsymbol{\alpha}=(\alpha_k)_{k=0}^\infty$, provided that $\boldsymbol{\alpha}=f_1*f_2$, i.e., $\alpha_k=\sum_{p=0}^k f_1(k-p) f_2(p)$ for all $k$.
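
In the symmetric case $f_1=f_2=f$, the convolution condition $\boldsymbol{\alpha}=f*f$ from the TL;DR can be solved term by term, since $\alpha_k = 2f(0)f(k) + \sum_{p=1}^{k-1} f(k-p)f(p)$. The sketch below implements this generic recursion under the assumption $\alpha_0>0$; it is not necessarily the closed form given by the paper's Theorem 2.2, and the function name `symmetric_modulation` is hypothetical.

```python
import math
import numpy as np

def symmetric_modulation(alpha):
    """Solve f * f = alpha for the first len(alpha) values of f (needs alpha[0] > 0)."""
    alpha = np.asarray(alpha, dtype=float)
    if alpha[0] <= 0:
        raise ValueError("this sketch requires alpha[0] > 0")
    f = np.zeros_like(alpha)
    f[0] = np.sqrt(alpha[0])  # positive root; the negative root gives -f
    for k in range(1, len(alpha)):
        # Cross terms sum_{p=1}^{k-1} f(k - p) f(p), which only involve known values.
        cross = np.dot(f[1:k], f[k - 1:0:-1])
        f[k] = (alpha[k] - cross) / (2.0 * f[0])
    return f

# Assumed example: alpha_k = 1/k! (coefficients of exp), whose symmetric
# solution is f(k) = (1/2)^k / k! because exp(x/2)^2 = exp(x).
alpha = np.array([1.0 / math.factorial(k) for k in range(8)])
f = symmetric_modulation(alpha)
print(f[:4])
```

The recursion determines $f$ uniquely up to an overall sign, so either choice of root for $f(0)$ yields a valid symmetric modulation function for the same $\boldsymbol{\alpha}$.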

Figures (5)

Figure 1: Schematic for a random walk on a graph (solid red) and an accompanying modulation function $f$ (dashed blue) used to approximate an arbitrary graph node function $K^\mathcal{G}$.
Figure 2: Unbiased estimation of popular kernels on different graphs using g-GRFs. The approximation error ($y$-axis) improves with the number of walkers ($x$-axis). We repeat $10$ times; one standard deviation of the mean error is shaded.
Figure 3: ODE simulation error decreases as the number of walkers grows.
Figure 4: Learned modulation function with different numbers of random walkers $m$. It approaches the unbiased $f^{(N)}$ as $m \to \infty$.
Figure 5: Fixed and learned modulation functions for kernel regression.

Theorems & Definitions (3)

Theorem 2.1: Unbiased approximation of $K_{\boldsymbol{\alpha}}$ via convolutions
Theorem 2.2: Computing symmetric modulation functions
Theorem 2.3: Empirical Rademacher complexity bound
