Simplifying Subgraph Representation Learning for Scalable Link Prediction

Paul Louis, Shweta Ann Jacob, Amirali Salehi-Abari

TL;DR

This work proposes Scalable Simplified SGRL (S3GRL), a new class of subgraph representation learning approaches that simplifies the message passing and aggregation operations in each link's subgraph to achieve faster training and inference.

Abstract

Link prediction on graphs is a fundamental problem. Subgraph representation learning approaches (SGRLs), by transforming link prediction to graph classification on the subgraphs around the links, have achieved state-of-the-art performance in link prediction. However, SGRLs are computationally expensive and do not scale to large graphs because of their costly subgraph-level operations. To unlock the scalability of SGRLs, we propose a new class of SGRLs, which we call Scalable Simplified SGRL (S3GRL). Aimed at faster training and inference, S3GRL simplifies the message passing and aggregation operations in each link's subgraph. S3GRL, as a scalability framework, accommodates various subgraph sampling strategies and diffusion operators to emulate computationally expensive SGRLs. We propose multiple instances of S3GRL and empirically study them on small to large-scale graphs. Our extensive experiments demonstrate that the proposed S3GRL models scale up SGRLs without significant performance compromise (even with considerable gains in some cases), while offering substantially lower computational footprints (e.g., multi-fold inference and training speedup).

Paper Structure (11 sections, 10 equations, 1 figure, 8 tables)

This paper contains 11 sections, 10 equations, 1 figure, 8 tables.

Figures (1)

  • Figure 1: Our S3GRL framework: In the preprocessing phase (shown by the shaded blue arrow), first multiple subgraphs are extracted around the target nodes $u$ and $v$ (shaded in blue) by various sampling strategies. Diffusion matrices are then created from extracted subgraph adjacency matrices by predefined diffusion operators (e.g., powers of subgraphs in this figure). Each diffusion process involves the application of the subgraph diffusion matrix on its nodal features to create the matrix $\mathbf{Z}^{(i)}_{uv}$. The operator-level node representations of selected nodes (with a red border in raw data) are then aggregated for all subgraphs to form the joint $\mathbf{Z}_{uv}$ matrix. The selected nodes in this example are the target nodes $\{u,v\}$, and their common neighbor $d$. In the learning phase (as shown by the shaded red arrow), the joint matrix $\mathbf{Z}_{uv}$ undergoes dimensionality reduction followed by pooling using center pooling (highlighted by blue-border box) and common neighbor pooling (highlighted by purple-border box). Finally, the target representation $\mathbf{q}_{uv}$ is transformed by an MLP to a link probability $P_{uv}$.
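
The two phases described in the Figure 1 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions that the caption does not specify: a single subgraph is used instead of several sampled ones, the diffusion operators are powers of a symmetrically normalized adjacency matrix with self-loops, the dimensionality reduction is one linear projection per operator summed over operators, center pooling is the element-wise product of the two target-node rows, common neighbor pooling is a sum, and the MLP is reduced to a single linear layer with a sigmoid. The function names (`normalize_adj`, `precompute_joint_matrix`, `link_probability`) and all parameters are hypothetical.

```python
import numpy as np

def normalize_adj(adj):
    """Symmetrically normalize A + I (an assumed choice of diffusion base)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def precompute_joint_matrix(adj, feats, selected, num_powers):
    """Preprocessing phase: keep the selected rows of A^i X for i = 0..num_powers."""
    a_norm = normalize_adj(adj)
    diffused = feats
    blocks = [diffused[selected]]           # power 0: raw features
    for _ in range(num_powers):
        diffused = a_norm @ diffused        # one more diffusion step
        blocks.append(diffused[selected])
    return np.stack(blocks, axis=0)         # (num_powers + 1, len(selected), feat_dim)

def link_probability(z_uv, proj, mlp_w, mlp_b):
    """Learning phase: reduce, pool, and map to a probability (assumed forms)."""
    # Per-operator linear projection, summed over operators: (k, hidden)
    reduced = np.einsum("okd,odh->kh", z_uv, proj)
    center = reduced[0] * reduced[1]        # center pooling over targets u, v
    common = reduced[2:].sum(axis=0)        # common neighbor pooling
    q_uv = np.concatenate([center, common]) # target representation
    logit = q_uv @ mlp_w + mlp_b            # single linear layer standing in for the MLP
    return 1.0 / (1.0 + np.exp(-logit))

# Toy subgraph: u = 0, v = 1, common neighbor d = 2, plus one extra node 3.
rng = np.random.default_rng(0)
adj = np.array([[0, 0, 1, 0],
                [0, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = rng.normal(size=(4, 5))
selected = [0, 1, 2]                        # rows ordered as [u, v, common neighbors...]
z_uv = precompute_joint_matrix(adj, feats, selected, num_powers=2)
proj = 0.1 * rng.normal(size=(3, 5, 8))
mlp_w = 0.1 * rng.normal(size=16)
p_uv = link_probability(z_uv, proj, mlp_w, 0.0)
```

In this sketch the diffusion step contains no trainable parameters, so `z_uv` can be computed once per link before training; only `proj`, `mlp_w`, and the bias would be learned, which is one way to read the caption's split between a preprocessing phase and a learning phase.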