Graph Neural Networks on Graph Databases

Dmytro Lopushanskyy, Borun Shi

TL;DR

This work shows how to directly train a GNN on a graph DB, by retrieving minimal data into memory and sampling using the query engine, which opens up a new way of scaling GNNs as well as a new application area for graph DBs.

Abstract

Training graph neural networks on large datasets has long been a challenge. Traditional approaches include efficiently representing the whole graph in memory, designing parameter-efficient and sampling-based models, and graph partitioning in a distributed setup. Separately, graph databases with native graph storage and query engines have been developed, which enable time- and resource-efficient graph analytics workloads. We show how to directly train a GNN on a graph DB, by retrieving minimal data into memory and sampling using the query engine. Our experiments show resource advantages for single-machine and distributed training. Our approach opens up a new way of scaling GNNs as well as a new application area for graph DBs.
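
To make the idea of sampling a small neighbourhood and returning only the features of the sampled nodes concrete, here is a minimal in-memory sketch. It is a stand-in for what a query engine would do, not the authors' implementation: the adjacency dictionary, the feature dictionary, the function name sample_two_hop, and the fanout values are all hypothetical and chosen for illustration only.

```python
import random


def sample_two_hop(adj, features, seeds, fanouts=(2, 2), seed=0):
    """Sample up to fanouts[k] neighbours per node at hop k, starting from seeds.

    adj:      dict mapping node id -> list of neighbour ids (hypothetical stand-in
              for the graph stored in the database)
    features: dict mapping node id -> feature vector
    """
    rng = random.Random(seed)
    frontier = list(seeds)
    nodes = set(seeds)
    edges = []
    for fanout in fanouts:
        next_frontier = []
        for u in frontier:
            neighbours = adj.get(u, [])
            # Pick at most `fanout` distinct neighbours of u.
            picked = rng.sample(neighbours, min(fanout, len(neighbours)))
            for v in picked:
                edges.append((u, v))
                if v not in nodes:
                    nodes.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    # Only the features of sampled nodes are materialised in memory.
    sampled_features = {n: features[n] for n in nodes}
    return sorted(nodes), edges, sampled_features


# Toy graph: node 0 has three neighbours, two of which have one neighbour each.
toy_adj = {0: [1, 2, 3], 1: [4], 2: [5], 3: [], 4: [], 5: []}
toy_features = {i: [float(i)] for i in range(6)}
nodes, edges, feats = sample_two_hop(toy_adj, toy_features, seeds=[0])
```

In a database-backed setup, the loop over neighbours would be replaced by a query issued per mini-batch, so the training process would hold only the sampled subgraph and its features rather than the whole graph.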

Paper Structure

This paper contains 25 sections, 3 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: An example labelled property graph. Simple citation graphs such as Cora [cora] or ogbn-papers100M [hu2020open] can be modelled by such a schema.
  • Figure 2: Left: A Cypher query that returns all papers that are cited by at least one other paper. Right: An equivalent SQL query, assuming a reasonable table schema, such as one table PAPERS with properties columns and a second table CITES with columns citing_paper_id and cited_paper_id.
  • Figure 3: Cypher query that returns all relevant metadata for all nodes.
  • Figure 4: A Cypher query template that samples two-hop neighbourhoods of given seed nodes. Features of sampled nodes are returned in the same step.
  • Figure 5: Our distributed training architecture. One graph database acts as a central graph and feature store. Multiple training processes can concurrently sample from the DB with no overhead. Our architecture closely mimics that of PyTorch Distributed Data Parallel [dist-PyTorch] and applies to any other similar setup.
  • ...and 3 more figures
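
The Cypher/SQL comparison described in the Figure 2 caption can be sketched as follows. This is an illustrative reconstruction, not the query shown in the paper's figure: the table names PAPERS and CITES and the columns citing_paper_id and cited_paper_id come from the caption, while the id and title columns, the Paper label, the CITES relationship type, the sample rows, and the function name cited_papers are assumptions. The SQL runs against an in-memory SQLite database; the Cypher string is shown only for comparison and is not executed.

```python
import sqlite3

# Cypher counterpart (not executed here). The Paper label and CITES
# relationship type are assumed, not taken from the paper.
CYPHER = "MATCH (:Paper)-[:CITES]->(p:Paper) RETURN DISTINCT p"

# SQL over the two-table schema described in the caption. The id and title
# columns are assumed stand-ins for the "properties columns".
SQL = """
SELECT DISTINCT p.id, p.title
FROM PAPERS AS p
JOIN CITES AS c ON c.cited_paper_id = p.id
ORDER BY p.id
"""


def cited_papers(conn):
    """Return all papers that are cited by at least one other paper."""
    return conn.execute(SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE PAPERS (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE CITES (citing_paper_id INTEGER, cited_paper_id INTEGER);
    INSERT INTO PAPERS VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO CITES VALUES (1, 2), (3, 2), (1, 3);
    """
)
cited = cited_papers(conn)
print(cited)  # [(2, 'B'), (3, 'C')]
```

The relational version needs an explicit join on the edge table and DISTINCT to collapse papers cited more than once, whereas the Cypher pattern expresses the citation relationship directly as a path.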