GraphGen+: Advancing Distributed Subgraph Generation and Graph Learning On Industrial Graphs
Yue Jin, Yongchao Liu, Chuntao Hong
TL;DR
Industrial-scale graph learning suffers from storage and I/O bottlenecks when it relies on offline precomputed subgraphs, and from limited throughput when it relies on online single-machine sampling. GraphGen+ addresses both problems by tightly integrating distributed subgraph generation with in-memory learning. Given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, a seed set $\mathcal{S}$, and a worker set $\mathcal{W}$, it applies a load-balanced mapping and a tree-based reduction to minimize communication overhead. It achieves a 27$\times$ speedup in subgraph generation over SQL-like methods and a 1.3$\times$ speedup over GraphGen, while supporting training on up to 1,000,000 nodes per iteration without external storage. The result is a scalable, production-ready solution for industry-scale graph learning, with practical deployment in graph intelligent computing systems such as GraphTheta \cite{liu2023graphtheta}.
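To make the two mechanisms named above concrete, the following is a minimal single-process sketch, not the GraphGen+ implementation. It assumes that the load-balanced mapping assigns seeds in $\mathcal{S}$ to workers in $\mathcal{W}$, and it uses a shuffled round-robin deal as a stand-in for whatever balancing rule the system actually applies. The tree-based reduction is shown as a generic pairwise combine over per-worker values. The function names `balanced_seed_mapping` and `tree_reduce` are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def balanced_seed_mapping(
    seeds: Sequence[int], num_workers: int, rng_seed: int = 0
) -> Dict[int, List[int]]:
    """Assign seeds to workers so that worker loads differ by at most one.

    Assumption: a shuffled round-robin deal stands in for the real rule.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    shuffled = list(seeds)
    random.Random(rng_seed).shuffle(shuffled)
    # Worker w takes every num_workers-th seed, starting at offset w.
    return {w: shuffled[w::num_workers] for w in range(num_workers)}


def tree_reduce(values: Sequence[T], combine: Callable[[T, T], T]) -> T:
    """Combine values pairwise, roughly halving the partial results per round."""
    if not values:
        raise ValueError("need at least one value")
    level = list(values)
    while len(level) > 1:
        # Combine neighbours (0,1), (2,3), ...; an odd leftover is carried up.
        nxt = [combine(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]


# Usage: 10 seeds over 4 workers, then sum the per-worker counts.
mapping = balanced_seed_mapping(range(10), num_workers=4)
sizes = [len(assigned) for assigned in mapping.values()]
total = tree_reduce(sizes, lambda a, b: a + b)
```

With $n$ workers, the pairwise scheme finishes in about $\log_2 n$ rounds instead of having a single worker receive all $n$ partial results, which is the usual motivation for a tree-shaped reduction. In a real deployment each round would be a network exchange between workers rather than a list operation.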
Abstract
Graph-based computations are crucial in a wide range of applications, where graphs can scale to trillions of edges. To enable efficient training on such large graphs, mini-batch subgraph sampling is commonly used, since it allows training without loading the entire graph into memory. However, existing solutions face significant trade-offs. Online subgraph generation, as seen in frameworks like DGL and PyG, is limited to a single machine, resulting in severe performance bottlenecks. Offline precomputed subgraphs, as in GraphGen, improve sampling efficiency but introduce large storage overhead and high I/O costs during training. To address these challenges, we propose \textbf{GraphGen+}, an integrated framework that synchronizes distributed subgraph generation with in-memory graph learning, eliminating the need for external storage while significantly improving efficiency. GraphGen+ achieves a \textbf{27$\times$} speedup in subgraph generation compared to conventional SQL-like methods and a \textbf{1.3$\times$} speedup over GraphGen. It supports training on 1 million nodes per iteration and removes the overhead associated with precomputed subgraphs, making it a scalable and practical solution for industry-scale graph learning.
