NGDB-Zoo: Towards Efficient and Scalable Neural Graph Databases Training

Zhongwei Xie, Jiaxin Bai, Shujie Liu, Haoyu Huang, Yufei Li, Yisen Gao, Hong Ting Tsang, Yangqiu Song

TL;DR

Extensive evaluations on six benchmarks, including massive graphs like ogbl-wikikg2 and ATLAS-Wiki, demonstrate that NGDB-Zoo maintains high GPU utilization across diverse logical patterns and significantly mitigates representation friction in hybrid neuro-symbolic reasoning.

Abstract

Neural Graph Databases (NGDBs) facilitate complex logical reasoning over incomplete knowledge structures, yet their training efficiency and expressivity are constrained by rigid query-level batching and structure-exclusive embeddings. We present NGDB-Zoo, a unified framework that resolves these bottlenecks by combining operator-level training with semantic augmentation. By decoupling logical operators from query topologies, NGDB-Zoo transforms the training loop into a dynamically scheduled data-flow execution, enabling multi-stream parallelism and achieving $1.8\times$ to $6.8\times$ the throughput of baselines. Furthermore, we formalize a decoupled architecture to integrate high-dimensional semantic priors from Pre-trained Text Encoders (PTEs) without triggering I/O stalls or memory overflows. Extensive evaluations on six benchmarks, including massive graphs like ogbl-wikikg2 and ATLAS-Wiki, demonstrate that NGDB-Zoo maintains high GPU utilization across diverse logical patterns and significantly mitigates representation friction in hybrid neuro-symbolic reasoning.
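
To make the idea of decoupling operators from query topologies concrete, the following is a minimal sketch, not the authors' implementation. It assumes each query is a small DAG of operator nodes, pools ready nodes by operator type across all queries, and repeatedly launches the fullest pool. The names `QUERIES`, `schedule`, and `capacity`, the toy operator names, and the tie-breaking rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict, deque

# Hypothetical toy workload: each query is a DAG of operator nodes,
# written as node id -> (operator type, list of input node ids).
QUERIES = {
    "q1": {"a": ("embed", []), "b": ("project", ["a"])},
    "q2": {
        "a": ("embed", []),
        "b": ("embed", []),
        "c": ("project", ["a"]),
        "d": ("project", ["b"]),
        "e": ("intersect", ["c", "d"]),
    },
}


def schedule(queries, capacity=4):
    """Return the (operator, batch) launches in the order they would run.

    Assumption: "fullest pool first" stands in for the paper's scheduling
    policy; ties are broken alphabetically to keep the result deterministic.
    """
    done = set()                      # nodes whose output is available
    pools = defaultdict(deque)        # operator type -> ready nodes
    pending = {(q, n) for q, graph in queries.items() for n in graph}

    def refill():
        # Move every node whose inputs are all computed into its pool.
        for q, n in sorted(pending):
            op, inputs = queries[q][n]
            if all((q, i) in done for i in inputs):
                pools[op].append((q, n))
        pending.difference_update(x for pool in pools.values() for x in pool)

    launches = []
    refill()
    while any(pools.values()):
        # Pick the operator pool with the most ready nodes.
        op = max(sorted(pools), key=lambda o: len(pools[o]))
        size = min(capacity, len(pools[op]))
        batch = [pools[op].popleft() for _ in range(size)]
        launches.append((op, batch))
        done.update(batch)            # finishing a batch unblocks dependents
        refill()
    return launches


launches = schedule(QUERIES)
for op, batch in launches:
    print(op, batch)
```

In this toy workload the two queries have different shapes, yet their `embed` nodes run in one batch and their `project` nodes in another, so seven operator nodes are covered by three launches instead of being split by query structure.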

Paper Structure (58 sections, 15 equations, 9 figures, 8 tables, 1 algorithm)

Figures (9)

  • Figure 1: Task Illustration. Query embedding methods aim to answer multihop logical queries (A) by avoiding explicit knowledge graph traversal and executing the query directly in the embedding space by following the query computation plan (B). Operators are executed in the embedding space (C).
  • Figure 2: Evolution toward High-Throughput Asynchronous Pipelining. Training schemes compared: (a) the naive training scheme; (b) the pre-sampling and pre-fetching optimization; (c) NGDB-Zoo's operator-level approach, which breaks these constraints to enable fully overlapping stages, creating a dense execution stream that maximizes hardware saturation.
  • Figure 3: Overcoming Topological Rigidity. (Left): Query-level batching incurs high fragmentation overheads by segregating distinct query structures. (Right): NGDB-Zoo aggregates atomic operators into homogeneous Operator Pools for unified execution, eliminating structural constraints and saturating GPU cores.
  • Figure 4: Max-Fillness Dynamic Scheduling. By prioritizing operator pools with the highest workload, the scheduler maximizes GPU utilization. This dynamic selection effectively unblocks dependent nodes, ensuring a continuous, high-throughput training stream.
  • Figure 5: Vectorized execution for Intersection/Union. By grouping operations into cardinality-based Equivalence Classes, we eliminate tensor misalignment. This enables perfectly aligned Vectorized Execution, replacing slow loops with dense matrix operations.
  • ...and 4 more figures
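
The cardinality-based grouping described in the Figure 5 caption can be sketched as follows. This is an illustrative NumPy example under stated assumptions, not the paper's operator: an element-wise minimum stands in for the learned intersection, and the function name `batched_intersection` and its input layout are hypothetical. The point it shows is that operations with the same number of inputs stack into one aligned `(n, k, d)` tensor, so a single dense reduction replaces a per-operation loop.

```python
from collections import defaultdict

import numpy as np


def batched_intersection(ops):
    """Reduce each operation's inputs, one dense call per cardinality.

    ops: list of operations, each a list of d-dimensional vectors. The
    number of inputs may differ between operations.
    Returns one d-dimensional result per operation, in input order.
    Assumption: element-wise min is a placeholder for the real operator.
    """
    # Equivalence classes: operations sharing the same input count.
    groups = defaultdict(list)
    for idx, inputs in enumerate(ops):
        groups[len(inputs)].append(idx)

    out = [None] * len(ops)
    for k, idxs in groups.items():
        # Same cardinality means the stack is rectangular: shape (n, k, d).
        stacked = np.stack([np.stack(ops[i]) for i in idxs])
        reduced = stacked.min(axis=1)          # shape (n, d)
        for row, i in zip(reduced, idxs):
            out[i] = row
    return out


ops = [
    [np.array([1.0, 2.0]), np.array([3.0, 0.0])],
    [np.array([5.0, 5.0]), np.array([4.0, 6.0]), np.array([7.0, 1.0])],
    [np.array([0.0, 9.0]), np.array([2.0, 2.0])],
]
results = batched_intersection(ops)
```

Here the first and third operations (two inputs each) share one stacked tensor, while the three-input operation forms its own class, so no padding or masking is needed.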