Echoless Label-Based Pre-computation for Memory-Efficient Heterogeneous Graph Learning

Jun Hu, Shangheng Chen, Yufei He, Yuan Li, Bryan Hooi, Bingsheng He

TL;DR

This work tackles the inefficiency of end-to-end training in heterogeneous graph neural networks and the echo problem inherent in label-based pre-computation. It introduces Echoless-LP, a memory-efficient framework built on Partition-Focused Echoless Propagation (PFEP), Asymmetric Partitioning Scheme (APS), and PostAdjust to eliminate training label leakage while remaining compatible with arbitrary message passing backbones. The approach enables multi-hop label aggregation without self-leakage and maintains scalable memory usage, outperforming or matching state-of-the-art baselines across small and large heterogeneous graph datasets. Empirical results demonstrate superior performance and robust memory efficiency, especially for high-hop propagation, validating the practicality of Echoless-LP for real-world large-scale HGNN deployment.

Abstract

Heterogeneous Graph Neural Networks (HGNNs) are widely used for deep learning on heterogeneous graphs. Typical end-to-end HGNNs require repetitive message passing during training, limiting efficiency for large-scale real-world graphs. Pre-computation-based HGNNs address this by performing message passing only once during preprocessing, collecting neighbor information into regular-shaped tensors, which enables efficient mini-batch training. Label-based pre-computation methods collect neighbors' label information but suffer from training label leakage, where a node's own label information propagates back to itself during multi-hop message passing, a phenomenon known as the echo effect. Existing mitigation strategies are memory-inefficient on large graphs or suffer from compatibility issues with advanced message passing methods. We propose Echoless Label-based Pre-computation (Echoless-LP), which eliminates training label leakage with Partition-Focused Echoless Propagation (PFEP). PFEP partitions target nodes and performs echoless propagation, where nodes in each partition collect label information only from neighbors in other partitions, avoiding echo while remaining memory-efficient and compatible with any message passing method. We also introduce an Asymmetric Partitioning Scheme (APS) and a PostAdjust mechanism to address information loss from partitioning and distributional shifts across partitions. Experiments on public datasets demonstrate that Echoless-LP achieves superior performance and maintains memory efficiency compared to baselines.
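
The echo effect and the partition-based remedy described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a small homogeneous path graph in place of a heterogeneous one, mean aggregation as the message passing method, every node being a labeled training node, and a hand-picked two-way partition. The function names (`naive_label_propagation`, `echoless_label_propagation`) and the specific masking procedure are hypothetical, chosen only to satisfy the stated property that nodes in a partition collect label information only from nodes in other partitions.

```python
import numpy as np


def row_normalize(adj):
    """Mean aggregation: divide each row by its degree (assumed MP method)."""
    deg = adj.sum(axis=1, keepdims=True)
    return adj / np.maximum(deg, 1.0)


def naive_label_propagation(adj, labels, num_hops):
    """Propagate all labels for num_hops hops; a node's own label can echo back."""
    a = row_normalize(adj)
    out = labels.copy()
    for _ in range(num_hops):
        out = a @ out
    return out


def echoless_label_propagation(adj, labels, partition_ids, num_hops):
    """Hypothetical partition-based sketch: for each partition, hide the labels
    of its own nodes, propagate, and keep the result only for those nodes."""
    a = row_normalize(adj)
    out = np.zeros_like(labels)
    for p in np.unique(partition_ids):
        in_p = partition_ids == p
        h = labels.copy()
        h[in_p] = 0.0  # nodes in partition p contribute no label information
        for _ in range(num_hops):
            h = a @ h
        out[in_p] = h[in_p]  # only partition p reads from this pass
    return out


# Path graph 0 - 1 - 2 - 3 (assumed toy example, symmetric edges).
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
labels = np.eye(2)[[0, 1, 0, 1]]          # one-hot labels, shape (4, 2)
partition_ids = np.array([0, 0, 1, 1])    # assumed two-way partition

# Flip node 0's label and check whether node 0's own aggregated row changes.
flipped = labels.copy()
flipped[0] = [0.0, 1.0]

naive_leaks = not np.allclose(
    naive_label_propagation(adj, labels, 2)[0],
    naive_label_propagation(adj, flipped, 2)[0],
)
echoless_leaks = not np.allclose(
    echoless_label_propagation(adj, labels, partition_ids, 2)[0],
    echoless_label_propagation(adj, flipped, partition_ids, 2)[0],
)
print("naive 2-hop output depends on own label:", naive_leaks)
print("echoless 2-hop output depends on own label:", echoless_leaks)
```

In this toy setting, node 0 is its own 2-hop neighbor (via node 1), so the naive output for node 0 changes when its label is flipped, while the partition-based output does not. The sketch also hints at the cost the abstract mentions: labels from same-partition neighbors are discarded, which is the information loss that APS is introduced to address.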

Paper Structure

This paper contains 21 sections, 7 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Feature/label-based pre-computation.
  • Figure 2: $v_1$ is also a multi-hop neighbor of itself, and the 2-hop MP and 3-hop MP propagate $v_1$’s own label back to itself (echo), causing training label leakage for $v_1$.
  • Figure 3: Memory usage (purple text) vs. number of hops $K$ for label-based pre-computation on OAG-Venue (million-scale). Although Echoless-LP incurs a modest increase in pre-computation time (y-axis), it remains memory-efficient for $K>2$, whereas RemoveDiag-LP (SOTA) runs out of memory (OOM).
  • Figure 4: Overall Framework of Echoless Label-Based Pre-computation (Echoless-LP).
  • Figure 5: Partitioning Schemes.
  • ...and 3 more figures