Efficient Heterogeneous Graph Learning via Random Projection
Jun Hu, Bryan Hooi, Bingsheng He
TL;DR
Heterogeneous Graph Neural Networks (HGNNs) are often inefficient on large graphs because they repeat message passing during training. RpHGNN is a hybrid pre-computation framework that combines the efficiency of representation-wise methods with the low information loss of relation-wise methods. It uses propagate-then-update iterations, Random Projection Squashing, and an Even-odd Propagation Scheme to manage information flow. It achieves state-of-the-art results on seven datasets and is about 230% faster than the most effective pre-computation baseline, while also outperforming end-to-end methods on several tasks. This work demonstrates that scalable, accurate HGNNs are attainable by carefully balancing information preservation and computational efficiency, with potential extensions to memory optimization and broader tasks such as link prediction.
Abstract
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs. Typical HGNNs require repetitive message passing during training, which limits their efficiency on large-scale real-world graphs. Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors, enabling efficient mini-batch training. Existing pre-computation-based HGNNs fall mainly into two styles, which differ in efficiency and in how much information loss they allow. We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN), which combines the efficiency of one style with the low information loss of the other. To achieve efficiency, the main framework of RpHGNN consists of propagate-then-update iterations, in which a Random Projection Squashing step ensures that complexity increases only linearly. To achieve low information loss, we introduce a Relation-wise Neighbor Collection component with an Even-odd Propagation Scheme, which collects information from neighbors in a finer-grained way. Experimental results indicate that our approach achieves state-of-the-art results on seven small and large benchmark datasets while also being 230% faster than the most effective baseline. Surprisingly, our approach not only surpasses pre-computation-based baselines but also outperforms end-to-end methods.
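The following is a minimal sketch of the idea behind a squashing step, not the authors' implementation. It assumes a Gaussian random projection and replaces real neighbor collection with a hypothetical stand-in that produces one vector per relation. The names `make_projection`, `squash`, `dim`, `num_relations`, and `num_hops` are illustrative assumptions. The sketch shows only that projecting the concatenated per-relation vectors back to a fixed width keeps the representation size constant across iterations, so the cost grows linearly with the number of iterations.

```python
import math
import random


def make_projection(in_dim, out_dim, rng):
    """Gaussian random matrix (in_dim x out_dim), scaled by 1/sqrt(out_dim)
    so that vector norms are preserved in expectation."""
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(out_dim)]
            for _ in range(in_dim)]


def squash(relation_feats, projection):
    """Concatenate one node's per-relation vectors, then project them
    down to a fixed width with the random matrix."""
    concat = [x for vec in relation_feats for x in vec]
    out_dim = len(projection[0])
    return [sum(concat[i] * projection[i][j] for i in range(len(concat)))
            for j in range(out_dim)]


rng = random.Random(0)
dim, num_relations, num_hops = 4, 3, 5  # assumed toy sizes
projection = make_projection(num_relations * dim, dim, rng)

state = [1.0, 0.0, -1.0, 0.5]  # one node's feature vector
widths = []
for _ in range(num_hops):
    # Hypothetical stand-in for relation-wise neighbor collection:
    # one vector per relation, each derived from the current state.
    collected = [[(r + 1) * x for x in state] for r in range(num_relations)]
    # Squash: the width returns to `dim` after every iteration.
    state = squash(collected, projection)
    widths.append(len(state))
```

Without the squashing step, keeping every relation's output separate would multiply the number of stored vectors by `num_relations` at each iteration. With it, `widths` stays at `dim` throughout.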
