LHGEL: Large Heterogeneous Graph Ensemble Learning using Batch View Aggregation
Jiajun Shen, Yufei Jin, Yi He, Xingquan Zhu
TL;DR
LHGEL tackles learning on large heterogeneous graphs by training an ensemble over subgraphs sampled under different batch views. It combines batch view aggregation, a two-stage residual attention mechanism, and a diversity regularizer to produce base learners that are both accurate and diverse, addressing non-IID data and scalability together. Theoretical analysis shows that the residual design mitigates gradient vanishing, and experiments on five real-world datasets show higher accuracy and robustness than state-of-the-art baselines. Training time grows linearly in practice, and code is released with the paper. Overall, LHGEL advances ensemble methods for heterogeneous graphs by integrating multi-view relational information with attention-based fusion and diversity constraints.
Abstract
Learning from large heterogeneous graphs is challenging because of the scale of the networks, heterogeneity in node and edge types, variation in node features, and complex local neighborhood structures. This paper advocates ensemble learning as a natural solution: by training multiple graph learners under distinct sampling conditions, the ensemble inherently captures different aspects of graph heterogeneity. The crux lies in combining these learners to meet a global optimization objective while maintaining computational efficiency on large-scale graphs. In response, we propose LHGEL, an ensemble framework that addresses these challenges through batch sampling with three key components: batch view aggregation, residual attention, and diversity regularization. Specifically, batch view aggregation samples subgraphs and forms multiple graph views, while residual attention adaptively weights the contributions of these views to guide node embeddings toward informative subgraphs, thereby improving the accuracy of the base learners. Diversity regularization encourages representational disparity across the embedding matrices derived from different views, promoting model diversity and ensemble robustness. Our theoretical study demonstrates that residual attention mitigates the gradient vanishing issues commonly faced in ensemble learning. Empirical results on five real heterogeneous networks validate that LHGEL consistently outperforms its state-of-the-art competitors by a substantial margin. Code and datasets are available at https://github.com/Chrisshen12/LHGEL.
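
To make the interplay of the last two components concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function names `residual_attention_fuse` and `diversity_penalty`, the single-stage attention scoring with a `query` vector, the mean-of-views residual path, and the squared-cosine-similarity penalty. It only shows the general idea of weighting per-view node embeddings with attention, keeping a residual path around the attention, and penalizing views whose embedding matrices are too similar.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def residual_attention_fuse(views, query):
    """Fuse per-view node embeddings with attention plus a residual path.

    views: list of V arrays, each of shape (n_nodes, dim) (hypothetical layout).
    query: array of shape (dim,), standing in for a learnable attention vector.
    Returns the fused (n_nodes, dim) embeddings and the V attention weights.
    """
    stacked = np.stack(views)                          # (V, n_nodes, dim)
    # One score per view: project the node-averaged, squashed embeddings.
    scores = np.tanh(stacked).mean(axis=1) @ query     # (V,)
    weights = softmax(scores)                          # (V,), sums to 1
    attended = np.tensordot(weights, stacked, axes=1)  # (n_nodes, dim)
    # Assumed residual path: the unweighted mean of the views bypasses the
    # attention weights, so a signal survives even if the weights saturate.
    residual = stacked.mean(axis=0)
    return attended + residual, weights


def diversity_penalty(views):
    """Mean squared cosine similarity between flattened view embeddings.

    Near 0 for orthogonal views and 1 for identical ones, so adding this term
    to a loss pushes the views toward dissimilar representations.
    """
    flat = [v.ravel() / (np.linalg.norm(v) + 1e-12) for v in views]
    total, pairs = 0.0, 0
    for i in range(len(flat)):
        for j in range(i + 1, len(flat)):
            total += float(flat[i] @ flat[j]) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    views = [rng.normal(size=(4, 3)) for _ in range(3)]  # 3 views, 4 nodes
    fused, weights = residual_attention_fuse(views, np.ones(3))
    print("fused shape:", fused.shape)
    print("view weights:", weights)
    print("diversity penalty:", diversity_penalty(views))
```

In a trainable version, `query` would be a learned parameter and the penalty would be added to the task loss with a tunable coefficient. The released repository is the reference for how LHGEL actually defines these terms.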
