LHGEL: Large Heterogeneous Graph Ensemble Learning using Batch View Aggregation

Jiajun Shen, Yufei Jin, Yi He, Xingquan Zhu

TL;DR

LHGEL tackles learning on large heterogeneous graphs by framing ensemble learning over subgraphs sampled under varying batch views. It introduces batch view aggregation, a two-stage residual attention mechanism, and a diversity regularizer to produce accurate yet diverse base learners, addressing both non-IID data and scalability. Theoretical analysis shows the residual design mitigates gradient vanishing, while experiments on five real-world datasets demonstrate superior accuracy and robustness against state-of-the-art baselines. The approach achieves scalable training with linear-time growth in practice and provides a practical, code-ready framework for heterogeneous graph analysis. Overall, LHGEL advances ensemble methods for heterogeneous graphs by integrating multi-view relational information with principled fusion and diversity constraints.

Abstract

Learning from large heterogeneous graphs presents significant challenges due to the scale of networks, heterogeneity in node and edge types, variations in nodal features, and complex local neighborhood structures. This paper advocates for ensemble learning as a natural solution to this problem: by training multiple graph learners under distinct sampling conditions, the ensemble inherently captures different aspects of graph heterogeneity. Yet the crux lies in combining these learners to meet a global optimization objective while maintaining computational efficiency on large-scale graphs. In response, we propose LHGEL, an ensemble framework that addresses these challenges through batch sampling with three key components, namely batch view aggregation, residual attention, and diversity regularization. Specifically, batch view aggregation samples subgraphs and forms multiple graph views, while residual attention adaptively weights the contributions of these views to guide node embeddings toward informative subgraphs, thereby improving the accuracy of base learners. Diversity regularization encourages representational disparity across embedding matrices derived from different views, promoting model diversity and ensemble robustness. Our theoretical study demonstrates that residual attention mitigates gradient vanishing issues commonly faced in ensemble learning. Empirical results on five real heterogeneous networks validate that LHGEL consistently outperforms its state-of-the-art competitors by a substantial margin. Code and datasets are available at https://github.com/Chrisshen12/LHGEL.
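To make the diversity regularization idea concrete, here is a minimal NumPy sketch. The Figure 3 caption describes the regularizer as an $L_1$ norm on a correlation matrix obtained from batch aggregation embeddings; how that matrix is built is not stated in this summary, so the sketch assumes a Pearson correlation between flattened per-view embedding matrices and penalizes only the off-diagonal entries. The function name `diversity_penalty` is hypothetical and is not taken from the authors' code.

```python
import numpy as np


def diversity_penalty(view_embeddings):
    """L1 norm of the off-diagonal view-to-view correlations.

    Assumption: each view contributes one (n_nodes x dim) embedding matrix,
    and views are compared by the Pearson correlation of their flattened
    entries. The paper's exact construction may differ.
    """
    # One flattened row per view: shape (m, n_nodes * dim).
    flat = np.stack([np.asarray(e, dtype=float).ravel() for e in view_embeddings])
    # np.corrcoef treats each row as a variable, giving an (m x m) matrix.
    corr = np.corrcoef(flat)
    # Drop the diagonal (always 1) so only cross-view correlation is penalized.
    off_diag = corr - np.diag(np.diag(corr))
    return float(np.abs(off_diag).sum())


# Two uncorrelated views give a penalty near 0; duplicated views give 2
# (the two off-diagonal entries of a 2 x 2 matrix, each equal to 1).
view_a = np.array([[1.0, -1.0], [1.0, -1.0]])
view_b = np.array([[1.0, 1.0], [-1.0, -1.0]])
print(diversity_penalty([view_a, view_b]))
print(diversity_penalty([view_a, view_a]))
```

Under this assumption, adding the penalty to the training loss pushes the views toward uncorrelated embeddings. Note that a view whose embedding is constant has zero variance, and `np.corrcoef` would return NaN for it.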

Paper Structure

This paper contains 24 sections, 1 theorem, 19 equations, 7 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

Assuming that, for a common loss function $\mathcal{L}$ and message passing framework, $\frac{\partial \mathcal{L}}{\partial h_{f}}$ and $\frac{\partial h_{i}}{\partial w_{r}}$ are bounded, and further assuming that the node representation $h$ and the learnable parameters $W'$, $W_{i}$ are bounded, then $\frac{\partial \

Figures (7)

  • Figure 1: A conceptual view of heterogeneous information aggregation via Relational Aggregation for a target node $a_1$ (i.e., the node circled in red). From left to right: ① the large heterogeneous graph; ② a relation group $\mathcal{M}_i$ consisting of node relations derived from metapaths; ③ and ④ for each $\mathcal{M}_i$, we aggregate information across its relations by propagating messages along the associated relations, enabling the model to capture rich semantic dependencies as formulated in Eq. (4); and ⑤ the final representation for the target node (i.e., $a_1$), aggregated from all relations.
  • Figure 2: A conceptual overview of heterogeneous information aggregation enabled by the Batch View Aggregation mechanism. From left to right: ① the large heterogeneous graph; ② we extract multiple relation groups from the large heterogeneous graph; ③ for each relation group $\mathcal{M}_{i}$, we randomly sample a small batch of target nodes and expand their neighborhoods; ④ for each batch, we obtain node embeddings through relational aggregation as illustrated in Fig. 1; and ⑤ the final representation of target nodes at each batch size.
  • Figure 3: The proposed LHGEL framework enables ensemble learning on large heterogeneous graphs. From left to right: ① the large heterogeneous graph; ② for each relation group, node embeddings are computed via batch aggregation as illustrated in Fig. 2; ③ these embeddings are then combined within each relation group using a residual attention mechanism (Fig. 4), followed by ④ a second attention-based fusion across all relation groups to produce the final node representation; ⑤ this representation is passed through a multi-layer perceptron (MLP) for prediction; ⑥ an $L_1$ norm on the correlation matrix obtained from batch aggregation embeddings encourages diversity between embeddings; ⑦ the objective function jointly optimizes the batch aggregation modules, residual attention modules, and the MLP layer to achieve effective ensemble learning.
  • Figure 4: Residual attention computation inside each relation group. Embeddings from different batch sizes in each relation group $\mathcal{M}_{i}$ are first projected into a shared attention space through projection weights $W_{i}^b$. The projected embeddings are then concatenated and passed through a shared projection weight $W'_{i}$ to learn $m$ raw attention scores $\Theta$. Applying min-max normalization (Eq. 10) yields the final fused residual attention $\tilde{\Theta}_{\mathcal{M}_{i}}^{\mathcal{B}}$. For residual attention computation across relation groups, $W_{i}^b$ is replaced with $W_{i}$ and $m$ is replaced with $c$. The final residual attention $\tilde{\Theta}_{\mathcal{Q}}$ is obtained from Eq. 10 in the same way.
  • Figure 5: Impact of the number of relation groups on the ensemble learning results. Green violin plots show the mean and variance for LHGEL with only a single relation group, whereas orange violin plots show LHGEL's mean and variance with multiple relation groups.
  • ...and 2 more figures
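
The within-group attention pipeline described in the Figure 4 caption (per-view projection, concatenation, a shared projection to $m$ raw scores, then min-max normalization) can be sketched in NumPy as follows. Several details are assumptions, since Eq. 10 is not reproduced in this summary: the scores are normalized per node across the $m$ views, and the residual connection is modeled as an identity term added to each normalized score (weight `1 + score`). The function names and the weight shapes are hypothetical.

```python
import numpy as np


def minmax_normalize(scores, eps=1e-12):
    """Rescale each row of `scores` to [0, 1] (assumed: per node, across views)."""
    lo = scores.min(axis=1, keepdims=True)
    hi = scores.max(axis=1, keepdims=True)
    return (scores - lo) / (hi - lo + eps)


def residual_attention_fuse(views, proj_weights, shared_weight):
    """Fuse m view embeddings of shape (n, d) into one (n, d) matrix.

    views:         list of m arrays, each (n, d)
    proj_weights:  list of m arrays, each (d, p), one projection per view
    shared_weight: array (m * p, m), maps the concatenation to m raw scores
    """
    # Project every view into a shared attention space.
    projected = [h @ w for h, w in zip(views, proj_weights)]
    # Concatenate along the feature axis: (n, m * p).
    concat = np.concatenate(projected, axis=1)
    # One raw attention score per view and node: (n, m).
    raw = concat @ shared_weight
    attn = minmax_normalize(raw)
    # Assumed residual form: every view keeps an identity weight of 1,
    # so the lowest-scored view still contributes to the output.
    stacked = np.stack(views, axis=1)                      # (n, m, d)
    fused = ((1.0 + attn)[:, :, None] * stacked).sum(axis=1)
    return fused, attn


rng = np.random.default_rng(0)
n, d, p, m = 4, 3, 2, 3
views = [rng.normal(size=(n, d)) for _ in range(m)]
proj = [rng.normal(size=(d, p)) for _ in range(m)]
shared = rng.normal(size=(m * p, m))
fused, attn = residual_attention_fuse(views, proj, shared)
print(fused.shape, attn.shape)
```

Under this reading, fusion across relation groups would reuse the same function with the group-level embeddings as `views` and $c$ in place of $m$. In a trained model the weights would be learned rather than sampled at random as they are here.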

Theorems & Definitions (3)

  • Theorem 1
  • proof
  • Remark 1