DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts

Zelin Yao, Chuang Liu, Xianke Meng, Yibing Zhan, Jia Wu, Shirui Pan, Wenbin Hu

TL;DR

The depth adaptive mixture of experts (DA-MoE) method incorporates two main improvements to the GNN backbone and consistently surpasses existing baselines on various tasks, including graph-, node-, and link-level analyses.

Abstract

Graph neural networks (GNNs) are gaining popularity for processing graph-structured data. In real-world scenarios, graphs within the same dataset can vary significantly in scale. This variability leads to depth-sensitivity, where the optimal number of GNN layers depends on the scale of the graph. Empirically, fewer layers are sufficient for message passing in smaller graphs, while larger graphs typically require deeper networks to capture long-range dependencies and global features. However, existing methods generally use a fixed number of GNN layers to generate representations for all graphs, overlooking the depth-sensitivity issue in graph-structured data. To address this challenge, we propose the depth adaptive mixture of experts (DA-MoE) method, which incorporates two main improvements to the GNN backbone: (1) DA-MoE employs GNNs of different depths, each considered an expert with its own parameters. Such a design allows the model to flexibly aggregate information at different scales, effectively addressing the depth-sensitivity issue in graph data. (2) DA-MoE uses a GNN, instead of linear projections, in the gating network to capture structural information. Thus, the gating network enables the model to capture complex patterns and dependencies within the data. By leveraging these improvements, each expert in DA-MoE specifically learns distinct graph patterns at different scales. Furthermore, comprehensive experiments on the TU datasets and the Open Graph Benchmark (OGB) show that DA-MoE consistently surpasses existing baselines on various tasks, including graph-, node-, and link-level analyses. The code is available at https://github.com/Celin-Yao/DA-MoE.
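
The two design points in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation (see the linked repository for that). It assumes mean aggregation as the message-passing step, a one-layer GNN as the gating network, and plain top-k selection followed by a softmax. The names `GNNExpert`, `DepthAdaptiveMoE`, `mean_aggregate`, and `top_k` are hypothetical and chosen only for illustration.

```python
import math
import random


def mean_aggregate(adj, feats):
    """One message-passing step: each node averages itself and its neighbours."""
    dim = len(feats[0])
    out = []
    for i, nbrs in enumerate(adj):
        group = [i] + list(nbrs)
        out.append([sum(feats[j][d] for j in group) / len(group) for d in range(dim)])
    return out


def project(feats, weight, relu=True):
    """Dense layer without bias; weight[o] holds the input weights of output o."""
    rows = [[sum(x * w for x, w in zip(row, col)) for col in weight] for row in feats]
    return [[max(0.0, v) for v in row] for row in rows] if relu else rows


def mean_pool(feats):
    """Graph-level readout: average every feature over all nodes."""
    return [sum(col) / len(feats) for col in zip(*feats)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    return [e / sum(exps) for e in exps]


def init_weight(rng, in_dim, out_dim):
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]


class GNNExpert:
    """Expert-k: a k-layer GNN with its own parameters (assumed layer type)."""

    def __init__(self, rng, depth, dim):
        self.weights = [init_weight(rng, dim, dim) for _ in range(depth)]

    def __call__(self, adj, feats):
        h = feats
        for w in self.weights:  # k rounds of aggregation reach k-hop neighbours
            h = project(mean_aggregate(adj, h), w)
        return mean_pool(h)


class DepthAdaptiveMoE:
    """Illustrative mixture of depth-specific experts with a GNN-based gate."""

    def __init__(self, dim, num_experts=4, top_k=2, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.experts = [GNNExpert(rng, depth, dim) for depth in range(1, num_experts + 1)]
        self.gate_weight = init_weight(rng, dim, num_experts)

    def gate(self, adj, feats):
        # Structure-aware gating: aggregate over neighbours before projecting,
        # instead of applying a linear projection to raw features alone.
        scores = project(mean_aggregate(adj, feats), self.gate_weight, relu=False)
        logits = mean_pool(scores)
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        return dict(zip(top, softmax([logits[i] for i in top])))

    def __call__(self, adj, feats):
        gate_weights = self.gate(adj, feats)
        out = [0.0] * len(feats[0])
        for idx, w in gate_weights.items():  # only the selected experts run
            emb = self.experts[idx](adj, feats)
            out = [o + w * e for o, e in zip(out, emb)]
        return out, gate_weights


# Toy input: a 4-node path graph given as neighbour lists, with 2-d node features.
adj = [[1], [0, 2], [1, 3], [2]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
model = DepthAdaptiveMoE(dim=2, num_experts=4, top_k=2)
embedding, gate_weights = model(adj, feats)
```

Here `gate_weights` maps the index of each activated expert to its mixing weight, and `embedding` is the weighted sum of the selected experts' graph-level outputs. A trained model would additionally learn the weights end to end; this sketch only shows the forward pass with random parameters.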

Paper Structure

This paper contains 33 sections, 8 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: The depth-sensitivity phenomenon in the IMDB-BINARY dataset, where graphs at different scales rely on specific GNN depths to capture information effectively.
  • Figure 2: Overall architecture of DA-MoE. (a) We utilize multiple experts instead of a single GNN backbone to learn specialized patterns from various aggregation scales. The gating network activates only a subset of these experts, with the grey arrows denoting the experts that were not activated. (b) Each expert is a GNN with a distinct number of layers, where "Expert-1" refers to a 1-layer GNN. The shaded area in the figure represents the information aggregation range of the central node.
  • Figure 3: Visualization of expert scores in the gating network. We report the mean score of each expert for graphs of a given scale, where scale is determined by the number of nodes. The rows indicate the scale of the graph, while the columns, labeled from "Expert-1" to "Expert-4", represent experts with progressively deeper GNNs.
  • Figure 4: Ablation study on the gating network.
  • Figure 5: Parameter analysis of the scale factor $\lambda$.