Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification

Xuanze Chen, Jiajun Zhou, Shanqing Yu, Qi Xuan

TL;DR

The paper tackles the challenge of effective node classification on graphs with both homophily and heterophily, where traditional GNNs struggle with over-smoothing and Graph Transformers risk noisy attention and poor scalability. It introduces GNNMoE, a universal architecture that decouples message passing, employs a mixture-of-experts with soft gating for per-node expert selection, and enhances expressiveness with an adaptive residual and a hard-gated FFN selecting among multiple activation-style experts. Empirical results across 12 diverse datasets show GNNMoE achieves superior or robust performance while remaining scalable on large graphs, and ablations reveal critical contributions from the FFN and residual components. The approach offers a flexible, scalable framework that leverages strengths of both GNNs and GTs for versatile node classification in varied graph environments.

Abstract

Graph neural networks excel at graph representation learning but struggle with heterophilous data and long-range dependencies. Graph transformers address these issues through self-attention, yet face scalability and noise challenges on large-scale graphs. To overcome these limitations, we propose GNNMoE, a universal model architecture for node classification. This architecture flexibly combines fine-grained message-passing operations with a mixture-of-experts mechanism to build feature encoding blocks. Furthermore, by incorporating soft and hard gating layers to assign the most suitable expert networks to each node, we enhance the model's expressive power and adaptability to different graph types. In addition, we introduce adaptive residual connections and an enhanced FFN module into GNNMoE, further improving the expressiveness of node representations. Extensive experimental results demonstrate that GNNMoE performs well across various types of graph data, alleviating over-smoothing and global noise and improving model robustness and adaptability, while remaining computationally efficient on large-scale graphs.
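
The gating idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the two experts, the row-normalised adjacency, the weight names (`w_prop`, `w_trans`, `w_soft`, `w_hard`), and the use of a plain argmax for hard gating are all assumptions made for illustration. It only shows how a soft gate mixes expert outputs per node and how a hard gate picks a single activation-style expert per node.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def row_normalize(adj):
    """Divide each row by its degree (isolated nodes keep a zero row)."""
    deg = adj.sum(axis=1, keepdims=True)
    return adj / np.maximum(deg, 1.0)


def soft_gated_block(x, adj, w_prop, w_trans, w_gate):
    """Mix two hypothetical experts per node with softmax gate weights."""
    a_hat = row_normalize(adj)
    expert_prop = a_hat @ x @ w_prop   # propagate over neighbours, then transform
    expert_trans = x @ w_trans         # transform only, no neighbour information
    experts = np.stack([expert_prop, expert_trans], axis=1)  # (n, 2, d)
    gate = softmax(x @ w_gate, axis=1)                       # (n, 2), rows sum to 1
    mixed = (gate[:, :, None] * experts).sum(axis=1)         # (n, d)
    return mixed, gate


def hard_gated_ffn(h, w_gate, activations):
    """Route each node to exactly one activation expert (argmax stand-in)."""
    choice = (h @ w_gate).argmax(axis=1)  # (n,), index of the chosen expert
    out = np.empty_like(h)
    for k, act in enumerate(activations):
        mask = choice == k
        out[mask] = act(h[mask])
    return out, choice


# Toy example: a 5-node path graph with 4-dimensional features (assumed sizes).
rng = np.random.default_rng(0)
n, d = 5, 4
adj = np.eye(n, k=1) + np.eye(n, k=-1)
x = rng.normal(size=(n, d))
w_prop = rng.normal(size=(d, d))
w_trans = rng.normal(size=(d, d))
w_soft = rng.normal(size=(d, 2))
w_hard = rng.normal(size=(d, 2))

activations = [lambda v: np.maximum(v, 0.0), np.tanh]  # ReLU-style and tanh experts
mixed, gate = soft_gated_block(x, adj, w_prop, w_trans, w_soft)
out, choice = hard_gated_ffn(mixed, w_hard, activations)
```

In this sketch the soft gate lets every node blend neighbour-aware and neighbour-free features in its own proportion, while the hard gate commits each node to one expert. A trainable model would need a differentiable substitute for the argmax, which is omitted here.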

Paper Structure

This paper contains 12 sections, 7 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Illustration of GNNMoE architectures.
  • Figure 2: Impact of model depth.
  • Figure 3: Efficiency analysis on ogbn-arxiv dataset.