
Mixture of Message Passing Experts with Routing Entropy Regularization for Node Classification

Xuanze Chen, Jiajun Zhou, Yadong Li, Jinsong Chen, Shanqing Yu, Qi Xuan

TL;DR

GNNMoE is a novel entropy-driven mixture of message-passing experts framework that enables node-level adaptive representation learning. It consistently outperforms SOTA node classification methods while maintaining scalability and interpretability.

Abstract

Graph neural networks (GNNs) have achieved significant progress in graph-based learning tasks, yet their performance often deteriorates on heterophilous structures, where connected nodes differ substantially in features and labels. To address this limitation, we propose GNNMoE, a novel entropy-driven mixture of message-passing experts framework that enables node-level adaptive representation learning. GNNMoE decomposes message passing into propagation and transformation operations and integrates them through multiple expert networks guided by a hybrid routing mechanism. A routing entropy regularization then dynamically shifts the routing between soft weighting and soft top-$k$ selection, allowing GNNMoE to adapt flexibly to diverse neighborhood contexts. Extensive experiments on twelve benchmark datasets demonstrate that GNNMoE consistently outperforms SOTA node classification methods while maintaining scalability and interpretability. This work provides a unified and principled approach to fine-grained, personalized node representation learning.
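
The split of message passing into propagation and transformation can be illustrated with a small sketch. It rests on several assumptions made for illustration only. A propagation step P multiplies node features by a GCN-style symmetrically normalized adjacency matrix with self-loops. A transformation step T is a node-wise linear map followed by ReLU. Each hypothetical expert applies a different two-step ordering of P and T. The expert set (PP, PT, TP, TT) and every helper name are assumptions, not the paper's confirmed design.

```python
import math
import random
from collections.abc import Callable

Matrix = list[list[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_norm_adj(edges: list[tuple[int, int]], n: int) -> Matrix:
    """GCN-style normalization D^{-1/2} (A + I) D^{-1/2} of an undirected graph."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def propagate(adj: Matrix, x: Matrix) -> Matrix:
    """P: mix each node's features with its neighbors' (no learnable weights)."""
    return matmul(adj, x)

def transform(w: Matrix, x: Matrix) -> Matrix:
    """T: node-wise linear map plus ReLU (no neighbor mixing)."""
    return [[max(0.0, v) for v in row] for row in matmul(x, w)]

def make_experts(adj: Matrix, w1: Matrix, w2: Matrix) -> dict[str, Callable[[Matrix], Matrix]]:
    """Hypothetical expert set: every two-step ordering of P and T."""
    return {
        "PP": lambda x: propagate(adj, propagate(adj, x)),
        "PT": lambda x: transform(w1, propagate(adj, x)),
        "TP": lambda x: propagate(adj, transform(w1, x)),
        "TT": lambda x: transform(w2, transform(w1, x)),
    }

# Toy path graph 0-1-2-3 with random 3-dimensional features and weights.
random.seed(0)
n, d = 4, 3
edges = [(0, 1), (1, 2), (2, 3)]
x = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
w1 = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
w2 = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
experts = make_experts(gcn_norm_adj(edges, n), w1, w2)
outputs = {name: f(x) for name, f in experts.items()}
```

The experts differ in receptive field. TT never looks at neighbors, while PP reaches two hops. A per-node router can therefore favor whichever mix suits that node's neighborhood.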

Paper Structure

This paper contains 42 sections, 4 theorems, 47 equations, 10 figures, 10 tables.

Key Result

Theorem 1

Suppose there are $m\ge 2$ message-passing experts and the routing weight distribution over experts is $\boldsymbol{\pi}=[\pi_1,\ldots,\pi_m]\in\Delta^{m}$, where the feasible region is $\Delta^{m}:=\{\boldsymbol{\pi}\in\mathbb{R}^m : \pi_g\ge 0,\ \sum_{g=1}^m \pi_g=1\}$. For a node $v_i$ at a given MoE block, …, where $\lambda>0$ is the entropy regularization coefficient. Then the optimal routing $\pi_g^{t+1}$ …
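
The temperature role of $\lambda$ can be illustrated with a standard result, not the paper's exact statement. Assume the per-node routing objective is a linear score plus an entropy bonus over the simplex, $\max_{\boldsymbol{\pi}\in\Delta^m} \sum_g \pi_g s_g + \lambda H(\boldsymbol{\pi})$. Its maximizer is then $\pi_g = \exp(s_g/\lambda) / \sum_h \exp(s_h/\lambda)$, and the optimal value is $\lambda \log \sum_h \exp(s_h/\lambda)$. Small $\lambda$ concentrates mass on the highest-scoring expert. Large $\lambda$ approaches uniform weighting. The scores and function names below are hypothetical.

```python
import math

def entropy_routing(scores: list[float], lam: float) -> list[float]:
    """Maximizer of sum_g pi_g * s_g + lam * H(pi) over the simplex: softmax(s / lam)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    z = [s / lam for s in scores]
    top = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]

def entropy(pi: list[float]) -> float:
    """Shannon entropy H(pi) = -sum_g pi_g log pi_g, with 0 log 0 taken as 0."""
    return -sum(p * math.log(p) for p in pi if p > 0)

def objective(pi: list[float], scores: list[float], lam: float) -> float:
    """Assumed entropy-regularized routing objective for one node."""
    return sum(p * s for p, s in zip(pi, scores)) + lam * entropy(pi)

scores = [2.0, 1.0, 0.5, -1.0]  # hypothetical router scores for m = 4 experts
for lam in (0.05, 1.0, 100.0):
    pi = entropy_routing(scores, lam)
    print(lam, [round(p, 3) for p in pi], round(entropy(pi), 3))
```

Sweeping `lam` shows the routing entropy rising from near zero (close to top-1 routing) toward $\log m$ (uniform soft weighting).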

Figures (10)

  • Figure 1: Example of complex neighborhood context in graphs.
  • Figure 2: Observation experiment 1. Illustration of the preferences of nodes with varying degrees of homophily toward different encoding schemes. Nodes are partitioned into subspaces according to their homophily levels and degrees. Distinct marker shapes highlight the encoding scheme achieving the best node classification performance within each subspace, while the marker size reflects the number of nodes in that subspace.
  • Figure 3: Illustration of the GNNMoE architecture. The complete workflow proceeds as follows: 1) each node is processed by multiple message-passing experts, producing diverse candidate representations based on different encoding strategies; 2) a soft routing network computes routing scores conditioned on the node's features, aggregates the expert outputs accordingly, and produces a preliminary multi-expert representation; 3) an entropy-driven routing adapter dynamically adjusts the routing process, striking a balance between fully weighted aggregation and approximate top-k expert activation; 4) the aggregated representation is refined by an enhanced FFN with hard routing that adaptively selects the activation function, further improving the expressiveness of the final node representation.
  • Figure 4: Observation experiment 2. Preference for expert routing strategies across different graphs.
  • Figure 5: Visualization of routing weight distributions before and after introducing the routing entropy regularization mechanism (P denotes a GCN-like propagation operation).
  • ...and 5 more figures
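
The four-step workflow in Figure 3 can be approximated for a single node by the sketch below. Every component is an illustrative assumption:

- a linear soft router over the node's features;
- a softmax temperature `lam` standing in for the entropy-driven routing adapter;
- a weighted sum of the expert outputs;
- an FFN that hard-routes to exactly one activation function by argmax of a separate gating score.

The weights, the activation candidates, and all function names are hypothetical.

```python
import math

# Candidate activations for the hard-routed FFN (hypothetical choice).
ACTIVATIONS = {
    "relu": lambda v: max(0.0, v),
    "gelu": lambda v: 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))),
    "tanh": math.tanh,
}

def softmax(z: list[float], temperature: float = 1.0) -> list[float]:
    top = max(z)
    e = [math.exp((v - top) / temperature) for v in z]
    s = sum(e)
    return [v / s for v in e]

def linear(w: list[list[float]], x: list[float]) -> list[float]:
    """Matrix-vector product w @ x, with w given as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def moe_block(x, expert_outputs, w_router, w_act, w_ffn, lam):
    """One hypothetical MoE block for a single node.

    x: node input features (conditions both routers)
    expert_outputs: one candidate representation per message-passing expert
    w_router: m x d soft-routing weights; w_act: (#activations) x d gate weights
    w_ffn: d x d FFN weights; lam: temperature chosen by the routing adapter
    """
    # Steps 2-3: node-conditioned soft routing, sharpened or flattened by lam.
    pi = softmax(linear(w_router, x), temperature=lam)
    d = len(expert_outputs[0])
    h = [sum(p * out[j] for p, out in zip(pi, expert_outputs)) for j in range(d)]
    # Step 4: hard routing picks exactly one activation for the FFN.
    names = list(ACTIVATIONS)
    gate = linear(w_act, x)
    act = names[max(range(len(names)), key=lambda i: gate[i])]
    out = [ACTIVATIONS[act](v) for v in linear(w_ffn, h)]
    return out, pi, act

# Toy example: 3 experts whose outputs are one-hot rows, identity router and FFN.
x = [0.5, -1.0, 2.0]
experts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w_act = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
out, pi, act = moe_block(x, experts, eye, w_act, eye, lam=1.0)
```

Lowering `lam` in this sketch pushes `pi` toward a one-hot vector, mimicking approximate top-k activation. Raising it keeps the fully weighted aggregation.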

Theorems & Definitions (9)

  • Theorem 1: Temperature property of entropy-driven routing
  • proof
  • Definition 1: $\epsilon$-soft top-$k$
  • Corollary 1: $\epsilon$-soft top-$k$ Approximation
  • proof
  • Theorem 1: Temperature property of entropy-driven routing
  • proof
  • Corollary 1: $\epsilon$-soft top-$k$ Approximation
  • proof
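
The content of Definition 1 is not reproduced on this page, so the sketch below adopts an assumed reading. Under this hypothetical reading, $\boldsymbol{\pi}$ is $\epsilon$-soft top-$k$ if its $k$ largest weights jointly carry at least $1-\epsilon$ of the routing mass. For softmax routing of fixed scores, the top-$k$ mass is non-increasing in the temperature $\lambda$. A bisection can therefore find the largest $\lambda$ that still satisfies the condition. This echoes the idea that a smaller $\lambda$ moves routing toward top-$k$ selection. The definition, function names, and search procedure are all illustrative assumptions.

```python
import math

def softmax_routing(scores: list[float], lam: float) -> list[float]:
    top = max(scores)
    e = [math.exp((s - top) / lam) for s in scores]
    total = sum(e)
    return [v / total for v in e]

def top_k_mass(pi: list[float], k: int) -> float:
    return sum(sorted(pi, reverse=True)[:k])

def is_eps_soft_top_k(pi: list[float], k: int, eps: float) -> bool:
    """Hypothetical check: the k largest weights hold at least 1 - eps of the mass."""
    return top_k_mass(pi, k) >= 1.0 - eps

def max_temperature(scores, k, eps, lo=1e-4, hi=1e4, iters=100):
    """Bisection for the largest lam whose softmax routing is eps-soft top-k.

    Relies on top-k mass being non-increasing in lam for softmax routing.
    Returns None if even lam = lo fails (e.g. tied scores at the top-k boundary).
    """
    if not is_eps_soft_top_k(softmax_routing(scores, lo), k, eps):
        return None
    if is_eps_soft_top_k(softmax_routing(scores, hi), k, eps):
        return hi
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect on a log scale
        if is_eps_soft_top_k(softmax_routing(scores, mid), k, eps):
            lo = mid
        else:
            hi = mid
    return lo

scores = [2.0, 1.0, 0.5, -1.0]  # hypothetical router scores
lam_star = max_temperature(scores, k=2, eps=0.05)
print(lam_star, top_k_mass(softmax_routing(scores, lam_star), 2))
```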