Self-Adaptive Graph Mixture of Models

Mohit Meena, Yash Punjabi, Abhishek A, Vishal Sharma, Mahesh Chandran

TL;DR

SAGMM addresses the challenge of choosing the best GNN for a given graph. It learns to automatically select and combine experts from a diverse pool through a topology-aware gating mechanism. It introduces TAAG (topology-aware attention gating), which uses both local multi-hop features and global structural encodings to route each node to its most relevant experts, and it applies adaptive pruning to keep computation in check. The framework also supports a variant with pre-trained experts (SAGMM-PE) that improves data efficiency and reuses existing models. In addition, it provides a theoretical bound characterizing the trade-off between expert count and efficiency. Empirically, SAGMM delivers consistent gains across node, graph, and link tasks on large, diverse benchmarks while keeping inference time and memory usage competitive. This work offers a practical, scalable path to robust graph learning by harnessing architectural diversity and topology-aware routing.
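
The routing inputs described above (local multi-hop features plus a global structural encoding) can be illustrated with a small NumPy sketch. It assumes GCN-style symmetric normalization for the multi-hop features and Laplacian eigenvectors as the global encoding. Both are common choices but not necessarily the ones SAGMM uses, and all function names here are hypothetical.

```python
import numpy as np

def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """GCN-style normalization D^{-1/2} (A + I) D^{-1/2} (an assumed choice)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # self-loops keep degrees >= 1
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def multi_hop_features(x: np.ndarray, adj: np.ndarray, hops: int) -> np.ndarray:
    """Concatenate [X, A_n X, A_n^2 X, ..., A_n^hops X] column-wise."""
    a_norm = normalized_adjacency(adj)
    feats, cur = [x], x
    for _ in range(hops):
        cur = a_norm @ cur  # propagate features one more hop
        feats.append(cur)
    return np.concatenate(feats, axis=1)

def laplacian_pe(adj: np.ndarray, dim: int) -> np.ndarray:
    """Hypothetical global encoding: the `dim` lowest non-trivial eigenvectors
    of the normalized Laplacian I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    lap = np.eye(adj.shape[0]) - adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    return eigvecs[:, 1:dim + 1]      # drop the eigenvalue-0 vector (connected graph)

# Toy 4-node path graph 0-1-2-3 with 3 random features per node.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
gate_input = np.concatenate([multi_hop_features(x, adj, hops=2),
                             laplacian_pe(adj, dim=2)], axis=1)
print(gate_input.shape)  # (4, 11): 3 * (2 + 1) hop features + 2 PE columns
```

Laplacian eigenvectors are defined only up to sign, so models that consume them often randomize the signs during training.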

Abstract

Graph Neural Networks (GNNs) have emerged as powerful tools for learning over graph-structured data, yet recent studies have shown that their performance gains are beginning to plateau. In many cases, well-established models such as GCN and GAT, when appropriately tuned, can match or even exceed the performance of more complex, state-of-the-art architectures. This trend highlights a key limitation in the current landscape: the difficulty of selecting the most suitable model for a given graph task or dataset. To address this, we propose Self-Adaptive Graph Mixture of Models (SAGMM), a modular and practical framework that learns to automatically select and combine the most appropriate GNN models from a diverse pool of architectures. Unlike prior mixture-of-experts approaches that rely on variations of a single base model, SAGMM leverages architectural diversity and a topology-aware attention gating mechanism to adaptively assign experts to each node based on the structure of the input graph. To improve efficiency, SAGMM includes a pruning mechanism that reduces the number of active experts during training and inference without compromising performance. We also explore a training-efficient variant in which expert models are pretrained and frozen, and only the gating and task-specific layers are trained. We evaluate SAGMM on 16 benchmark datasets covering node classification, graph classification, regression, and link prediction tasks, and demonstrate that it consistently outperforms or matches leading GNN baselines and prior mixture-based methods, offering a robust and adaptive solution for real-world graph learning.
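
Selecting and combining experts per node can be illustrated with standard sparse top-k mixture-of-experts routing. This is a generic sketch, not SAGMM's gating network. It uses one fixed k for every node, whereas Figure 3(a) indicates that SAGMM activates a node-dependent number of experts k_u. The random arrays stand in for the outputs of heterogeneous GNN experts such as GCN and GAT.

```python
import numpy as np

def top_k_gate(logits: np.ndarray, k: int) -> np.ndarray:
    """Per node (row), keep the k largest expert scores and softmax over them.

    logits: (num_nodes, num_experts). Returns weights of the same shape whose
    rows sum to 1, with exactly zero weight on the dropped experts.
    """
    top_idx = np.argsort(-logits, axis=1)[:, :k]
    keep = np.zeros(logits.shape, dtype=bool)
    np.put_along_axis(keep, top_idx, True, axis=1)
    masked = np.where(keep, logits, -np.inf)
    masked = masked - masked.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(masked)                                  # exp(-inf) -> 0
    return w / w.sum(axis=1, keepdims=True)

def mix_experts(weights: np.ndarray, expert_outputs: np.ndarray) -> np.ndarray:
    """weights: (n, E); expert_outputs: (E, n, d) -> per-node mixture (n, d)."""
    return np.einsum("ne,end->nd", weights, expert_outputs)

# Two nodes, three experts; in practice the logits come from a gating network.
logits = np.array([[2.0, 1.0, 0.5],
                   [0.1, 0.3, 3.0]])
weights = top_k_gate(logits, k=2)
print(weights.round(3))
# node 0 keeps experts 0 and 1; node 1 keeps experts 2 and 1

rng = np.random.default_rng(0)
expert_outputs = rng.normal(size=(3, 2, 4))  # stand-ins for three GNN experts
mixed = mix_experts(weights, expert_outputs)
```

Choosing k below the number of experts makes the routing sparse. Figure 2(a) reports how GMoE-GCN's performance varies with a fixed Top-k across datasets, which is the kind of sensitivity that a node-adaptive k_u is meant to avoid.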

Paper Structure

This paper contains 34 sections, 2 theorems, 30 equations, 5 figures, 10 tables, and 1 algorithm.

Key Result

Theorem 1

Under the design conditions above, the following inequality holds: where $U = -\log(\epsilon_0)$ and $f(k_u, \epsilon_0) = \log(2k_u^2) - \log(2k_u(1 + \epsilon_0) - 1).$
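
The two quantities defined in Theorem 1 are easy to evaluate numerically. The sketch below assumes natural logarithms, since the base is not given, and it computes only U and f(k_u, ε0), not the inequality itself. The function name is hypothetical.

```python
import math

def theorem1_terms(k_u: int, eps0: float) -> tuple[float, float]:
    """Return (U, f(k_u, eps0)) from Theorem 1, assuming natural logarithms.

    U = -log(eps0)
    f(k_u, eps0) = log(2 k_u^2) - log(2 k_u (1 + eps0) - 1)
    """
    if k_u < 1:
        raise ValueError("k_u (experts activated for node u) must be >= 1")
    if eps0 <= 0:
        raise ValueError("eps0 must be positive for log(eps0) to be defined")
    u = -math.log(eps0)
    f = math.log(2 * k_u ** 2) - math.log(2 * k_u * (1 + eps0) - 1)
    return u, f

# eps0 = 0.001 matches the setting used for Figure 3(b).
for k in range(1, 6):
    u, f = theorem1_terms(k, 0.001)
    print(f"k_u={k}  U={u:.4f}  f={f:.4f}")
```

Dividing both arguments by 2k_u gives the equivalent form f(k_u, ε0) = log(k_u / (1 + ε0 − 1/(2k_u))). This form shows that f increases with k_u for k_u ≥ 1 and grows roughly like log k_u.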

Figures (5)

  • Figure 1: Overview of the SAGMM framework. The key components include the gating network, a pool of experts, and adaptive expert pruning. $W_i$ collectively denotes $W_{Q_{i}}$, $W_{K_{i}}$, and $W_{V_{i}}$ for expert $i$.
  • Figure 2: (a) Performance variation across different Top-$k$ values in GMoE-GCN for various datasets. (b) Distribution of expert activation counts produced by SAGMM on the ogbn-proteins dataset, which contains 132,534 nodes.
  • Figure 3: (a) Average number of experts activated per node $u$ ($k_u$) and number of experts remaining after pruning ($N$). (b) Visualization of $\Pr\left\{ \mathcal{L}\left(\frac{\eta(\mathcal{Y}_u)}{k_u}\right) \leq a \right\}$ (Theorem 1) for $\epsilon_0 = 0.001$ and $a = 0.3$.
  • Figure 4: Parameter reduction with expert pruning on the ogbn-arxiv dataset. M denotes millions.
  • Figure 5: SAGMM-PE architecture illustrating the modular design with pretrained GNN experts, a graph-aware gating mechanism, and a downstream task-specific head.
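
Figures 2(b), 3(a), and 4 concern how often each expert is activated, how many experts N remain after pruning, and the resulting parameter savings. SAGMM's exact pruning criterion is not given in this summary, so the sketch below substitutes a simple, hypothetical rule. It drops experts that fewer than a fraction `min_fraction` of nodes route to, then renormalizes each node's gate weights over the surviving experts. The per-expert parameter counts are made-up illustration values.

```python
import numpy as np

def expert_activation_counts(gate_weights: np.ndarray) -> np.ndarray:
    """Number of nodes routing to each expert (nonzero gate weight)."""
    return (gate_weights > 0).sum(axis=0)

def prune_experts(gate_weights: np.ndarray, min_fraction: float) -> np.ndarray:
    """Hypothetical rule: keep experts activated by >= min_fraction of nodes."""
    counts = expert_activation_counts(gate_weights)
    return np.flatnonzero(counts >= min_fraction * gate_weights.shape[0])

def renormalize(gate_weights: np.ndarray, kept: np.ndarray) -> np.ndarray:
    """Re-spread each node's weight over the kept experts so rows sum to 1."""
    w = gate_weights[:, kept]
    s = w.sum(axis=1, keepdims=True)
    # Nodes whose only experts were pruned fall back to a uniform mix (assumption).
    uniform = np.full_like(w, 1.0 / len(kept))
    return np.where(s > 0, w / np.where(s > 0, s, 1.0), uniform)

def parameter_count(params_per_expert: np.ndarray, kept: np.ndarray) -> tuple[int, int]:
    """Total parameters before pruning and parameters that remain after it."""
    return int(params_per_expert.sum()), int(params_per_expert[kept].sum())

# Toy gate weights for 4 nodes and 3 experts (rows already sum to 1).
gate = np.array([[0.7, 0.3, 0.0],
                 [0.6, 0.4, 0.0],
                 [0.0, 0.5, 0.5],
                 [0.9, 0.1, 0.0]])
kept = prune_experts(gate, min_fraction=0.5)  # counts [3, 4, 1] -> experts 0, 1
new_gate = renormalize(gate, kept)            # node 2 moves all weight to expert 1
total, remaining = parameter_count(np.array([120_000, 250_000, 80_000]), kept)
print(kept, total, remaining)
```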

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • Proof