Geometric Mixture-of-Experts with Curvature-Guided Adaptive Routing for Graph Representation Learning

Haifang Cao, Yu Wang, Timing Li, Xinjie Yao, Pengfei Zhu

Abstract

Graph-structured data typically exhibits complex topological heterogeneity, making it difficult to model accurately within a single Riemannian manifold. While emerging mixed-curvature methods attempt to capture such diversity, they often rely on implicit, task-driven routing that lacks fundamental geometric grounding. To address this challenge, we propose a Geometric Mixture-of-Experts framework (GeoMoE) that adaptively fuses node representations across diverse Riemannian spaces to better accommodate multi-scale topological structures. At its core, GeoMoE leverages Ollivier-Ricci Curvature (ORC) as an intrinsic geometric prior to orchestrate the collaboration of specialized experts. Specifically, we design a graph-aware gating network that assigns node-specific fusion weights, regularized by a curvature-guided alignment loss to ensure interpretable and geometry-consistent routing. Additionally, we introduce a curvature-aware contrastive objective that promotes geometric discriminability by constructing positive and negative pairs according to curvature consistency. Extensive experiments on six benchmark datasets demonstrate that GeoMoE outperforms state-of-the-art baselines across diverse graph types.

Paper Structure (34 sections, 3 theorems, 69 equations, 5 figures, 3 tables, 1 algorithm)

Key Result

Theorem 4.1

Let $\ell^m(v_i) = \ell(\mathbf{h}^m(v_i), y(v_i))$ be the loss of expert $m \in \{E, H, S\}$ for node $v_i$, and let $\mathbf{h}^{\text{fused}}(v_i)$ be the fused representation, defined in the fused-embedding equation, with gating weights $\boldsymbol{\omega}_i$. Assume the loss function $\ell(\cdot, y)$ is [...]. Let $e^m$ be the corresponding one-hot vector for expert $m$. If the gating error satisfies ...
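
The quantities named in Theorem 4.1 (per-expert embeddings $\mathbf{h}^m(v_i)$, gating weights $\boldsymbol{\omega}_i$, and the one-hot vector $e^m$) can be illustrated with a minimal NumPy sketch. It rests on several assumptions that the text above does not confirm: the three expert outputs are taken to live in a common Euclidean space already, the fused representation is taken to be a softmax-weighted sum, and the gating error is measured as an L2 distance to a one-hot vector. The function names `fuse` and `gating_error` are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fuse(expert_embeddings, gate_logits):
    """Weighted sum of expert embeddings (assumed form of the fusion).

    expert_embeddings: array of shape (M, N, D), one (N, D) block per expert.
    gate_logits:       array of shape (N, M), unnormalized gate scores per node.
    Returns the fused embeddings (N, D) and the gating weights (N, M).
    """
    weights = softmax(gate_logits, axis=1)          # each row sums to 1
    fused = np.einsum("nm,mnd->nd", weights, expert_embeddings)
    return fused, weights

def gating_error(weights, expert_index):
    # L2 distance between each node's weights and the one-hot vector e^m
    # of the chosen expert (the norm is an assumption, not from the paper).
    one_hot = np.eye(weights.shape[1])[expert_index]
    return np.linalg.norm(weights - one_hot, axis=1)

# Toy example: M = 3 experts (E, H, S), N = 4 nodes, D = 8 dimensions.
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4, 8))
logits = rng.normal(size=(4, 3))
fused, w = fuse(h, logits)
err = gating_error(w, w.argmax(axis=1))
```

When a node's weights approach a one-hot vector, the fused embedding approaches the output of that single expert and the gating error approaches zero, which is the regime the theorem's condition on the gating error appears to describe.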

Figures (5)

  • Figure 1: (Left) Topological visualization of real-world graphs (IV), combining grid (I), hierarchy (II), and cycle (III) structures. (Right) Performance of single-geometry models vs. our GeoMoE across these structures.
  • Figure 2: An overview of GeoMoE. We first encode the input graph via Euclidean, hyperbolic, and spherical geometric experts, then dynamically fuse their outputs through a curvature-guided gating alignment mechanism. The curvature-aware contrastive learning objective further refines the fused representations to enhance geometric discriminability.
  • Figure 3: Consistency between gating weights and ORC on Photo.
  • Figure 4: Sensitivity to curvature threshold $\theta$.
  • Figure 5: Impact of negative sample count $K$.

Theorems & Definitions (13)

  • Definition 3.1: Edge-level Ollivier-Ricci Curvature
  • Definition 3.2: Node-level Ollivier-Ricci Curvature
  • Theorem 4.1: Synergy Bound of Geometric Mixture-of-Experts
  • proof
  • Theorem 4.2: Monotonic Consistency Between Gating Weights and ORC
  • proof
  • Theorem 4.3: Geometry-Consistent Mutual Information Maximization
  • proof
  • Definition 1.1: First-Order Wasserstein Distance
  • Definition 1.2: Local Probability Measure
  • ...and 3 more
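
For orientation, the standard textbook forms of the objects named in this list are given here. The paper's own Definitions 3.1, 3.2, 1.1, and 1.2 are not reproduced on this page and may differ in detail. In particular, the laziness parameter $\alpha$ and the averaging used for the node-level curvature are common conventions assumed here, not taken from the source.

```latex
% First-order Wasserstein distance between measures \mu and \nu on the node set,
% where \Pi(\mu, \nu) is the set of couplings and d is the shortest-path distance.
W_1(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \sum_{x, y} d(x, y)\, \pi(x, y)

% Local probability measure at node u (assumed lazy random-walk form, \alpha \in [0, 1)).
m_u(x) =
\begin{cases}
  \alpha & \text{if } x = u, \\
  (1 - \alpha) / \deg(u) & \text{if } x \in \mathcal{N}(u), \\
  0 & \text{otherwise.}
\end{cases}

% Edge-level Ollivier-Ricci curvature.
\kappa(u, v) = 1 - \frac{W_1(m_u, m_v)}{d(u, v)}

% Node-level curvature (assumed: mean over incident edges).
\kappa(u) = \frac{1}{\deg(u)} \sum_{v \in \mathcal{N}(u)} \kappa(u, v)
```

Under these conventions, negative curvature tends to indicate tree-like neighborhoods, values near zero grid-like ones, and positive curvature densely connected ones, which matches the hierarchy, grid, and cycle structures contrasted in Figure 1.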