One Router to Route Them All: Homogeneous Expert Routing for Heterogeneous Graph Transformers

Georgiy Shakirov, Albert Arakelov

TL;DR

This work addresses a limitation of type-dependent parameterization in heterogeneous graph transformers by introducing Homogeneous Expert Routing (HER), a shared Mixture-of-Experts (MoE) layer that stochastically masks type embeddings to regularize routing. Unlike type-separated MoEs, HER enables cross-type semantic transfer by learning functional roles that transcend node types, while still keeping type information as a soft cue during routing. Empirical results on IMDB, ACM, and DBLP show consistent link-prediction gains and reveal semantic specialization of experts that aligns with external labels such as movie genres. The approach offers a design principle for heterogeneous graph learning: combine shared parameterization with regularized type awareness to obtain more generalizable and interpretable representations.

Abstract

A common practice in heterogeneous graph neural networks (HGNNs) is to condition parameters on node/edge types, assuming types reflect semantic roles. However, this can cause overreliance on surface-level labels and impede cross-type knowledge transfer. We explore integrating Mixture-of-Experts (MoE) into HGNNs, a direction underexplored despite MoE's success in homogeneous settings. Crucially, we question the need for type-specific experts. We propose Homogeneous Expert Routing (HER), an MoE layer for Heterogeneous Graph Transformers (HGT) that stochastically masks type embeddings during routing to encourage type-agnostic specialization. Evaluated on IMDB, ACM, and DBLP for link prediction, HER consistently outperforms standard HGT and a type-separated MoE baseline. Analysis on IMDB shows HER experts specialize by semantic patterns (e.g., movie genres) rather than node types, confirming routing is driven by latent semantics. Our work demonstrates that regularizing type dependence in expert routing yields more generalizable, efficient, and interpretable representations, which we put forward as a new design principle for heterogeneous graph learning.
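
The routing idea in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the fused router input is taken to be a concatenation of node features and the type embedding, masking zeroes the whole type embedding with probability `p_mask`, and the router uses top-k selection with a softmax over the selected logits. The names `her_route`, `router_w`, and `make_expert` are hypothetical.

```python
import math
import random


def her_route(x, type_emb, router_w, experts, p_mask=0.3, top_k=2,
              training=True, rng=random):
    """Route one node vector through a single shared expert pool.

    x        : node features (list of floats)
    type_emb : embedding of the node's type (list of floats)
    router_w : one weight row per expert, over concat(x, type_emb)
    experts  : list of callables, each mapping a vector to a vector
    """
    # Assumption: masking zeroes the entire type embedding with
    # probability p_mask, and only while training.
    if training and rng.random() < p_mask:
        type_part = [0.0] * len(type_emb)
    else:
        type_part = list(type_emb)

    # Assumption: the fused router input is a plain concatenation.
    fused = list(x) + type_part

    # One router shared by all node types scores every expert.
    logits = [sum(w * f for w, f in zip(row, fused)) for row in router_w]

    # Keep the top_k experts and normalize their logits with a softmax.
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:top_k]
    peak = max(logits[e] for e in top)
    exps = {e: math.exp(logits[e] - peak) for e in top}
    total = sum(exps.values())
    weights = {e: v / total for e, v in exps.items()}

    # Sparse weighted sum of the selected experts' outputs.
    out = [0.0] * len(x)
    for e, w in weights.items():
        y = experts[e](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, weights


def make_expert(scale):
    """Toy expert: scales its input (a stand-in for a learned network)."""
    return lambda v: [scale * vi for vi in v]


random.seed(0)
dim, type_dim, n_experts = 4, 2, 4
experts = [make_expert(s) for s in (0.5, 1.0, 1.5, 2.0)]
router_w = [[random.gauss(0, 1) for _ in range(dim + type_dim)]
            for _ in range(n_experts)]
x = [0.1, -0.2, 0.3, 0.4]
movie_emb, actor_emb = [1.0, 0.0], [0.0, 1.0]

out, weights = her_route(x, movie_emb, router_w, experts)
```

In this sketch, whenever the mask fires the router sees only the node features, so two nodes of different types with the same features are sent to the same experts. That is the mechanism by which the type embedding acts as a soft cue rather than a hard partition of the expert pool.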

Paper Structure

This paper contains 25 sections, 22 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Mixture-of-Experts in Deep Learning. The router receives a fused representation $\tilde{\mathbf{x}}_v$, where the type embedding component is stochastically masked during training. A shared set of experts processes inputs from all node types, and the final output is a sparse weighted sum. This shared expert pool enables cross-type semantic transfer: semantically similar entities, regardless of type, can be routed to the same experts.
  • Figure 2: Expert activation heatmap for SharedMoE ($p_{\text{mask}} = 0.3$) on IMDB. Rows correspond to node samples (grouped by type: actor, director, movie); columns represent expert indices (0–15). Color intensity reflects the routing weight assigned to each expert. Experts 9–10 and 12–15 consistently dominate across all node types, indicating that routing is driven by shared semantic cues rather than type identity.
  • Figure 3: Expert activation heatmap for SeparatedMoE on IMDB. Each node type uses its own disjoint expert set. Activations are diffuse and uniformly distributed within each type, with no dominant expert emerging—suggesting fragmented specialization and lack of cross-type alignment.
  • Figure 4: t-SNE visualization of node embeddings on IMDB, colored by top-1 expert and shaped by node type. Clusters correspond to expert specializations: e.g., Expert 9 (blue) is enriched with Comedy films, Expert 12 (purple) with Drama. The co-location of movies, actors, and directors within the same expert cluster confirms that routing is driven by semantics, not type.

Theorems & Definitions (3)

  • Definition 1: SeparatedMoE
  • Definition 2: SharedMoE
  • Definition 3: Homogeneous Expert Routing