One Router to Route Them All: Homogeneous Expert Routing for Heterogeneous Graph Transformers
Georgiy Shakirov, Albert Arakelov
TL;DR
This work addresses a limitation of type-dependent parameterization in heterogeneous graph transformers by introducing Homogeneous Expert Routing (HER), a shared Mixture-of-Experts (MoE) layer that stochastically masks type embeddings to regularize routing. Unlike type-separated MoEs, HER enables cross-type semantic transfer by learning functional roles that transcend node types, while retaining type information as a soft cue during routing. Empirical results on IMDB, ACM, and DBLP show consistent link-prediction gains and reveal semantic specialization of experts that aligns with external labels such as movie genres. The approach offers a design principle for heterogeneous graph learning: combine shared parameterization with regularized type awareness to obtain more generalizable and interpretable representations.
Abstract
A common practice in heterogeneous graph neural networks (HGNNs) is to condition parameters on node and edge types, on the assumption that types reflect semantic roles. However, this can cause overreliance on surface-level labels and impede cross-type knowledge transfer. We explore integrating Mixture-of-Experts (MoE) into HGNNs, a direction that remains underexplored despite MoE's success in homogeneous settings. Crucially, we question the need for type-specific experts. We propose Homogeneous Expert Routing (HER), an MoE layer for Heterogeneous Graph Transformers (HGT) that stochastically masks type embeddings during routing to encourage type-agnostic specialization. Evaluated on IMDB, ACM, and DBLP for link prediction, HER consistently outperforms standard HGT and a type-separated MoE baseline. Analysis on IMDB shows that HER experts specialize by semantic patterns (e.g., movie genres) rather than by node types, confirming that routing is driven by latent semantics. Our work demonstrates that regularizing type dependence in expert routing yields more generalizable, efficient, and interpretable representations, which suggests a new design principle for heterogeneous graph learning.
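
To make the routing idea concrete, the following is a minimal sketch of one shared router with stochastic type-embedding masking. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the type embedding enters the router, so this sketch assumes it is added to the node feature, that masking is an independent Bernoulli draw per node, and that gating is a top-k softmax. The names `route`, `mask_prob`, `top_k`, and `router_weights` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(node_feat, type_emb, router_weights, mask_prob, top_k, rng, training=True):
    """Return [(expert_index, gate_weight), ...] for a single node.

    Assumptions (not taken from the paper): the type embedding is added to
    the node feature, masking is a per-node Bernoulli draw applied only in
    training, and the router is a single linear layer with top-k gating.
    """
    # Stochastically drop the type embedding so the router cannot rely on it.
    keep_type = not (training and rng.random() < mask_prob)
    router_in = [h + (t if keep_type else 0.0) for h, t in zip(node_feat, type_emb)]

    # One shared router scores every expert, regardless of node type.
    logits = [sum(w * x for w, x in zip(row, router_in)) for row in router_weights]
    probs = softmax(logits)

    # Keep the top-k experts and renormalize their gate weights to sum to 1.
    top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:top_k]
    norm = sum(probs[e] for e in top)
    return [(e, probs[e] / norm) for e in top]


# Usage: route one hypothetical "movie" node through three shared experts.
rng = random.Random(0)
dim, num_experts = 4, 3
router_weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
movie_feat = [0.2, -0.1, 0.5, 0.3]
movie_type_emb = [1.0, 0.0, 0.0, 0.0]
print(route(movie_feat, movie_type_emb, router_weights, mask_prob=0.5, top_k=2, rng=rng))
```

In this sketch, a type-separated MoE baseline would correspond to keeping a separate `router_weights` matrix (and expert set) per node type. Here every node type shares the same router and experts, and the type embedding acts only as a soft cue that is sometimes withheld during training.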
