One Router to Route Them All: Homogeneous Expert Routing for Heterogeneous Graph Transformers

Georgiy Shakirov; Albert Arakelov

TL;DR

This work tackles the limitation of type-dependent parameterization in heterogeneous graph transformers by introducing Homogeneous Expert Routing (HER), a shared Mixture-of-Experts (MoE) layer that stochastically masks type embeddings to regularize routing. Unlike type-separated MoEs, HER enables cross-type semantic transfer by learning functional roles that transcend node types, while retaining type information as a soft cue during routing. Empirical results on IMDB, ACM, and DBLP show consistent link-prediction gains and reveal semantic specialization of experts that aligns with external labels such as movie genres. The approach offers a design principle for heterogeneous graph learning: combine shared parameterization with regularized type awareness to yield more generalizable and interpretable representations.

Abstract

A common practice in heterogeneous graph neural networks (HGNNs) is to condition parameters on node and edge types, on the assumption that types reflect semantic roles. However, this can cause overreliance on surface-level labels and impede cross-type knowledge transfer. We explore integrating Mixture-of-Experts (MoE) into HGNNs, a direction that remains underexplored despite MoE's success in homogeneous settings. Crucially, we question the need for type-specific experts. We propose Homogeneous Expert Routing (HER), an MoE layer for Heterogeneous Graph Transformers (HGT) that stochastically masks type embeddings during routing to encourage type-agnostic specialization. Evaluated on IMDB, ACM, and DBLP for link prediction, HER consistently outperforms standard HGT and a type-separated MoE baseline. Analysis on IMDB shows that HER experts specialize by semantic patterns (e.g., movie genres) rather than by node types, confirming that routing is driven by latent semantics. Our work demonstrates that regularizing type dependence in expert routing yields more generalizable, efficient, and interpretable representations, a new design principle for heterogeneous graph learning.
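
The routing idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract states only that a shared MoE layer stochastically masks type embeddings during routing. Everything else here is an assumption made for illustration, including the function name `her_layer`, the parameter dictionary layout, per-node zero-masking with rate `mask_prob`, concatenating the type embedding to the router input, top-k softmax gating, and linear experts.

```python
import numpy as np

def her_layer(h, type_ids, params, mask_prob=0.5, top_k=2, training=True, rng=None):
    """Hypothetical shared-expert MoE layer with stochastically masked type cues.

    h:        (N, d) node features
    type_ids: (N,) integer node-type index per node
    params:   dict with "type_emb" (T, d_t), "w_router" (d + d_t, E),
              and "w_experts" (E, d, d_out); all names are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = h.shape[0]

    # Look up each node's type embedding; during training, zero it out
    # for a random subset of nodes so the router cannot rely on type alone.
    type_emb = params["type_emb"][type_ids]              # (N, d_t)
    if training:
        keep = rng.random((n, 1)) >= mask_prob           # per-node keep mask
        type_emb = type_emb * keep

    # A single router shared by all node types scores every expert.
    router_in = np.concatenate([h, type_emb], axis=1)    # (N, d + d_t)
    logits = router_in @ params["w_router"]              # (N, E)

    # Keep the top-k experts per node and normalise their scores (softmax).
    top_idx = np.argsort(-logits, axis=1)[:, :top_k]     # (N, k)
    top_logits = logits[np.arange(n)[:, None], top_idx]  # (N, k)
    top_logits = top_logits - top_logits.max(axis=1, keepdims=True)
    gates = np.exp(top_logits)
    gates = gates / gates.sum(axis=1, keepdims=True)

    # Experts are shared across types: every expert sees only node features.
    # (Computed densely for clarity; a real layer would dispatch sparsely.)
    expert_out = np.einsum("nd,edo->neo", h, params["w_experts"])  # (N, E, d_out)
    chosen = expert_out[np.arange(n)[:, None], top_idx]            # (N, k, d_out)
    out = (gates[:, :, None] * chosen).sum(axis=1)                 # (N, d_out)
    return out, top_idx, gates
```

In this sketch, setting `mask_prob=1.0` removes type information from routing entirely, while `mask_prob=0.0` lets the router always see the type embedding; intermediate values correspond to the "soft cue" behaviour the TL;DR describes. A type-separated MoE baseline would instead keep a separate router and expert set per node type.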
