Training-Free Cross-Architecture Merging for Graph Neural Networks

Rishabh Bhattacharya, Vikaskumar Kalsariya, Naresh Manwani

TL;DR

This paper introduces H-GRAMA (Heterogeneous Graph Routing and Message Alignment), a training-free framework that lifts merging from parameter space to operator space, and formalizes Universal Message Passing Mixture (UMPM), a shared operator family that expresses heterogeneous GNN layers in a common functional language.

Abstract

Model merging has emerged as a powerful paradigm for combining the capabilities of distinct expert models without the high computational cost of retraining, yet current methods are fundamentally constrained to homogeneous architectures. For GNNs in particular, message passing is topology-dependent and sensitive to misalignment, making direct parameter-space merging unreliable. To bridge this gap, we introduce H-GRAMA (Heterogeneous Graph Routing and Message Alignment), a training-free framework that lifts merging from parameter space to operator space. We formalize Universal Message Passing Mixture (UMPM), a shared operator family that expresses heterogeneous GNN layers in a common functional language. H-GRAMA enables cross-architecture GNN merging (e.g., GCN to GAT) without retraining; when parent depths are compatible it retains high specialist accuracy in most cases, and it achieves inference speedups of 1.2x to 1.9x over ensembles.
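
To make "a common functional language" concrete, the sketch below writes the linear part of three familiar aggregators as propagation matrices on one graph and mixes them with per-basis gate weights. It is an illustration only: the abstract does not define UMPM's actual basis or gating, so `gcn_operator`, `mean_operator`, `sum_operator`, `mixture_layer`, and the choice of three operators are all assumptions made here, not the authors' formulation.

```python
import numpy as np

def gcn_operator(adj):
    """GCN propagation: D^{-1/2} (A + I) D^{-1/2}, with D the degree of A + I."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def mean_operator(adj):
    """Mean over neighbors: D^{-1} A, leaving all-zero rows for isolated nodes."""
    deg = adj.sum(axis=1, keepdims=True)
    return np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)

def sum_operator(adj, eps=0.0):
    """GIN-style sum aggregation (linear part only): A + (1 + eps) I."""
    return adj + (1.0 + eps) * np.eye(adj.shape[0])

def mixture_layer(x, weight, operators, gates):
    """Hypothetical mixture layer: (sum_b gates[b] * operators[b]) @ x @ weight."""
    mixed = sum(g * p for g, p in zip(gates, operators))
    return mixed @ x @ weight

# Path graph on three nodes; the adjacency must be a float array.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
basis = [gcn_operator(adj), mean_operator(adj), sum_operator(adj)]
```

With a one-hot gate vector the mixture reduces to a single parent's propagation rule, which is the sense in which differently structured layers can share one operator family in this toy setting.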

Paper Structure

This paper contains 57 sections, 7 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: H-GRAMA pipeline. Heterogeneous GNN parents are canonicalized to a universal operator basis (UMPM), aligned via CKA and Procrustes transport, fused through closed-form gate regression with confidence-weighted mixing ($\alpha$), and stabilized via LFNorm moment calibration.
  • Figure 2: Retention analysis of the H-GRAMA merging process: Comparison between min-retention strategies for ensemble and parent models versus the full ratio ensemble performance.
  • Figure 3: Phase-wise ablation on Cora (GCN--GIN, 2--3 depth, 64--128 width). Each bar shows the incremental retention gain from adding one pipeline phase.
  • Figure 4: Visualization of absolute basis gate weights $|g_b^\ell|$ (per layer $\ell$) produced by H-GRAMA when merging heterogeneous parents on CiteSeer, using depth $3$--$3$ in all cases and the following width pairs: (a) GCN--GIN, $128$--$64$; (b) GCN--GraphSAGE, $64$--$64$; (c) GIN--GraphSAGE, $64$--$64$.
  • Figure 5: Loss landscape along the transport-aligned interpolation path on CiteSeer (GCN--GIN, 3--2 depth, 128--64 width). The label-free $\alpha_{\mathrm{auto}}$ (red) closely tracks the true minimum, while $\alpha=0.5$ lies far from the optimal basin.
  • ...and 1 more figures
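
The Figure 1 caption names CKA and Procrustes transport as the alignment step. The sketch below shows the standard textbook versions of both: linear CKA as a similarity score between two activation matrices, and the orthogonal Procrustes solution via SVD. How H-GRAMA actually applies them (for example to parents of different widths such as 64 and 128) is not stated in this section, so the function names and the equal-width assumption here are illustrative only.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between activations x (n, d1) and y (n, d2) on the same n nodes."""
    x = x - x.mean(axis=0, keepdims=True)  # center each feature column
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    return cross / (np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro"))

def procrustes_rotation(source, target):
    """Orthogonal Q minimizing ||source @ Q - target||_F (equal widths assumed)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt
```

Linear CKA is invariant to orthogonal transformations of either input, so two layers that differ only by a rotation of their feature axes score close to 1, and `procrustes_rotation` recovers that rotation when `source` has full column rank.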