Training-Free Cross-Architecture Merging for Graph Neural Networks

Rishabh Bhattacharya; Vikaskumar Kalsariya; Naresh Manwani

TL;DR

The paper introduces H-GRAMA (Heterogeneous Graph Routing and Message Alignment), a training-free framework that lifts merging from parameter space to operator space, and formalizes the Universal Message Passing Mixture (UMPM), a shared operator family that expresses heterogeneous GNN layers in a common functional language.

Abstract

Model merging has emerged as a powerful paradigm for combining the capabilities of distinct expert models without the high computational cost of retraining, yet current methods are fundamentally constrained to homogeneous architectures. For GNNs in particular, message passing is topology-dependent and sensitive to misalignment, making direct parameter-space merging unreliable. To bridge this gap, we introduce H-GRAMA (Heterogeneous Graph Routing and Message Alignment), a training-free framework that lifts merging from parameter space to operator space. We formalize the Universal Message Passing Mixture (UMPM), a shared operator family that expresses heterogeneous GNN layers in a common functional language. H-GRAMA enables cross-architecture GNN merging (e.g., GCN to GAT) without retraining, retaining high specialist accuracy in most cases under compatible depth settings and achieving inference speedups of 1.2x to 1.9x over ensembles.

Paper Structure (57 sections, 7 equations, 6 figures, 3 tables, 1 algorithm)

Introduction
Our approach: merging in operator space.
Contributions
Related Work
Training-free model merging and alignment-aware methods.
Merging under task and architectural heterogeneity.
Graph-centric model reuse and GNN merging.
Methodology
Universal Message Passing Mixture (UMPM)
Basis operators.
UMPM layer specification.
Layer alignment via representation similarity
Monotone alignment.
Coordinate transport via orthogonal Procrustes alignment
Procrustes alignment.
...and 42 more sections

Figures (6)

Figure 1: H-GRAMA pipeline. Heterogeneous GNN parents are canonicalized to a universal operator basis (UMPM), aligned via CKA and Procrustes transport, fused through closed-form gate regression with confidence-weighted mixing ($\alpha$), and stabilized via LFNorm moment calibration.
Figure 2: Retention analysis of the H-GRAMA merging process: Comparison between min-retention strategies for ensemble and parent models versus the full ratio ensemble performance.
Figure 3: Phase-wise ablation on Cora (GCN--GIN, 2--3 depth, 64--128 width). Each bar shows the incremental retention gain from adding one pipeline phase.
Figure 4: Visualization of absolute basis gate weights $|g_b^\ell|$ (per layer $\ell$) produced by H-GRAMA when merging heterogeneous parents on CiteSeer, using depth $3$--$3$ in all cases and the following width pairs: (a) GCN--GIN, $128$--$64$; (b) GCN--GraphSAGE, $64$--$64$; (c) GIN--GraphSAGE, $64$--$64$.
Figure 5: Loss landscape along the transport-aligned interpolation path on CiteSeer (GCN--GIN, 3--2 depth, 128--64 width). The label-free $\alpha_{\mathrm{auto}}$ (red) closely tracks the true minimum, while $\alpha=0.5$ lies far from the optimal basin.
...and 1 more figure
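
The Figure 1 caption names two standard alignment tools, CKA and Procrustes transport. As a rough illustration of what those two steps compute, the sketch below implements linear CKA and the orthogonal Procrustes solution with NumPy. It is not the paper's implementation: the function names, the synthetic activations `H_a` and `H_b`, and the assumption that both parents share the same hidden width are all hypothetical choices made for this example, and how H-GRAMA handles parents of different widths is not shown here.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two activation matrices with one row per node."""
    # Center each feature column so the score ignores constant offsets.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)

def procrustes_rotation(X, Y):
    """Orthogonal R minimizing ||X R - Y||_F (assumes X and Y share a width)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Synthetic stand-ins for hidden activations of two parent layers (hypothetical data):
# H_b is H_a expressed in a rotated coordinate system.
rng = np.random.default_rng(0)
H_a = rng.normal(size=(200, 16))
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
H_b = H_a @ Q

cka_score = linear_cka(H_a, H_b)          # similarity used to decide which layers match
R = procrustes_rotation(H_a, H_b)         # transport from parent A's coordinates to parent B's
residual = np.linalg.norm(H_a @ R - H_b)  # how well the rotation explains the difference
```

In this toy setting the two layers differ only by a rotation, so linear CKA reports them as equivalent and the Procrustes step recovers the rotation. With real parents the similarity score would be lower and the residual non-zero, which is presumably where the later fusion and calibration phases come in.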
