Fusion of Graph Neural Networks via Optimal Transport
Weronika Ormaniec, Michael Vollenweider, Elisa Hoskovec
TL;DR
This work studies training-free fusion of graph neural networks: the layer-wise weights of different models are aligned with Optimal Transport (OT) and then combined. The authors evaluate three transport-cost schemes, Euclidean Feature Distance (EFD), Quadratic Energy (QE), and Fused Gromov-Wasserstein (FGW), using both EMD and Sinkhorn solvers. OT fusion can beat vanilla averaging but may not reach ensemble performance. Fusing GCNs proves harder than fusing MLPs, and explicitly incorporating graph structure into the fusion cost does not improve results. The paper provides public code, discusses limitations (notably FGW efficiency and dataset scope), and suggests directions for future work: scaling to more datasets and more models, and potential one-shot skill transfer. Overall, the work clarifies the practical constraints of OT-based, training-free fusion in graph domains.
Abstract
In this paper, we explore the idea of combining graph convolutional networks (GCNs) into one model. To that end, we align the weights of different models layer-wise using optimal transport (OT). We present and evaluate three types of transportation costs and show that the studied fusion method consistently outperforms vanilla averaging. Finally, we present results suggesting that model fusion using OT is harder for GCNs than for MLPs and that incorporating the graph structure into the process does not improve the performance of the method.
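To make the idea of layer-wise alignment concrete, the following is a minimal sketch of OT-based fusion for a single layer. It is an illustration under stated assumptions, not the authors' implementation. The function names `sinkhorn_plan` and `fuse_layer` are hypothetical, neurons are taken to be the rows of a weight matrix, and the cost is a squared Euclidean distance between weight vectors rather than any of the three costs studied in the paper. The toy data (a permuted, slightly perturbed copy of one layer) is synthetic.

```python
import numpy as np


def sinkhorn_plan(cost, reg=0.01, n_iters=500):
    """Entropy-regularised OT plan between two uniform neuron distributions."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform mass on model A's neurons
    b = np.full(m, 1.0 / m)  # uniform mass on model B's neurons
    # Rescale the cost to [0, 1] so the exponential does not underflow.
    K = np.exp(-cost / (reg * cost.max()))
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]


def fuse_layer(w_a, w_b, reg=0.01):
    """Align the rows (neurons) of w_b to those of w_a, then average."""
    # Assumed cost: squared Euclidean distance between weight vectors.
    diff = w_a[:, None, :] - w_b[None, :, :]
    cost = (diff ** 2).sum(axis=-1)
    plan = sinkhorn_plan(cost, reg=reg)
    # Each row of the plan sums to about 1/n; rescaling gives a soft permutation.
    w_b_aligned = (plan * w_a.shape[0]) @ w_b
    return 0.5 * (w_a + w_b_aligned)


# Synthetic example: model B is model A with its neurons cyclically shifted.
rng = np.random.default_rng(0)
w_a = rng.normal(size=(6, 4))
perm = np.roll(np.arange(6), 1)
w_b = w_a[perm] + 0.01 * rng.normal(size=(6, 4))

err_vanilla = np.linalg.norm(0.5 * (w_a + w_b) - w_a)   # plain averaging
err_ot = np.linalg.norm(fuse_layer(w_a, w_b) - w_a)     # align, then average
print(err_vanilla, err_ot)
```

In this toy setting, plain averaging mixes unrelated neurons, whereas aligning first lets the average combine matching ones. In a multi-layer network, the alignment found for one layer's outputs would also have to be applied to the input dimension of the next layer, which this sketch leaves out.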
