Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis
Stefan Horoi, Albert Manuel Orozco Camacho, Eugene Belilovsky, Guy Wolf
TL;DR
The paper addresses the challenge of merging fully trained neural networks that originate from different initializations or data orders, where permutation-based alignment often fails due to distributed feature representations. It introduces CCA Merge, a method that uses Canonical Correlation Analysis to align linear combinations of neuronal activations across layers, enabling more accurate and scalable model fusion, including scenarios with many models. Empirical results show that CCA Merge consistently outperforms prior baselines across CIFAR, ImageNet, and disjoint data splits, while maintaining robustness as the number of models increases. The approach reduces the performance gap between merged models and ensembles at a lower computational and storage cost, with practical limitations noted around the need for input activations for alignment and associated compute overhead. Overall, CCA Merge provides a principled, flexible alternative to permutation-based merging, advancing the ability to extract and combine common representations learned by diverse networks.
Abstract
Combining the predictions of multiple trained models through ensembling is generally a good way to improve accuracy by leveraging the different learned features of the models, however it comes with high computational and storage costs. Model fusion, the act of merging multiple models into one by combining their parameters reduces these costs but doesn't work as well in practice. Indeed, neural network loss landscapes are high-dimensional and non-convex and the minima found through learning are typically separated by high loss barriers. Numerous recent works have been focused on finding permutations matching one network features to the features of a second one, lowering the loss barrier on the linear path between them in parameter space. However, permutations are restrictive since they assume a one-to-one mapping between the different models' neurons exists. We propose a new model merging algorithm, CCA Merge, which is based on Canonical Correlation Analysis and aims to maximize the correlations between linear combinations of the model features. We show that our alignment method leads to better performances than past methods when averaging models trained on the same, or differing data splits. We also extend this analysis into the harder setting where more than 2 models are merged, and we find that CCA Merge works significantly better than past methods. Our code is publicly available at https://github.com/shoroi/align-n-merge
