Simultaneous linear connectivity of neural networks modulo permutation
Ekansh Sharma, Devin Kwok, Tom Denton, Daniel M. Roy, David Rolnick, Gintare Karolina Dziugaite
TL;DR
This work studies how permutation symmetries in neural networks contribute to non-convex loss landscapes, and distinguishes three notions of linear mode connectivity modulo permutation: weak, simultaneous weak, and strong linear connectivity (LC). It refines prior results to show that existing evidence mainly supports weak LC, introduces simultaneous weak LC along SGD training trajectories and iterative magnitude pruning (IMP) sequences, and presents initial evidence that strong LC may emerge as networks become very wide. Networks are aligned using both weight matching and activation matching. The work examines how a single permutation can connect multiple related networks, and shows that IMP masks, once permuted, can be transported across independently trained models. The findings suggest that, under certain conditions, the loss landscape may be effectively convex once permutations are accounted for, with practical implications for federated learning and model fusion. They also point to algorithmic limitations that currently restrict the observed strong LC to very wide networks.
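To make "aligning networks by weight matching" and "transporting a permuted mask" concrete, here is a minimal sketch for a one-hidden-layer MLP. It assumes parameters are stored as a tuple (W1, b1, W2) with no output bias, and the helper names permute_hidden, weight_matching and magnitude_mask are hypothetical illustrations, not the authors' code. The matching step reorders B's hidden units to maximize the summed inner product of each unit's incoming weights, bias, and outgoing weights with those of A. Exhaustive search over permutations is used only to keep the sketch dependency-light and is feasible only for tiny widths. Practical implementations instead solve the same problem as a linear assignment. magnitude_mask is one-shot magnitude pruning standing in for a single IMP round. Activation matching, which builds the similarity from hidden activations on data rather than from weights, is not shown.

```python
import itertools
import numpy as np

def permute_hidden(params, perm):
    """Reorder hidden units of (W1, b1, W2); the computed function is unchanged."""
    W1, b1, W2 = params
    return W1[perm], b1[perm], W2[:, perm]

def weight_matching(params_a, params_b):
    """Return perm so that hidden unit perm[i] of B is matched to unit i of A.

    Maximizes sum_i <weights of unit i in A, weights of unit perm[i] in B>,
    counting the incoming weights, bias, and outgoing weights of each unit.
    """
    W1a, b1a, W2a = params_a
    W1b, b1b, W2b = params_b
    sim = W1a @ W1b.T + np.outer(b1a, b1b) + W2a.T @ W2b  # (h, h) similarity
    h = sim.shape[0]
    # Exhaustive search over all h! permutations: only viable for tiny widths.
    # Practical code solves this linear assignment problem (e.g. Hungarian algorithm).
    best = max(itertools.permutations(range(h)),
               key=lambda p: sum(sim[i, p[i]] for i in range(h)))
    return np.array(best)

def magnitude_mask(W, keep_frac):
    """One-shot magnitude pruning: keep the largest keep_frac fraction of |W|."""
    k = max(1, int(round(keep_frac * W.size)))
    threshold = np.sort(np.abs(W), axis=None)[-k]
    return np.abs(W) >= threshold

rng = np.random.default_rng(0)
d, h, o = 4, 6, 3
A = (rng.normal(size=(h, d)), rng.normal(size=h), rng.normal(size=(o, h)))
B = permute_hidden(A, rng.permutation(h))  # same function as A, units shuffled

perm = weight_matching(A, B)
B_aligned = permute_hidden(B, perm)        # B rewritten in A's unit ordering
print(all(np.array_equal(a, b) for a, b in zip(A, B_aligned)))  # True

# Transport a first-layer mask found on B into A's unit ordering.
mask_b = magnitude_mask(B[0], keep_frac=0.5)
mask_in_a = mask_b[perm]
```

Because B here is an exact permuted copy of A, matching recovers A exactly and the transported mask coincides with the mask computed directly on A. For independently trained networks the alignment is only approximate, so its quality has to be checked empirically, for example through the loss barrier along the interpolation path.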
Abstract
Neural networks typically exhibit permutation symmetries, which contribute to the non-convexity of the networks' loss landscapes, since linearly interpolating between two permuted versions of a trained network tends to encounter a high loss barrier. Recent work has argued that permutation symmetries are the only sources of non-convexity, meaning that there are essentially no such barriers between trained networks once they are permuted appropriately. In this work, we refine these arguments into three distinct claims of increasing strength. We show that existing evidence only supports "weak linear connectivity": for each pair of networks belonging to a set of SGD solutions, there exists a permutation that linearly connects them, so a given network may require a different permutation for each of the other networks. In contrast, "strong linear connectivity", the claim that for each network there exists a single permutation that simultaneously connects it with all the other networks, is both intuitively and practically more desirable. This stronger claim would imply that the loss landscape is convex after accounting for permutation, and would enable linear interpolation between three or more independently trained models without increased loss. We also introduce an intermediate claim: for certain sequences of networks, there exists a single permutation that simultaneously aligns matching pairs of networks from these sequences. Specifically, we find that a single permutation aligns sequences of iteratively trained networks as well as sequences of iteratively pruned networks, meaning that two networks exhibit low loss barriers at each step of their optimization and sparsification trajectories, respectively. Finally, we provide the first evidence that strong linear connectivity may be possible under certain conditions, by showing that barriers decrease with increasing network width when interpolating among three networks.
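The loss barrier and the difference between pairwise and three-network interpolation can be illustrated with a small sketch. It assumes one common definition, not necessarily the paper's exact one: the maximum, over mixing weights lambda on a grid over the simplex, of L(sum_i lambda_i theta_i) - sum_i lambda_i L(theta_i). With two networks this is the usual barrier along the line segment; with three it probes simultaneous interpolation. The toy setup is hypothetical: a ReLU MLP whose own outputs serve as targets, so it has zero loss, with helper names forward, mse, permute_hidden, mix and barrier. Permuted copies compute exactly the same function and also have zero loss, yet averaging their weights with the original's leaves the zero-loss set. This is the permutation-induced barrier the abstract describes.

```python
import itertools
import numpy as np

def forward(params, X):
    """One-hidden-layer ReLU MLP mapping inputs X (n, d) to outputs (n, o)."""
    W1, b1, W2 = params
    return np.maximum(X @ W1.T + b1, 0.0) @ W2.T

def mse(params, X, y):
    """Mean squared error of the network's outputs against targets y."""
    return float(np.mean((forward(params, X) - y) ** 2))

def permute_hidden(params, perm):
    """Reorder hidden units; the network computes the same function."""
    W1, b1, W2 = params
    return W1[perm], b1[perm], W2[:, perm]

def mix(params_list, coeffs):
    """Weighted combination sum_i coeffs[i] * theta_i, applied tensor by tensor."""
    return tuple(sum(c * p[k] for c, p in zip(coeffs, params_list))
                 for k in range(len(params_list[0])))

def barrier(params_list, X, y, steps=20):
    """Max over a simplex grid of L(sum_i lam_i theta_i) - sum_i lam_i L(theta_i)."""
    losses = [mse(p, X, y) for p in params_list]
    worst = 0.0
    # Enumerate integer counts (n_1, ..., n_k) summing to `steps`; lam = counts / steps.
    for head in itertools.product(range(steps + 1), repeat=len(params_list) - 1):
        last = steps - sum(head)
        if last < 0:
            continue
        lam = [c / steps for c in head] + [last / steps]
        endpoint_avg = sum(lam_i * loss_i for lam_i, loss_i in zip(lam, losses))
        worst = max(worst, mse(mix(params_list, lam), X, y) - endpoint_avg)
    return worst

rng = np.random.default_rng(0)
d, h, o, n = 4, 8, 2, 64
X = rng.normal(size=(n, d))
A = (rng.normal(size=(h, d)), rng.normal(size=h), rng.normal(size=(o, h)))
y = forward(A, X)                                      # targets: A has zero loss
A_perm = permute_hidden(A, np.roll(np.arange(h), 1))   # non-identity permutations
A_perm2 = permute_hidden(A, np.roll(np.arange(h), 2))

print(np.allclose(forward(A, X), forward(A_perm, X)))  # True: same function
print(barrier([A, A_perm], X, y))            # pairwise barrier along the segment
print(barrier([A, A_perm, A_perm2], X, y))   # barrier over the three-network simplex
```

In terms of the claims above, weak linear connectivity would allow a separate alignment for each pair before the two-network barrier is measured. Strong linear connectivity would require one permutation per network that brings all of them into a common ordering, so that the three-network barrier is also small.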
