Analyzing the Role of Permutation Invariance in Linear Mode Connectivity

Keyao Zhan; Puheng Li; Lei Wu

TL;DR

This work analyzes the role of permutation invariance in linear mode connectivity (LMC) for two-layer ReLU networks under a teacher–student setup. It proves that the LMC barrier after applying the optimal permutation decays as $O(m^{-1/2})$, independent of the input dimension, and shows that the barrier exhibits a peak and a double-descent pattern as the student width $m$ increases, with a minimum barrier near $m=2M$. The study also uncovers a learning-rate-driven sparsity transition in GD/SGD solutions, which improves permutation matching and further reduces the barrier. These findings are supported empirically on synthetic data, MNIST, and deeper architectures. The results illuminate how width, sparsity, and permutation interact to shape loss landscapes, with implications for model merging and ensemble methods.

Abstract

Entezari et al. (2021) observed empirically that, once the permutation invariance of neural networks is accounted for, there is likely no loss barrier along the linear interpolation between two SGD solutions, a phenomenon known as linear mode connectivity (LMC) modulo permutation. This phenomenon has attracted significant attention because of both its theoretical interest and its practical relevance to applications such as model merging. In this paper, we provide a fine-grained analysis of this phenomenon for two-layer ReLU networks under a teacher-student setup. We show that as the student network width $m$ increases, the LMC loss barrier modulo permutation exhibits double descent behavior. In particular, when $m$ is sufficiently large, the barrier decreases to zero at a rate of $O(m^{-1/2})$. Notably, this rate does not suffer from the curse of dimensionality and demonstrates how substantially permutation can reduce the LMC loss barrier. Moreover, we observe a sharp transition in the sparsity of GD/SGD solutions as the learning rate increases, and we investigate how this sparsity preference affects the LMC loss barrier modulo permutation. Experiments on both synthetic and MNIST datasets corroborate our theoretical predictions and reveal a similar trend for more complex network architectures.
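To make the notion of an LMC barrier modulo permutation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a bias-free two-layer ReLU network $f(x)=\sum_j a_j\,\mathrm{relu}(w_j^\top x)$, a mean-squared-error loss, the commonly used barrier definition (the largest gap between the loss on the interpolation path and the interpolated endpoint losses), and a brute-force weight-matching permutation that is only feasible for very small widths. All function names are hypothetical.

```python
import itertools
import numpy as np

def forward(W, a, X):
    """Two-layer ReLU network: f(x) = sum_j a_j * relu(w_j . x)."""
    return np.maximum(X @ W.T, 0.0) @ a

def mse(W, a, X, y):
    """Mean squared error (assumed loss for this sketch)."""
    return float(np.mean((forward(W, a, X) - y) ** 2))

def permute(W, a, perm):
    """Reorder hidden units; the network function is unchanged."""
    perm = list(perm)
    return W[perm], a[perm]

def best_permutation(W1, a1, W2, a2):
    """Brute-force weight matching (assumption): the permutation of
    network 2 that is closest to network 1 in parameter space.
    Enumerates all m! permutations, so use only for tiny m."""
    def dist(perm):
        Wp, ap = permute(W2, a2, perm)
        return np.sum((W1 - Wp) ** 2) + np.sum((a1 - ap) ** 2)
    return min(itertools.permutations(range(W1.shape[0])), key=dist)

def barrier(W1, a1, W2, a2, X, y, num_points=21):
    """Largest gap between the loss on the linear path and the
    linear interpolation of the two endpoint losses."""
    l1, l2 = mse(W1, a1, X, y), mse(W2, a2, X, y)
    gaps = []
    for t in np.linspace(0.0, 1.0, num_points):
        W = (1 - t) * W1 + t * W2
        a = (1 - t) * a1 + t * a2
        gaps.append(mse(W, a, X, y) - ((1 - t) * l1 + t * l2))
    return max(gaps)

# Toy data: labels from a random "teacher" of width 2 (illustrative only).
rng = np.random.default_rng(0)
d, m, n = 3, 4, 64
X = rng.normal(size=(n, d))
y = forward(rng.normal(size=(2, d)), rng.normal(size=2), X)

# Network 2 is a hidden-unit shuffle of network 1: same function,
# different point in parameter space.
W1, a1 = rng.normal(size=(m, d)), rng.normal(size=m)
W2, a2 = permute(W1, a1, [2, 0, 3, 1])

naive = barrier(W1, a1, W2, a2, X, y)
W2p, a2p = permute(W2, a2, best_permutation(W1, a1, W2, a2))
aligned = barrier(W1, a1, W2p, a2p, X, y)
print(f"barrier without permutation: {naive:.4f}")
print(f"barrier modulo permutation:  {aligned:.4f}")
```

In this toy case the two networks differ only by a reordering of hidden units, so the aligned barrier vanishes up to floating-point error. For independently trained networks the matching is no longer exact, and for realistic widths the brute-force search would need to be replaced by an assignment solver such as `scipy.optimize.linear_sum_assignment`.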
