Understanding Mode Connectivity via Parameter Space Symmetry

Bo Zhao, Nima Dehmamy, Robin Walters, Rose Yu

TL;DR

This work addresses why neural network minima often exhibit mode connectivity by linking the topology of loss minima to parameter-space symmetries. It develops a symmetry-based framework that relates continuous group actions to the connected components of minima, derives exact component counts for full-rank linear networks with and without skip connections, and constructs explicit symmetry-induced curves that connect minima within the same orbit. It also analyzes when linear mode connectivity holds or fails, and provides a curvature-based bound showing that small-curvature symmetry paths imply approximate linear connectivity. Collectively, these results offer a principled way to exploit symmetry for model merging, ensembling, and targeted fine-tuning, and they highlight continuous symmetries as a major structural factor shaping neural loss landscapes.

Abstract

Neural network minima are often connected by curves along which train and test loss remain nearly constant, a phenomenon known as mode connectivity. While this property has enabled applications such as model merging and fine-tuning, its theoretical explanation remains unclear. We propose a new approach to exploring the connectedness of minima using parameter space symmetry. By linking the topology of symmetry groups to that of the minima, we derive the number of connected components of the minima of linear networks and show that skip connections reduce this number. We then examine when mode connectivity and linear mode connectivity hold or fail, using parameter symmetries which account for a significant part of the minimum. Finally, we provide explicit expressions for connecting curves in the minima induced by symmetry. Using the curvature of these curves, we derive conditions under which linear mode connectivity approximately holds. Our findings highlight the role of continuous symmetries in understanding the neural network loss landscape.

Paper Structure

This paper contains 31 sections, 28 theorems, 39 equations, 4 figures.

Key Result

Theorem 3.1

Let $X, Y$ be topological spaces and let $f: X \to Y$ be a continuous map. If $X$ is connected, then $f(X)$ is connected.
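
To see how this theorem constrains the topology of a minimum, consider a small worked instance. It is a sketch under stated assumptions: it uses the scalar two-layer loss from Figure 2 with $X = Y = 1$ assumed for illustration, and the symbols $\mathcal{M}$ and $f$ are notation chosen here, not necessarily the paper's.

```latex
\begin{align*}
L(w_1, w_2) &= \| 1 - w_2 w_1 \|_2, \qquad
\mathcal{M} = \{ (w_1, w_2) \in \mathbb{R}^2 : w_2 w_1 = 1 \}, \\
g \cdot (w_1, w_2) &= (g\, w_1,\; w_2\, g^{-1}), \qquad
g \in \mathrm{GL}_1(\mathbb{R}) = \mathbb{R} \setminus \{0\}, \\
f(g) &= g \cdot (1, 1) = (g,\; g^{-1}), \qquad
f\bigl(\mathrm{GL}_1(\mathbb{R})\bigr) = \mathcal{M}.
\end{align*}
```

The map $f$ is continuous, so by the theorem each of the two connected components of $\mathbb{R} \setminus \{0\}$ ($g > 0$ and $g < 0$) has a connected image. The minimum $\mathcal{M}$ therefore has at most two connected components, and in this example exactly two: the hyperbola branches with $w_1 > 0$ and $w_1 < 0$.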

Figures (4)

  • Figure 1: Minimum of (a) 3-layer linear net $|| Y - W_3 W_2 W_1 X||_2$ and (b) 3-layer linear net with a residual connection $|| Y - W_3 (W_2 W_1 X + X)||_2$, where $X=1$, $Y=1$, and $W_1, W_2, W_3 \in \mathbb{R}$.
  • Figure 2: Linear interpolation between two minima of the loss function $L(W_1, W_2) = || Y - W_2 W_1 X||_2$ with one-dimensional weights. The loss along the interpolation can be unbounded.
  • Figure 3: (a) Empirical validation of Proposition \ref{prop:barrier-bound}. (b-c) The loss on the curves induced by approximate symmetries ($\gamma$) remains relatively low compared to the loss on the linear interpolation between the two ends of these curves. Panels (b) and (c) differ in the magnitude of the group element used. The loss is averaged over 5 random curves.
  • Figure 4: Loss at the midpoint of the linear interpolation between two minima in a homogeneous network becomes unbounded when the two minima are far apart.
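
The behavior described in the captions for Figures 2 and 4 can be reproduced numerically with a minimal sketch. It assumes the scalar loss $|Y - W_2 W_1 X|$ with $X = Y = 1$, and picks the two minima $(a, 1/a)$ and $(1/a, a)$ for illustration. The function names (`loss`, `linear_path`, `symmetry_path`, `midpoint_barrier`) are hypothetical helpers, not code from the paper.

```python
import numpy as np

def loss(w1, w2, x=1.0, y=1.0):
    # Scalar two-layer linear network loss |y - w2 * w1 * x| (assumed x = y = 1).
    return abs(y - w2 * w1 * x)

def linear_path(p, q, t):
    # Straight-line interpolation between parameter vectors p and q.
    return (1.0 - t) * p + t * q

def symmetry_path(p, g):
    # Rescaling symmetry (w1, w2) -> (g * w1, w2 / g); it preserves the product w2 * w1.
    w1, w2 = p
    return np.array([g * w1, w2 / g])

def midpoint_barrier(a):
    # Loss at the midpoint of the segment joining the minima (a, 1/a) and (1/a, a).
    p = np.array([a, 1.0 / a])
    q = np.array([1.0 / a, a])
    return loss(*linear_path(p, q, 0.5))

# The midpoint loss grows without bound as the two minima move apart.
barriers = [midpoint_barrier(a) for a in (1.0, 10.0, 100.0)]

# The symmetry-induced curve between the same two minima stays on the minimum.
a = 10.0
p = np.array([a, 1.0 / a])
q = np.array([1.0 / a, a])
gs = np.geomspace(1.0, 1.0 / a**2, 50)  # g = 1 gives p, g = 1/a^2 gives q
sym_points = [symmetry_path(p, g) for g in gs]
sym_losses = [loss(*w) for w in sym_points]

print("midpoint barriers:", barriers)
print("max loss on symmetry path:", max(sym_losses))
```

In this toy setting the midpoint of the segment has product $((a + 1/a)/2)^2$, which diverges as $a \to \infty$, while the rescaling curve keeps $W_2 W_1 = 1$ at every point. Both endpoints lie on the same branch of the hyperbola, so the failure here is specific to linear interpolation, not to connectivity of the minimum.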

Theorems & Definitions (46)

  • Theorem 3.1: Theorem 4.7 in lee2010introduction
  • Corollary 3.2
  • Corollary 3.3
  • Proposition 3.4
  • Proposition 3.5
  • Corollary 3.6
  • Corollary 3.7
  • Proposition 3.8
  • Proposition 4.1
  • Corollary 4.2
  • ...and 36 more