Exploring Neural Network Landscapes: Star-Shaped and Geodesic Connectivity

Zhanran Lin; Puheng Li; Lei Wu

TL;DR

This work analyzes the geometry of over-parameterized neural network loss landscapes, focusing on mode connectivity, star-shaped connectivity, and geodesic connectivity. It establishes that two-layer ReLU networks and linear networks admit $2$-piece linear connections between typical minima under sufficient width, with broader $k$-piece linear ($k$-PL) guarantees and a star-center structure that connects multiple minima via simple paths. The geodesic distance between minima is shown to approach the Euclidean distance as width grows, so the normalized geodesic distance (NGD) approaches $1$; neuron sparsity induced by SGD helps drive the NGD toward $1$, indicating a landscape that is closer to convex. Empirical validation on MNIST and CIFAR-10 corroborates the theoretical findings, demonstrating practically barrier-free fold-lines through a central minimum and NGD near $1$ for wide networks.

Abstract

One of the most intriguing findings in the structure of the neural network landscape is the phenomenon of mode connectivity: for two typical global minima, there exists a path connecting them without a barrier. This concept of mode connectivity has played a crucial role in understanding important phenomena in deep learning. In this paper, we conduct a fine-grained analysis of this connectivity phenomenon. First, we demonstrate that in the overparameterized case, the connecting path can be as simple as a two-piece linear path, and the path length can be nearly equal to the Euclidean distance. This finding suggests that the landscape should be nearly convex in a certain sense. Second, we uncover a surprising star-shaped connectivity: for a finite number of typical minima, there exists a center on the minima manifold that connects all of them simultaneously via linear paths. These results are provably valid for linear networks and two-layer ReLU networks under a teacher-student setup, and are empirically supported by models trained on MNIST and CIFAR-10.

Paper Structure (22 sections, 17 theorems, 62 equations, 5 figures, 2 tables)

Introduction
Related works
Preliminaries
Two-layer ReLU networks
The $k$-piece linear connectivity
Star-shaped connectivity
The geodesic connectivity
Linear networks
Experiments
Star-shaped connectivity
The geodesic connectivity
Conclusion
Proofs in Section \ref{sec: 2lnn}
Proof of Theorem \ref{thm: 2lnn-minima}
Proof of Theorem \ref{thm: 2pl-2lnn}
...and 7 more sections

Key Result

Theorem 6

Suppose that $m \geq M$ and that Assumption \ref{assumption: 2lnn} holds. Let $S_0 = \{(0,\ldots,0) \in \mathbb{R}^d\}$, $S_j = \{\alpha \mathbf{e}_j: \alpha\neq 0\}$ for $j\in [M]$, and $S=\cup_{j=0}^M S_j$. Then the global minima manifold $\mathcal{M}$ is a compact set in $\mathbb{R}^{m\times d}$:

Figures (5)

Figure 1: Left: A speculative sketch of the shape that star-shaped connectivity may take in the loss landscape. Because of the limits of $2$-dimensional visualization, we show only one possible cross-section as a heuristic plot. Right: For $2$ minima $\theta_1,\theta_2$ as described in the setting of Proposition \ref{thm: linearnet-2pl}, we consider the linear mode connectivity through a center $\theta^*$. For the linear interpolations between two minima, and the $\theta_1 \rightarrow \theta^* \rightarrow \theta_2$ fold-lines constructed from two linear interpolations, we plot the training loss along these paths. Specifically, the $x$-axis $t$ here denotes the point $t{\theta^*}+(1-t)\theta_i$ in the linear interpolation (the orange line). For the loss along the fold-line (blue line), $t<0.5$ corresponds to the point $2t{\theta^*}+(1-2t)\theta_1$, while $t \ge 0.5$ corresponds to $(2t-1)\theta_2 + (2-2t){\theta^*}$. The result matches our expectation of linear mode connectivity through the center we obtained.
Figure 2: Left. The original star-shaped connectivity. The five white circles are the feet and the red circle is the center. The blue lines represent the linear connecting paths. Right. The extended star-shaped connectivity proved in Theorem \ref{1.8}, where the feet are connected to the center via two-piece linear paths.
Figure 3: Left. How the normalized geodesic distance (NGD) changes with the network width for two-layer ReLU networks. The teacher network has $M=4$ neurons; see Section \ref{exp} for the algorithm used to estimate NGD. Right. The $L^2$ norm of each neuron for SGD solutions. Here, $m=512$, $M=4$, $d=4$. One can see that SGD tends to find sparse solutions.
Figure 4: Normalized geodesic distance vs. network width for linear networks. Following the setting described earlier in this section, we consider a fully connected linear network with $L = 2$. We set $d=m$ and vary $m$ to measure the normalized geodesic distance through a center with $2$-PL-connectivity. Algorithm \ref{algo1} is applied here to train a center, and the result is an average over $5$ separate experiments. As the width increases, we obtain a center that satisfies $2$-PL-connectivity with a shorter geodesic distance.
Figure 5: A validation of star-shaped connectivity. The model is VGG16 and the dataset is CIFAR-10. We examine $3$ minima obtained by running Adam independently, then apply the center-finding algorithm to obtain the corresponding center. For all $3$ linear interpolations between minima, and all $3$ "minimum-center-minimum" fold-lines constructed from two linear interpolations, we plot the training loss (left) and accuracy (right) along these paths. Specifically, the $x$-axis $t$ here denotes the point $t\theta^*+(1-t)\theta_i$ in the linear interpolation (the orange line). For a pair $(\theta_i,\theta_j)$ (blue line), $t<0.5$ corresponds to the point $2t\theta^*+(1-2t)\theta_i$, while $t \ge 0.5$ corresponds to $(2t-1)\theta_j + (2-2t)\theta^*$. The experiment shows that our algorithm successfully found a center that is linearly connected to all three minima simultaneously, i.e., one that forms a star-shaped connectivity.
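The fold-line parametrization used in the captions of Figures 1 and 5 can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names (`lerp`, `fold_line_point`, `max_loss_along`) and the toy loss $(ab)^2$ are assumptions made here for demonstration, not the paper's networks or its center-finding algorithm.

```python
def lerp(a, b, s):
    """Return (1 - s) * a + s * b for two equal-length parameter vectors."""
    return [(1.0 - s) * x + s * y for x, y in zip(a, b)]


def fold_line_point(theta_i, theta_star, theta_j, t):
    """Point at t in [0, 1] on the fold-line theta_i -> theta_star -> theta_j."""
    if t < 0.5:
        # 2t * theta_star + (1 - 2t) * theta_i
        return lerp(theta_i, theta_star, 2.0 * t)
    # (2t - 1) * theta_j + (2 - 2t) * theta_star
    return lerp(theta_star, theta_j, 2.0 * t - 1.0)


def max_loss_along(path_fn, loss_fn, num_points=101):
    """Largest loss over an evenly spaced grid of t values in [0, 1]."""
    ts = [k / (num_points - 1) for k in range(num_points)]
    return max(loss_fn(path_fn(t)) for t in ts)


# Hypothetical toy loss (not from the paper): zero on both coordinate axes,
# so the minima set is star-shaped around the origin.
def toy_loss(theta):
    a, b = theta
    return (a * b) ** 2


theta_1, theta_2, theta_star = [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]

direct_max = max_loss_along(lambda t: lerp(theta_1, theta_2, t), toy_loss)
fold_max = max_loss_along(
    lambda t: fold_line_point(theta_1, theta_star, theta_2, t), toy_loss
)
```

In this toy case the direct interpolation crosses a barrier at its midpoint, while the fold-line through the center stays on the minima set. For a real model, `theta` would be the flattened parameter vector and `loss_fn` the training loss.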

Theorems & Definitions (27)

Definition 1: Linear interpolation
Definition 2: $k$-piece linear connectivity
Definition 3: Star-shaped linear connectivity
Definition 4: Normalized geodesic distance (NGD)
Theorem 6
Lemma 7: Linear connectivity
Theorem 8
Theorem 9
Theorem 10
Theorem 11
...and 17 more
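Definition 4 itself is not reproduced in this summary. Assuming, as the TL;DR and abstract suggest, that the NGD compares the length of a barrier-free connecting path with the Euclidean distance between its endpoints, the length of any particular piecewise linear path gives an upper bound on it. The sketch below computes that ratio; the function names are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def piecewise_linear_length(points):
    """Total length of the path visiting `points` in order."""
    return sum(euclidean(p, q) for p, q in zip(points, points[1:]))


def normalized_path_length(points):
    """Path length divided by the straight-line distance between endpoints.

    Assumption: if every segment is barrier-free, this ratio upper-bounds
    the normalized geodesic distance between the two endpoints.
    """
    return piecewise_linear_length(points) / euclidean(points[0], points[-1])


# A 2-piece linear path through a center at the origin (toy values).
ratio_fold = normalized_path_length([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
# A straight segment has ratio exactly 1.
ratio_straight = normalized_path_length([[1.0, 0.0], [0.0, 1.0]])
```

A ratio close to $1$ means the path is almost as short as the straight segment, which is the sense in which the summary describes wide networks as having a nearly convex landscape.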
