Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances
Berfin Şimşek, François Ged, Arthur Jacot, Francesco Spadaro, Clément Hongler, Wulfram Gerstner, Johanni Brea
TL;DR
This work characterizes the loss-landscape geometry of overparameterized neural networks by exploiting permutation symmetries. It introduces expansion manifolds that connect the discrete global minima into a single zero-loss manifold, and it describes symmetry-induced critical points as affine subspaces generated by neuron replications and permutations, counted by $G(r,m)$ and $T(r,m)$. The authors derive closed-form formulas and asymptotics showing that symmetry-induced critical subspaces (saddles) outnumber the subspaces of global minima in the mildly overparameterized regime, while global minima dominate in the vastly overparameterized regime, with depth intensifying these effects. The results give a principled view of optimization dynamics and a basis for pruning strategies that exploit replicated units.
Abstract
We study how permutation symmetries in overparameterized multi-layer neural networks generate "symmetry-induced" critical points. Assuming a network with $L$ layers of minimal widths $r_1^*, \ldots, r_{L-1}^*$ reaches a zero-loss minimum at $r_1^*! \cdots r_{L-1}^*!$ isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold. For a two-layer overparameterized network of width $r^* + h =: m$ we explicitly describe the manifold of global minima: it consists of $T(r^*, m)$ affine subspaces of dimension at least $h$ that are connected to one another. For a network of width $m$, we identify the number $G(r, m)$ of affine subspaces containing only symmetry-induced critical points that are related to the critical points of a smaller network of width $r < r^*$. Via a combinatorial analysis, we derive closed-form formulas for $T$ and $G$ and show that the number of symmetry-induced critical subspaces dominates the number of affine subspaces forming the global minima manifold in the mildly overparameterized regime (small $h$) and vice versa in the vastly overparameterized regime ($h \gg r^*$). Our results provide new insights into the minimization of the non-convex loss function of overparameterized neural networks.
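The two symmetries the abstract relies on can be checked numerically on a toy model. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes a bias-free two-layer network $f(x) = \sum_i a_i \tanh(w_i \cdot x)$, and the weights, the inputs, and the helper names `two_layer`, `outputs`, `same`, and `replicate` are all hypothetical. It checks that every one of the $r!$ hidden-unit permutations leaves the network function unchanged, and that replicating one neuron while splitting its outgoing weight into $\mu a_k$ and $(1-\mu) a_k$ also leaves it unchanged for any real $\mu$. The second check traces out a one-parameter affine line in the parameter space of the width $r+1$ network, which corresponds to the case $h = 1$.

```python
import itertools
import math

def two_layer(x, W, a):
    """f(x) = sum_i a[i] * tanh(W[i] . x); tanh and no biases are assumptions."""
    return sum(a_i * math.tanh(sum(w * xi for w, xi in zip(w_i, x)))
               for w_i, a_i in zip(W, a))

# Hypothetical width-3 network and a few probe inputs (illustrative values only).
W = [[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8]]
a = [1.0, -2.0, 0.5]
inputs = [[0.1, 0.2], [-1.0, 0.4], [2.0, -0.3]]

def outputs(W, a):
    return [two_layer(x, W, a) for x in inputs]

def same(u, v):
    return all(math.isclose(p, q, abs_tol=1e-12) for p, q in zip(u, v))

base = outputs(W, a)

# Permutation symmetry: reordering hidden units gives r! equivalent parameter vectors.
perms = list(itertools.permutations(range(len(a))))
perm_ok = all(
    same(base, outputs([W[i] for i in p], [a[i] for i in p])) for p in perms
)

def replicate(W, a, k, mu):
    """Copy neuron k and split its outgoing weight into mu*a[k] and (1-mu)*a[k]."""
    W_new = W + [W[k]]
    a_new = a[:k] + [mu * a[k]] + a[k + 1:] + [(1.0 - mu) * a[k]]
    return W_new, a_new

# Replication symmetry: the function is unchanged for every mu, including mu
# outside [0, 1], so the equivalent parameters form an affine line.
rep_ok = all(
    same(base, outputs(*replicate(W, a, 0, mu))) for mu in [-1.0, 0.0, 0.3, 1.0, 2.5]
)

print(len(perms), perm_ok, rep_ok)
```

Because the loss depends on the parameters only through the network function, equal outputs imply equal loss, so a zero-loss point of the narrow network maps to a whole line of zero-loss points in the wider one. The check does not cover the other constructions counted in the paper, and it does not reproduce the formulas for $T$ or $G$.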
