Symmetry & Critical Points

Yossi Arjevani

TL;DR

This work develops a geometric mechanism for symmetry breaking (SB) in invariant nonconvex optimization by analyzing the tangency sets $\mho_{\boldsymbol{c}}(f)$ emanating from symmetric critical points. It combines differential topology, jet transversality, and o-minimal definability to show that, generically, the tangency set consists of one-dimensional arcs along which SB occurs, and that critical points connected by these arcs inherit symmetry constraints. The author uses group representation theory to simplify the analysis of the Hessian and of higher-order derivatives via isotypic decompositions, which makes spectral characterizations and invariant-tensor computations tractable, especially for permutation representations such as $(\mathbb{R}^d,S_d)$ and $(M(d,d),S_d)$. The work also establishes finite determinacy and $\theta$-sufficiency of jets, which reduces the study to finite models, and discusses implications for neural network loss landscapes, including SB-driven annihilation of minima as network width grows. Together, these results provide a rigorous framework for understanding and predicting the structure and tractability of invariant nonconvex optimization problems, with relevance to learning theory and tensor decompositions.

Abstract

Critical points of an invariant function may or may not be symmetric. We prove, however, that if a symmetric critical point exists, those adjacent to it are generically symmetry breaking. This mathematical mechanism is shown to carry important implications for our ability to efficiently minimize invariant nonconvex functions, in particular those associated with neural networks.

Paper Structure (32 sections, 28 theorems, 58 equations, 4 figures)


Introduction
Main results: symmetry breaking critical points
Applications
Hessian, Gauss-Newton and group representation theory
Symmetric tensor norms
Annihilation of minima and $\chi$-sufficiency of jets
Stabilizing inner products and families of critical points
Deep symmetry breaking
The tangency set
Preliminaries
Transversality theory
Transversality and stratification
O-minimal theory
Regularity and jet transversality
Finite determinacy
...and 17 more sections

Key Result

Theorem 1

Generically, the tangency set of a smooth function on $\mathbb{R}^d$ consists of $2d$ tangency arcs, each tangential to a Hessian eigenspace.
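The count in Theorem 1 can be checked numerically in a toy case. The sketch below is only an illustration under stated assumptions: it assumes that the tangency set relative to a critical point $\mathbf{c}$ is the set of points $x$ where $\nabla f(x)$ is parallel to $x - \mathbf{c}$ (Definition 2.1 is not reproduced on this page), and it uses a hypothetical test function on $\mathbb{R}^2$ with $\mathbf{c} = 0$ and Hessian $\mathrm{diag}(1, 3)$. Under that assumption, a small circle around $\mathbf{c}$ should meet the tangency set in $2d = 4$ points, close to the Hessian eigendirections.

```python
import numpy as np

def grad_f(x, y):
    # Gradient of the hypothetical test function
    # f(x, y) = 0.5*(x^2 + 3*y^2) + x^3 + 0.5*x^2*y + x*y^2 - y^3,
    # which has a critical point at the origin with Hessian diag(1, 3).
    gx = x + 3 * x**2 + x * y + y**2
    gy = 3 * y + 0.5 * x**2 + 2 * x * y - 3 * y**2
    return gx, gy

def tangency_angles(radius, n=4000):
    # Sample the circle of the given radius around the origin.
    theta = (np.arange(n) + 0.5) * 2 * np.pi / n
    x, y = radius * np.cos(theta), radius * np.sin(theta)
    gx, gy = grad_f(x, y)
    # Tangential component of the gradient along the circle. It vanishes
    # exactly where the gradient is parallel to the radial direction.
    tangential = -gx * np.sin(theta) + gy * np.cos(theta)
    # Locate sign changes between consecutive samples (cyclically).
    nxt = np.roll(tangential, -1)
    idx = np.nonzero(tangential * nxt < 0)[0]
    return theta[idx]

angles = tangency_angles(0.01)
print(len(angles), np.round(angles, 3))
```

The four angles cluster near multiples of $\pi/2$, that is, near the two eigendirections of the Hessian taken with both signs. The cubic terms shift the crossings slightly away from the axes, which is consistent with arcs that are only tangent to the eigenspaces at the critical point.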

Figures (4)

Figure 1: Tangency sets of $h$ defined in (\ref{eq:h}) relative to different critical points $\mathbf{c}_{\bullet}$. The tangency arcs break the symmetry of $\mathbf{c}_{\bullet}$ to a varying degree. Critical points connected by tangency arcs are at least as symmetric as the arcs and so are themselves symmetry breaking.
Figure 2: Critical points connected through tangency arcs to a symmetric point, here a circle, break the symmetry by a process resembling (simultaneous) spontaneous symmetry breaking.
Figure 3: The Hessian spectrum at global and spurious $\Delta (S_{d-p} \times S_p)$-minima of ${\mathcal{L}}_{\text{ReLU}}$, $p\in \{0,1,2,3\}$, given to $O(d^{-\frac{1}{2}})$-terms \cite{arjevani2021analytic}. Eigenvalues having large multiplicity concentrate near zero and account for all but $\Theta(d)$ eigenvalues that grow linearly in $d$. Upon convergence, the spectral density is expected to accumulate in clusters whose number does not depend on $d$.
Figure 4: (Left) A histogram of the analytic estimates of the Hessian spectrum given in Table \ref{table: skew} \cite{arjevani2021analytic}. By fundamental results from group representation theory, symmetry breaking critical points necessarily have a highly skewed Hessian spectrum. (Right) Numerical approximation of the Hessian spectrum for VGG11 trained on various datasets \cite{papyan2018full}. In \cite{arjevani2024dsb}, the mechanism developed in this work is used to examine the extent to which SB phenomena account for the highly skewed Hessian spectrum observed in practice.
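The representation-theoretic reason behind the clustered spectra in Figures 3 and 4 can be illustrated in the simplest setting, the permutation representation $(\mathbb{R}^d, S_d)$. The sketch below is a toy illustration, not the computation behind the figures: it takes a random symmetric matrix as a stand-in for a Hessian (an assumption), averages it over all permutation matrices so that it commutes with the $S_d$-action, and confirms that only two distinct eigenvalues survive, one of multiplicity $d-1$ and one of multiplicity $1$. This matches the decomposition of $\mathbb{R}^d$ into the trivial and standard representations.

```python
import itertools
import numpy as np

d = 5
rng = np.random.default_rng(0)

# Random symmetric matrix standing in for a Hessian (illustrative only).
A = rng.standard_normal((d, d))
A = (A + A.T) / 2

# Average P A P^T over all d! permutation matrices. The result commutes
# with every permutation, as the Hessian at a fully S_d-symmetric
# critical point of an S_d-invariant function does.
perms = list(itertools.permutations(range(d)))
H = np.zeros((d, d))
for p in perms:
    P = np.eye(d)[list(p)]
    H += P @ A @ P.T
H /= len(perms)

# H has a constant diagonal (alpha) and constant off-diagonal (beta).
alpha = np.trace(A) / d
beta = (A.sum() - np.trace(A)) / (d * (d - 1))

eigvals = np.linalg.eigvalsh(H)  # returned in ascending order
expected = np.sort([alpha - beta] * (d - 1) + [alpha + (d - 1) * beta])
print(np.round(eigvals, 6))
```

The number of distinct eigenvalues is governed by the number of isotypic components rather than by $d$. For smaller isotropy groups such as $\Delta(S_{d-p} \times S_p)$ there are more components, but their number is again independent of $d$, in line with the caption of Figure 3.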

Theorems & Definitions (46)

Definition 2.1: Tangency sets and arcs
Theorem: Informal, no symmetry
Theorem
Theorem
Definition 4.1
Definition 4.2
Remark 4.3
Proposition 4.4
Theorem 4.5
Definition 4.6: Notation and assumptions as above
...and 36 more
