
Symmetry & Critical Points

Yossi Arjevani

TL;DR

This work develops a geometric mechanism for symmetry breaking (SB) in invariant nonconvex optimization by analyzing the tangency sets $\mho_{\boldsymbol{c}}(f)$ emanating from symmetric critical points $\boldsymbol{c}$. Combining differential topology, jet transversality, and o-minimal definability, it shows that the tangency set generically consists of one-dimensional arcs along which SB occurs, and that critical points connected by these arcs inherit the corresponding symmetry constraints. The paper uses group representation theory to reduce the analysis of Hessians and higher-order derivatives via isotypic decompositions. This yields tractable spectral characterizations and invariant-tensor computations, in particular for the permutation representations $(\mathbb{R}^d, S_d)$ and $(M(d,d), S_d)$. It also establishes finite determinacy and $\theta$-sufficiency of jets, which reduces the study to finite models, and discusses implications for neural network loss landscapes, including the SB-driven annihilation of minima as network width grows. Together, these results provide a rigorous framework for understanding and predicting the structure and tractability of invariant nonconvex optimization problems, with relevance to learning theory and tensor decomposition.
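
As a minimal sketch of the isotypic reduction for $(\mathbb{R}^d, S_d)$ (our illustration in Python with NumPy, not code from the paper), the snippet below averages an arbitrary symmetric matrix over all $d!$ permutation matrices. The average commutes with every permutation, as the Hessian of an $S_d$-invariant function at an $S_d$-fixed point does, so it must have the form $aI + b\mathbf{1}\mathbf{1}^\top$. It therefore acts on the trivial isotypic component $\operatorname{span}(\mathbf{1})$ by $a + db$ and on the standard component $\mathbf{1}^\perp$ by $a$: only two distinct eigenvalues, one of them with multiplicity $d-1$. The helpers `reynolds_average` and `multiplicities` are hypothetical names introduced here.

```python
import itertools

import numpy as np


def reynolds_average(H):
    """Average P H P^T over all d! permutation matrices P (group averaging)."""
    d = H.shape[0]
    perms = list(itertools.permutations(range(d)))
    acc = np.zeros_like(H, dtype=float)
    for p in perms:
        P = np.eye(d)[list(p)]  # permutation matrix: row i is e_{p[i]}
        acc += P @ H @ P.T
    return acc / len(perms)


def multiplicities(eigs, tol=1e-6):
    """Group sorted eigenvalues into clusters closer than tol; return cluster sizes."""
    eigs = np.sort(eigs)
    counts = [1]
    for prev, cur in zip(eigs[:-1], eigs[1:]):
        if cur - prev > tol:
            counts.append(1)
        else:
            counts[-1] += 1
    return counts


d = 4
rng = np.random.default_rng(0)
A = rng.standard_normal((d, d))
H = (A + A.T) / 2 + 2.0 * np.ones((d, d))  # random symmetric matrix; the shift keeps b away from 0
R = reynolds_average(H)

# R commutes with every permutation matrix, hence R = a*I + b*ones((d, d)).
b = R[0, 1]
a = R[0, 0] - b
assert np.allclose(R, a * np.eye(d) + b * np.ones((d, d)))

eigs = np.linalg.eigvalsh(R)
print(sorted(multiplicities(eigs)))  # [1, 3]
print(np.allclose(R @ np.ones(d), (a + d * b) * np.ones(d)))  # True
```

Group averaging is used here only because it produces a generic equivariant matrix cheaply; any Hessian of an $S_d$-invariant function at a point of the form $t\mathbf{1}$ has the same two-eigenvalue structure.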

Abstract

Critical points of an invariant function may or may not be symmetric. We prove, however, that if a symmetric critical point exists, those adjacent to it are generically symmetry breaking. This mathematical mechanism is shown to carry important implications for our ability to efficiently minimize invariant nonconvex functions, in particular those associated with neural networks.

Paper Structure (32 sections, 28 theorems, 58 equations, 4 figures)

Key Result

Theorem 1

Generically, the tangency set of a smooth function on $\mathbb{R}^d$ relative to a critical point consists of $2d$ tangency arcs emanating from that point, each tangent to a Hessian eigenspace.
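
The following sketch illustrates Theorem 1 for $d = 2$. It assumes, as our reading rather than a quotation of Definition 2.1, that the tangency set relative to a critical point $\boldsymbol{c}$ consists of the points $x$ at which $\nabla f(x)$ is parallel to $x - \boldsymbol{c}$, i.e., where a level set of $f$ is tangent to a sphere centered at $\boldsymbol{c}$. On a circle of radius $r$ around $\boldsymbol{c} = 0$ this is the zero set of $g(\theta) = x_1\,\partial_2 f(x) - x_2\,\partial_1 f(x)$. For a quadratic $f(x) = \tfrac12 x^\top A x$ one gets $g(\theta) = r^2\big(a_{12}\cos 2\theta + \tfrac{a_{22}-a_{11}}{2}\sin 2\theta\big)$, which, when the eigenvalues of $A$ are distinct, has exactly four zeros located on the eigenvector lines. Adding a small cubic term (chosen arbitrarily here) should keep four zeros, i.e., $2d$ arcs, whose directions approach the eigenvector lines as $r \to 0$. The matrix `A`, the cubic term, and the helpers `grad_f`, `tangency_angles`, and `distance_to_eigenlines` are hypothetical.

```python
import numpy as np

# Hessian at the critical point c = 0 (hypothetical example); its eigenvalues are distinct.
A = np.array([[2.0, 0.5], [0.5, -1.0]])


def grad_f(x1, x2):
    """Gradient of f(x) = 0.5 x^T A x + 0.3 x1^3 + 0.2 x1 x2^2 (an arbitrary smooth example)."""
    g1 = A[0, 0] * x1 + A[0, 1] * x2 + 0.9 * x1**2 + 0.2 * x2**2
    g2 = A[1, 0] * x1 + A[1, 1] * x2 + 0.4 * x1 * x2
    return g1, g2


def tangency_angles(r, n=100_003):
    """Angles on the circle |x| = r at which grad f(x) is parallel to x."""
    theta = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    x1, x2 = r * np.cos(theta), r * np.sin(theta)
    g1, g2 = grad_f(x1, x2)
    g = x1 * g2 - x2 * g1  # 2D cross product of x and grad f(x); zero iff parallel
    s = np.sign(g)
    flips = np.nonzero(s != np.roll(s, 1))[0]  # sign changes, including the wrap-around at 0
    return theta[flips]


def distance_to_eigenlines(angles):
    """Angular distance (mod pi) from each angle to the nearest eigenvector line of A."""
    _, vecs = np.linalg.eigh(A)
    eig_angles = np.arctan2(vecs[1], vecs[0])  # one angle per eigenvector (column)
    diff = angles[:, None] - eig_angles[None, :]
    diff = (diff + np.pi / 2) % np.pi - np.pi / 2  # wrap into [-pi/2, pi/2)
    return np.abs(diff).min(axis=1)


for r in (0.1, 0.01, 0.001):
    angles = tangency_angles(r)
    # 2d = 4 tangency points per circle, drifting toward the eigenvector lines as r shrinks.
    print(r, len(angles), distance_to_eigenlines(angles).max())
```

Counting sign changes on a fine grid is a crude root finder; it suffices here because the four zeros are simple and well separated, but a bracketing solver would be preferable when Hessian eigenvalues are close.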

Figures (4)

  • Figure 1: Tangency sets of the function $h$ defined in the paper, relative to different critical points $\mathbf{c}_{\bullet}$. The tangency arcs break the symmetry of $\mathbf{c}_{\bullet}$ to varying degrees. Critical points connected by tangency arcs are at least as symmetric as the arcs and are therefore themselves symmetry breaking.
  • Figure 2: Critical points connected through tangency arcs to a symmetric point, here a circle, break the symmetry by a process resembling (simultaneous) spontaneous symmetry breaking.
  • Figure 3: The Hessian spectrum at global and spurious $\Delta (S_{d-p} \times S_p)$-minima of ${\mathcal{L}}_{\text{ReLU}}$, $p\in \{0,1,2,3\}$, given up to $O(d^{-1/2})$ terms [arjevani2021analytic]. Eigenvalues of large multiplicity concentrate near zero and account for all but $\Theta(d)$ of the eigenvalues; the remaining $\Theta(d)$ eigenvalues grow linearly in $d$. Upon convergence, the spectral density is expected to accumulate in clusters whose number does not depend on $d$.
  • Figure 4: (Left) A histogram of the analytic estimates of the Hessian spectrum given in the paper's table [arjevani2021analytic]. By fundamental results from group representation theory, symmetry breaking critical points necessarily have a highly skewed Hessian spectrum. (Right) A numerical approximation of the Hessian spectrum of VGG11 trained on various datasets [papyan2018full]. In [arjevani2024dsb], the mechanism developed in this work is used to examine the extent to which symmetry breaking phenomena account for the highly skewed Hessian spectra observed in practice.
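
A minimal sketch of why the number of spectral clusters need not grow with $d$, assuming $S_d$ acts on $M(d,d)$ by simultaneous row and column permutation $W \mapsto PWP^\top$ (the diagonal subgroup $\Delta S_d$). Under this action $M(d,d) \cong \mathbb{R}^d \otimes \mathbb{R}^d$ splits, for $d \ge 4$, into two copies of the trivial representation, three copies of the standard representation (dimension $d-1$), and one copy each of two further irreducibles of dimensions $d(d-3)/2$ and $(d-1)(d-2)/2$. By Schur's lemma, a symmetric operator commuting with this action has at most $2 + 3 + 1 + 1 = 7$ distinct eigenvalues, whatever $d$ is, while its size $d^2$ grows. The code averages a random symmetric matrix over the group, using the permutation matrix $P \otimes P$ that represents $W \mapsto PWP^\top$ on vectorized matrices, and counts eigenvalue clusters. It is our illustration, not the paper's computation; the minima in Figure 3 have isotropy $\Delta(S_{d-p} \times S_p)$, which is $\Delta S_d$ only for $p = 0$. The helper names are hypothetical.

```python
import itertools

import numpy as np


def perm_matrix(p):
    """Permutation matrix whose i-th row is the basis vector e_{p[i]}."""
    return np.eye(len(p))[list(p)]


def equivariant_average(H, d):
    """Average K H K^T over K = kron(P, P), i.e. the action W -> P W P^T on M(d, d)."""
    acc = np.zeros_like(H)
    perms = list(itertools.permutations(range(d)))
    for p in perms:
        K = np.kron(perm_matrix(p), perm_matrix(p))
        acc += K @ H @ K.T
    return acc / len(perms)


def multiplicities(eigs, tol=1e-6):
    """Group sorted eigenvalues into clusters closer than tol; return cluster sizes."""
    eigs = np.sort(eigs)
    counts = [1]
    for prev, cur in zip(eigs[:-1], eigs[1:]):
        if cur - prev > tol:
            counts.append(1)
        else:
            counts[-1] += 1
    return counts


rng = np.random.default_rng(1)
for d in (4, 5, 6):
    A = rng.standard_normal((d * d, d * d))
    H = equivariant_average((A + A.T) / 2, d)  # a generic equivariant symmetric operator
    counts = multiplicities(np.linalg.eigvalsh(H))
    # Schur's lemma predicts these multiplicities (7 clusters for every d >= 4).
    predicted = [1, 1] + [d - 1] * 3 + [d * (d - 3) // 2, (d - 1) * (d - 2) // 2]
    print(d, d * d, len(counts), sorted(counts) == sorted(predicted))
```

Most of the $d^2$ eigenvalues fall into the two clusters of multiplicity $d(d-3)/2$ and $(d-1)(d-2)/2$, which is one way such a spectrum ends up highly skewed.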

Theorems & Definitions (46)

  • Definition 2.1: Tangency sets and arcs
  • Theorem: Informal, no symmetry
  • Theorem
  • Theorem
  • Definition 4.1
  • Definition 4.2
  • Remark 4.3
  • Proposition 4.4
  • Theorem 4.5
  • Definition 4.6: Notation and assumptions as above
  • ...and 36 more