Keep the Momentum: Conservation Laws beyond Euclidean Gradient Flows

Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré

TL;DR

This work extends conservation laws from Euclidean gradient flows (GF) to momentum flows (MF) and non-Euclidean metrics, and shows that, unlike their gradient-flow counterparts, the conserved quantities of momentum dynamics depend explicitly on time. By lifting the dynamics to phase space and exploiting invariances through Noether-type arguments, the authors characterize the conserved functions and describe how the set of conservation laws changes when moving from gradient to momentum dynamics, for both linear and nonlinear networks. They prove a structure theorem for momentum flows, show that conservation laws are often lost when passing from GF to MF, and give complete or near-complete characterizations for several settings (e.g., linear networks under Euclidean MF, ReLU networks, Nonnegative Matrix Factorization (NMF), and input-convex neural networks (ICNN)), with natural-gradient considerations. The findings illuminate fundamental limits on invariant quantities during optimization, with implications for understanding training dynamics, generalization, and the design of momentum-accelerated or mirror-descent-based algorithms. The work also supplies tools for computing conservation laws and demonstrates the theory on PCA, MLP, NMF, and ICNN settings, offering a principled lens for studying optimization in non-Euclidean geometries.

Abstract

Conservation laws are well-established in the context of Euclidean gradient flow dynamics, notably for linear or ReLU neural network training. Yet, their existence and principles for non-Euclidean geometries and momentum-based dynamics remain largely unknown. In this paper, we characterize "all" conservation laws in this general setting. In stark contrast to the case of gradient flows, we prove that the conservation laws for momentum-based dynamics exhibit temporal dependence. Additionally, we often observe a "conservation loss" when transitioning from gradient flow to momentum dynamics. Specifically, for linear networks, our framework allows us to identify all momentum conservation laws, which are less numerous than in the gradient flow case except in sufficiently over-parameterized regimes. With ReLU networks, no conservation law remains. This phenomenon also manifests in non-Euclidean metrics, used e.g. for Nonnegative Matrix Factorization (NMF): all conservation laws can be determined in the gradient flow context, yet none persists in the momentum case.
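
As a concrete instance of the gradient-flow conservation laws mentioned above for linear networks, the following short derivation checks the classical "balancedness" law for a two-layer linear factorization. The symbols $U$, $V$, $f$ and $G$ are notation assumed for this illustration only and are not taken from the paper; the computation covers the Euclidean gradient flow case, not the momentum case.

```latex
\begin{aligned}
L(U,V) &= f(UV^\top), \qquad G := \nabla f(UV^\top),\\
\dot U &= -\nabla_U L = -G V, \qquad \dot V = -\nabla_V L = -G^\top U,\\
\frac{d}{dt}\,(U^\top U) &= \dot U^\top U + U^\top \dot U
  = -\left(V^\top G^\top U + U^\top G V\right),\\
\frac{d}{dt}\,(V^\top V) &= \dot V^\top V + V^\top \dot V
  = -\left(U^\top G V + V^\top G^\top U\right),\\
\frac{d}{dt}\,\left(U^\top U - V^\top V\right) &= 0 .
\end{aligned}
```

The two derivatives coincide for any differentiable $f$, so $U^\top U - V^\top V$ stays constant along the flow regardless of the dataset entering $f$.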

Paper Structure (57 sections, 31 theorems, 110 equations, 4 figures)


Key Result

Theorem 2.1

Let $h(t, \theta)$ (resp. $h(t, \theta, \dot \theta)$) be a conserved function for the gradient flow ODE (resp. the momentum flow ODE with $\tau(t) = \tau$) when its right-hand side is zero. For all $t$ and $\theta$, one has $h(t, \theta) = h(0, \theta)$ (resp. $h(t, \theta, \dot \theta) = {H}(\theta + \frac{

Figures (4)

  • Figure 1: Impact of the step size $\delta$ on the evolution of the loss (left) and on the preservation of one of the conservation laws (right). The colors are associated with the number of iterations $t_{\max}/\delta$ used to train the networks.
  • Figure 2: Impact of the momentum parameter $\mu=1/\tau$ on the evolution of the loss (left) and on the preservation of one of the conservation laws (right).
  • Figure 3: Left: example of input images (columns of $Y$). Right: example of NMF factors (columns of $U$) at optimality.
  • Figure 4: Impact of the momentum parameter $\mu=1/\tau$ on the evolution of the loss (left) and on the preservation of one of the conservation laws (right). Here $\mu=0$ corresponds to the gradient flow (no momentum).
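
The effect of the step size $\delta$ on how well a conservation law is preserved (the theme of Figure 1) can be illustrated with a minimal NumPy sketch. It is not the authors' experiment: the two-layer linear model $UV^\top \approx Y$, the squared loss, the random data, the dimensions, and the helper names `balancedness` and `drift_after_training` are all assumptions made for this illustration. The sketch runs plain gradient descent up to time $t_{\max}$ and measures how far $U^\top U - V^\top V$, which is conserved by the continuous gradient flow, has drifted.

```python
import numpy as np


def balancedness(U, V):
    """Quantity U^T U - V^T V, conserved by gradient flow on a loss of U V^T."""
    return U.T @ U - V.T @ V


def drift_after_training(step, t_max=1.0, seed=0):
    """Run gradient descent with the given step size up to time t_max and
    return the Frobenius norm of the change in the balancedness matrix."""
    rng = np.random.default_rng(seed)
    n, m, r = 4, 3, 2                       # hypothetical toy dimensions
    Y = rng.standard_normal((n, m))         # synthetic target matrix
    U = 0.5 * rng.standard_normal((n, r))
    V = 0.5 * rng.standard_normal((m, r))
    h0 = balancedness(U, V)
    for _ in range(int(round(t_max / step))):
        R = U @ V.T - Y                     # residual of the factorization
        grad_U, grad_V = R @ V, R.T @ U     # gradients of 0.5 * ||U V^T - Y||_F^2
        U, V = U - step * grad_U, V - step * grad_V
    return np.linalg.norm(balancedness(U, V) - h0)


if __name__ == "__main__":
    for step in (1e-2, 1e-3, 1e-4):
        print(f"step={step:g}  drift={drift_after_training(step):.3e}")
```

In this discretization the first-order terms in the step size cancel exactly, so each iteration perturbs the conserved quantity only at second order, and the accumulated drift over a fixed time horizon is expected to shrink as the step size decreases.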

Theorems & Definitions (63)

  • Theorem 2.1: Structure theorem
  • Definition 2.2: Conservation through a flow
  • Definition 2.3: Conservation during the momentum flow with a given dataset
  • Definition 2.4: Conservation during the momentum flow with "any" dataset
  • Definition 2.5
  • Example 2.6
  • Proposition 2.6: Smooth functions conserved through a given flow
  • Proposition 2.6
  • Proposition 2.6
  • Remark 2.7
  • ...and 53 more