Keep the Momentum: Conservation Laws beyond Euclidean Gradient Flows

Sibylle Marcotte; Rémi Gribonval; Gabriel Peyré

TL;DR

This work extends the notion of conservation laws from Euclidean gradient flows (GF) to momentum flows (MF) and non-Euclidean metrics, and shows that, unlike their gradient-flow counterparts, the conserved quantities of momentum dynamics depend on time. By lifting the dynamics to phase space and exploiting invariances through Noether-type arguments, the authors characterize conserved functions and describe how the set of conserved functions changes when moving from gradient to momentum dynamics, for both linear and nonlinear networks. They prove a structure theorem for momentum flows, show a systematic loss of conservation laws in MF relative to GF, and give complete or near-complete characterizations for several settings (e.g., Euclidean MF on linear networks, ReLU networks, Nonnegative Matrix Factorization (NMF), and input-convex neural networks (ICNN)), including natural-gradient considerations. The findings expose fundamental limits on which quantities can remain invariant during optimization, with implications for understanding training dynamics, generalization, and the design of momentum-accelerated or mirror-descent-based algorithms. The work also supplies computational tools to compute conservation laws and demonstrates the theory on PCA, MLP, NMF, and ICNN settings, offering a principled lens for studying optimization in non-Euclidean geometries.

Abstract

Conservation laws are well-established in the context of Euclidean gradient flow dynamics, notably for linear or ReLU neural network training. Yet, their existence and principles for non-Euclidean geometries and momentum-based dynamics remain largely unknown. In this paper, we characterize "all" conservation laws in this general setting. In stark contrast to the case of gradient flows, we prove that the conservation laws for momentum-based dynamics exhibit temporal dependence. Additionally, we often observe a "conservation loss" when transitioning from gradient flow to momentum dynamics. Specifically, for linear networks, our framework allows us to identify all momentum conservation laws, which are less numerous than in the gradient flow case except in sufficiently over-parameterized regimes. With ReLU networks, no conservation law remains. This phenomenon also manifests in non-Euclidean metrics, used e.g. for Nonnegative Matrix Factorization (NMF): all conservation laws can be determined in the gradient flow context, yet none persists in the momentum case.
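To make the gradient-flow versus momentum contrast concrete, the following is a minimal sketch for a two-layer linear model, not the paper's own code. It relies on the classical fact that $h(U, V) = U^\top U - V^\top V$ is conserved by the Euclidean gradient flow of any loss that depends on $(U, V)$ only through $U V^\top$. The matrix sizes, the squared loss, the target `Y`, and the helper names `grads` and `dh_dt` are all assumptions chosen for illustration. The sketch evaluates the time derivative of $h$ along the gradient-flow velocity, which vanishes, and along an arbitrary phase-space velocity, as can occur in momentum dynamics, which generally does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: U is n x r, V is m x r, the model is W = U V^T.
n, m, r = 4, 3, 2
U = rng.standard_normal((n, r))
V = rng.standard_normal((m, r))
Y = rng.standard_normal((n, m))  # hypothetical target matrix


def grads(U, V):
    """Gradients of the assumed loss L(U, V) = 0.5 * ||U V^T - Y||_F^2."""
    R = U @ V.T - Y          # residual, shape (n, m)
    return R @ V, R.T @ U    # dL/dU (n x r), dL/dV (m x r)


def dh_dt(U, V, dU, dV):
    """Chain-rule derivative of h = U^T U - V^T V along the velocity (dU, dV)."""
    return dU.T @ U + U.T @ dU - dV.T @ V - V.T @ dV


# Gradient flow: the velocity is minus the gradient, and dh/dt cancels exactly.
gU, gV = grads(U, V)
gf_rate = dh_dt(U, V, -gU, -gV)

# Momentum dynamics live in phase space, where the velocity is a free variable.
# For a generic velocity, h (as a function of the parameters only) is not stationary.
dU = rng.standard_normal((n, r))
dV = rng.standard_normal((m, r))
mf_rate = dh_dt(U, V, dU, dV)

gf_max = float(np.abs(gf_rate).max())
mf_max = float(np.abs(mf_rate).max())
print(f"max |dh/dt| along gradient-flow velocity: {gf_max:.2e}")
print(f"max |dh/dt| along a generic velocity:     {mf_max:.2e}")
```

This only checks that the gradient-flow law, viewed as a function of $\theta$ alone, is not automatically preserved once velocities are free. It does not reproduce the paper's characterization of which time-dependent functions of $(t, \theta, \dot\theta)$ are conserved under momentum.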

Paper Structure (57 sections, 31 theorems, 110 equations, 4 figures)

Introduction
Conservation laws of Euclidean gradient flows
Momentum and non-Euclidean metrics.
Conservation laws and momentum.
Contributions.
Conservation Laws for Momentum Flows
Momentum dynamics
Main Examples
Examples of models.
Example of flows.
Example of metrics.
Running examples.
Conserved functions
Time-dependence: GF vs MF
Formal definition via phase-space lifting
...and 42 more sections

Key Result

Theorem 2.1

Let $h(t, \theta)$ (resp. $h(t, \theta, \dot \theta)$) be a conserved function for the gradient flow ODE (resp. the momentum flow ODE with $\tau(t) = \tau$) when its right-hand side is zero. For all $t$ and $\theta$, one has $h(t, \theta) = h(0, \theta)$ (resp. $h(t, \theta, \dot \theta) = {H}(\theta + \frac{

Figures (4)

Figure 1: Impact of the step size $\delta$ on the evolution of the loss (left) and on the preservation of one of the conservation laws (right). The colors are associated with the number of iterations $t_{\max}/\delta$ used to train the networks.
Figure 2: Impact of the momentum parameter $\mu=1/\tau$ on the evolution of the loss (left) and on the preservation of one of the conservation laws (right).
Figure 3: Left: example of input images (columns of $Y$). Right: example of NMF factors (columns of $U$) at optimality.
Figure 4: Impact of the momentum parameter $\mu=1/\tau$ on the evolution of the loss (left) and on the preservation of one of the conservation laws (right). Here $\mu=0$ corresponds to the gradient flow (no momentum).

Theorems & Definitions (63)

Theorem 2.1: Structure theorem
Definition 2.2: Conservation through a flow
Definition 2.3: Conservation during the momentum flow with a given dataset
Definition 2.4: Conservation during the momentum flow with "any" dataset
Definition 2.5
Example 2.6
Proposition 2.6: Smooth functions conserved through a given flow
Proposition 2.6
Proposition 2.6
Remark 2.7
...and 53 more
