Keep the Momentum: Conservation Laws beyond Euclidean Gradient Flows
Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré
TL;DR
This work extends conservation laws from Euclidean gradient flows (GF) to momentum flows (MF) and non-Euclidean metrics, and shows that momentum induces conserved quantities that depend explicitly on time, unlike their gradient-flow counterparts. Using a phase-space lifting framework and Noether-type arguments on invariances, the authors characterize the conserved functions and describe how their set changes when moving from gradient to momentum dynamics, for both linear and nonlinear networks. They prove a structure theorem for momentum flows, show a systematic loss of conservation laws in MF relative to GF, and give complete or near-complete characterizations for several settings (e.g., linear networks under Euclidean MF, ReLU networks, Nonnegative Matrix Factorization (NMF), and input convex neural networks (ICNN)), including natural-gradient considerations. The results identify fundamental limits on the quantities that stay invariant during optimization, with implications for understanding training dynamics and generalization and for designing momentum-accelerated or mirror-descent-based algorithms. The work also provides computational tools for finding conservation laws and illustrates the theory on PCA, MLP, NMF, and ICNN settings, offering a principled lens on optimization in non-Euclidean geometries.
Abstract
Conservation laws are well-established in the context of Euclidean gradient flow dynamics, notably for linear or ReLU neural network training. Yet, their existence and principles for non-Euclidean geometries and momentum-based dynamics remain largely unknown. In this paper, we characterize "all" conservation laws in this general setting. In stark contrast to the case of gradient flows, we prove that the conservation laws for momentum-based dynamics exhibit temporal dependence. Additionally, we often observe a "conservation loss" when transitioning from gradient flow to momentum dynamics. Specifically, for linear networks, our framework allows us to identify all momentum conservation laws, which are less numerous than in the gradient flow case except in sufficiently over-parameterized regimes. With ReLU networks, no conservation law remains. This phenomenon also manifests in non-Euclidean metrics, used e.g. for Nonnegative Matrix Factorization (NMF): all conservation laws can be determined in the gradient flow context, yet none persists in the momentum case.
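As a rough numerical illustration of the contrast described above, the following sketch integrates a tiny two-layer linear model with factors U and V and loss 0.5 * ||U V - Y||_F^2. Several ingredients are assumptions of this sketch rather than statements from the text: the momentum dynamics are taken to be a heavy-ball ODE with constant friction TAU, the sizes, seed, and RK4 integrator are arbitrary, and the time-dependent quantity exp(TAU * t) * (K - K^T), with K = U^T dU/dt - dV/dt V^T, was derived by hand for this toy problem and is not quoted from the paper. The classical gradient-flow invariant U^T U - V V^T is tracked under both dynamics.

```python
import numpy as np

# Toy sizes and friction: all hypothetical choices for this sketch.
n, r, m = 3, 2, 3
D = n * r + r * m          # number of parameters in (U, V)
TAU = 1.0                  # assumed constant friction of the heavy-ball ODE


def unpack(x):
    """Split a flat vector into U (n x r) and V (r x m)."""
    U = x[: n * r].reshape(n, r)
    V = x[n * r : D].reshape(r, m)
    return U, V


def neg_grad(theta, Y):
    """Negative gradient of 0.5 * ||U V - Y||_F^2 with respect to (U, V)."""
    U, V = unpack(theta)
    E = U @ V - Y
    return -np.concatenate([(E @ V.T).ravel(), (U.T @ E).ravel()])


def gf_field(theta, Y):
    """Gradient flow: d(theta)/dt = -grad L(theta)."""
    return neg_grad(theta, Y)


def mf_field(z, Y):
    """Heavy-ball flow lifted to phase space z = (theta, p), p = d(theta)/dt."""
    theta, p = z[:D], z[D:]
    return np.concatenate([p, -TAU * p + neg_grad(theta, Y)])


def rk4(field, x, Y, h, steps):
    """Classical fourth-order Runge-Kutta integrator with fixed step h."""
    for _ in range(steps):
        k1 = field(x, Y)
        k2 = field(x + 0.5 * h * k1, Y)
        k3 = field(x + 0.5 * h * k2, Y)
        k4 = field(x + h * k3, Y)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x


def balance(theta):
    """Gradient-flow invariant U^T U - V V^T (an r x r matrix)."""
    U, V = unpack(theta)
    return U.T @ U - V @ V.T


def skew_momentum(z):
    """Skew part K - K^T of K = U^T dU - dV V^T (hand-derived for this toy)."""
    U, V = unpack(z[:D])
    dU, dV = unpack(z[D:])
    K = U.T @ dU - dV @ V.T
    return K - K.T


def run(T=1.0, h=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    theta0 = 0.5 * rng.standard_normal(D)
    p0 = 0.5 * rng.standard_normal(D)      # nonzero initial velocity
    Y = rng.standard_normal((n, m))
    steps = int(round(T / h))

    theta_gf = rk4(gf_field, theta0, Y, h, steps)
    z0 = np.concatenate([theta0, p0])
    z_mf = rk4(mf_field, z0, Y, h, steps)

    gf_drift = np.abs(balance(theta_gf) - balance(theta0)).max()
    mf_balance_drift = np.abs(balance(z_mf[:D]) - balance(theta0)).max()
    mf_law_drift = np.abs(
        np.exp(TAU * T) * skew_momentum(z_mf) - skew_momentum(z0)
    ).max()
    return gf_drift, mf_balance_drift, mf_law_drift


if __name__ == "__main__":
    gf_drift, mf_balance_drift, mf_law_drift = run()
    print("GF: drift of U^T U - V V^T           :", gf_drift)
    print("MF: drift of U^T U - V V^T           :", mf_balance_drift)
    print("MF: drift of exp(TAU t) * (K - K^T)  :", mf_law_drift)
```

Under these assumptions, the first and third drifts should stay at the level of integration error, while the second should be visibly nonzero. This matches the qualitative picture in the abstract: the gradient-flow invariant is lost under momentum, and what survives carries an explicit time factor. The sketch is a sanity check on one toy instance, not a substitute for the paper's characterization.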
