On the implicit regularization of Langevin dynamics with projected noise

Govind Menon, Austin J. Stromme, Adrien Vacher

TL;DR

We study Langevin dynamics with diffusion projected onto directions orthogonal to a compact isometric group action $G\subset O(d)$ to model symmetry-induced over-parameterization. The main result proves that, starting from a $G$-invariant distribution, the projected-noise SDE is equivalent in law to an isotropic Langevin SDE with an additional drift $- (\alpha^2-\beta^2)\nabla \log \mathrm{vol}\,\mathcal{O}_{Y_t}$, i.e. a drift by the mean curvature $H(x)=-\nabla \log \mathrm{vol}\,\mathcal{O}_x$. This drift biases trajectories toward orbits of smaller embedded volume, revealing a geometry-driven implicit regularization tied to the group action. The authors construct a coupling via a process on $G$ to relate the two dynamics, analyze concrete group actions (radial $SO(d)$, eigenvalue-conjugation, and Bures–Wasserstein-type actions), and provide a PDE-based alternative proof, illustrating how model symmetries induce bias through differential volume/curvature terms with potential implications for architecture-aware learning dynamics.
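As a worked illustration of the drift formula, consider the radial example named above, with $SO(d)$ acting on $\mathbb{R}^d$ by rotations. The orbit of a point $x \neq 0$ is the sphere of radius $|x|$, so the log-volume and its gradient can be written explicitly. This is a sketch under that assumption only; $\alpha$ and $\beta$ are the coefficients from the drift formula above, and their definition is not given in this summary.

```latex
\begin{align*}
\mathrm{vol}\,\mathcal{O}_x &= |x|^{d-1}\,\mathrm{vol}(S^{d-1}), \\
\nabla \log \mathrm{vol}\,\mathcal{O}_x &= (d-1)\,\frac{x}{|x|^2}, \\
H(x) = -\nabla \log \mathrm{vol}\,\mathcal{O}_x &= -(d-1)\,\frac{x}{|x|^2}, \\
-(\alpha^2-\beta^2)\,\nabla \log \mathrm{vol}\,\mathcal{O}_{Y_t} &= -(\alpha^2-\beta^2)(d-1)\,\frac{Y_t}{|Y_t|^2}.
\end{align*}
```

When $\alpha^2 > \beta^2$ this drift points toward the origin, that is, toward spheres of smaller volume, which matches the bias described above. The origin itself is a singular orbit and is excluded.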

Abstract

We study Langevin dynamics with noise projected onto the directions orthogonal to an isometric group action. This mathematical model is introduced to shed new light on the effects of symmetry on stochastic gradient descent for over-parametrized models. Our main result identifies a novel form of implicit regularization: when the initial and target density are both invariant under the group action, Langevin dynamics with projected noise is equivalent in law to Langevin dynamics with isotropic diffusion but with an additional drift term proportional to the negative log volume of the group orbit. We prove this result by constructing a coupling of the two processes via a third process on the group itself, and identify the additional drift as the mean curvature of the orbits.
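The objects in the abstract can be checked numerically in the simplest case, rotations $SO(d)$ acting on $\mathbb{R}^d$, where the orbit of $x \neq 0$ is the sphere of radius $|x|$. The following is a minimal sketch, not the authors' code: it splits a vector into its components tangent and orthogonal to the orbit, and compares a finite-difference estimate of $-\nabla \log \mathrm{vol}\,\mathcal{O}_x$ with the closed-form mean curvature of the sphere, $-(d-1)x/|x|^2$. All function names are hypothetical, and the sample point and vector are arbitrary.

```python
import math

def log_orbit_volume(x):
    """Log-volume of the SO(d) orbit of x (a sphere of radius |x|),
    up to an additive constant that does not affect the gradient."""
    d = len(x)
    r = math.sqrt(sum(c * c for c in x))
    return (d - 1) * math.log(r)

def mean_curvature(x):
    """Closed-form mean curvature vector of the sphere through x:
    H(x) = -(d - 1) x / |x|^2."""
    d = len(x)
    r2 = sum(c * c for c in x)
    return [-(d - 1) * c / r2 for c in x]

def neg_grad_log_volume(x, h=1e-6):
    """Central finite-difference estimate of -grad log vol O_x."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append(-(log_orbit_volume(xp) - log_orbit_volume(xm)) / (2 * h))
    return grad

def project_normal(x, v):
    """Component of v orthogonal to the orbit at x (the radial direction)."""
    r2 = sum(c * c for c in x)
    coef = sum(a * b for a, b in zip(x, v)) / r2
    return [coef * c for c in x]

def project_tangent(x, v):
    """Component of v tangent to the orbit at x."""
    normal = project_normal(x, v)
    return [a - b for a, b in zip(v, normal)]

# Arbitrary sample point (|x| = 3) and direction, chosen for illustration.
x = [1.0, 2.0, 2.0]
v = [0.3, -1.0, 0.5]

H_exact = mean_curvature(x)
H_fd = neg_grad_log_volume(x)
v_tan = project_tangent(x, v)
v_nor = project_normal(x, v)

print("closed-form H(x):      ", H_exact)
print("finite-difference H(x):", H_fd)
print("tangent part of v:     ", v_tan)
print("normal part of v:      ", v_nor)
```

In this radial case, noise projected onto the directions orthogonal to the orbit moves only along $x$, so it changes the radius $|x|$ and nothing else. The two printed estimates of $H(x)$ should agree to within finite-difference error.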

Paper Structure (28 sections, 15 theorems, 97 equations, 1 figure)

Key Result

Proposition 1

Suppose $x \in \mathbb{R}^d_{\mathrm{reg}}$. Then there is a smooth orthonormal frame $V_1, \ldots, V_m$, defined on a neighborhood $U$ of $x$, such that for each $y \in U$, $V_1, \ldots, V_m$ span $T_y\mathcal{O}_y$. In particular, $P$ and $Q$ are smooth in a neighborhood of $x$, and $\mathbb{R}^d_{\m

Figures (1)

  • Figure 1: By introducing an appropriate process $g_t \in G$, we can create additional movement in the $T_x \mathcal{O}_x$ directions without changing the $G$-invariance property, and thus the marginal distributions.

Theorems & Definitions (33)

  • Proposition 1: Smoothness of $P$ and $Q$ away from singular orbits
  • Proposition 2: Smoothness and gradient of the log-volume
  • Theorem 3: Main result
  • Example 4: Radial symmetries
  • Example 5: Projection onto eigenvalues
  • Example 6: Bures–Wasserstein case
  • Corollary 7: Identity diffusion
  • Lemma 8: SDE remains $G$-invariant
  • Theorem 9: Brownian motion on an embedded manifold
  • Lemma 10: Relationship between second fundamental forms
  • ...and 23 more