Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups

Yuchen Zhu, Tianrong Chen, Lingkai Kong, Evangelos A. Theodorou, Molei Tao

TL;DR

This work tackles diffusion-based generative modeling of data on Lie groups by introducing the Trivialized Diffusion Model (TDM), which uses left-trivialization to keep the momentum in the fixed Lie algebra $\mathfrak{g}$, so that the score is learned in a Euclidean space. The forward process on a Lie group $G$ follows $\dot{g}_t = T_eL_{g_t}\xi_t$ with $\mathrm{d}\xi_t = -\gamma(t)\xi_t\,\mathrm{d}t + \sqrt{2\gamma(t)}\,\mathrm{d}W_t^{\mathfrak{g}}$, and the time-reversed backward process only requires the momentum score $\nabla_{\xi}\log p_{T-t}(g_t,\xi_t)$, enabling efficient, projection-free sampling via an Operator Splitting Integrator (OSI). Training uses denoising or implicit score matching (DSM or ISM), with explicit conditional transition probabilities for Abelian groups such as $\mathsf{SO}(2)$ or the torus $\mathbb{T}$, and a Euclidean score network $s_{\theta}$ acting on $\mathfrak{g}$. Empirically, TDM achieves state-of-the-art results on protein and RNA torsion angles and on challenging torus datasets (Pacman, checkerboard), and it scales to high-dimensional $\mathsf{SO}(n)$ and $\mathsf{U}(n)$ data, including quantum time-evolution operators; code is publicly available. By avoiding the tangent-space and manifold projections that prior approaches rely on, TDM removes a source of approximation error and advances scalable, high-fidelity generative modeling on manifolds.
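
The forward dynamics can be simulated without any projection by splitting them into a group-exponential update for $g$ and an exact Ornstein-Uhlenbeck update for $\xi$. Below is a minimal Python sketch on $\mathsf{SO}(n)$. It assumes a constant $\gamma$ and uses the Frobenius inner product on $\mathfrak{so}(n)$ to define $W_t^{\mathfrak{g}}$; the helpers `random_so_n` and `forward_split_step` are hypothetical, and this Strang splitting only illustrates the idea behind the OSI rather than reproducing the authors' integrator.

```python
import numpy as np
from scipy.linalg import expm


def random_so_n(n, rng):
    """Standard Gaussian element of so(n) (skew-symmetric matrices).

    Assumption: coefficients are i.i.d. N(0, 1) in the orthonormal basis
    (E_ij - E_ji) / sqrt(2), i < j, under the Frobenius inner product.
    """
    a = rng.standard_normal((n, n))
    return (a - a.T) / 2.0


def forward_split_step(g, xi, h, gamma, rng):
    """One Strang-splitting step (hypothetical helper) for
        dg = g xi dt,   d xi = -gamma xi dt + sqrt(2 gamma) dW   (constant gamma).
    """
    n = g.shape[0]
    a = np.exp(-gamma * h / 2.0)
    # Exact OU half-step: xi stays in so(n) because it is a sum of skew matrices.
    xi = a * xi + np.sqrt(1.0 - a**2) * random_so_n(n, rng)
    # Group step with xi frozen: g <- g exp(h xi) stays on SO(n) up to round-off.
    g = g @ expm(h * xi)
    # Second exact OU half-step.
    xi = a * xi + np.sqrt(1.0 - a**2) * random_so_n(n, rng)
    return g, xi


rng = np.random.default_rng(0)
n = 3
g, xi = np.eye(n), np.zeros((n, n))
for _ in range(200):
    g, xi = forward_split_step(g, xi, h=0.01, gamma=1.0, rng=rng)
```

After the loop, `np.linalg.norm(g.T @ g - np.eye(n))` should sit at round-off level and `np.linalg.det(g)` should be close to 1, because no step ever leaves the group.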

Abstract

The generative modeling of data on manifolds is an important task, for which diffusion models in flat spaces typically need nontrivial adaptations. This article demonstrates how a technique called 'trivialization' can transfer the effectiveness of diffusion models in Euclidean spaces to Lie groups. In particular, an auxiliary momentum variable is algorithmically introduced to help transport the position variable between the data distribution and a fixed, easy-to-sample distribution. Normally, this would incur further difficulty for manifold data because the momentum lives in a space that changes with the position. However, our trivialization technique creates a new momentum variable that stays in a simple, fixed vector space. This design, together with a manifold-preserving integrator, simplifies implementation and avoids the inaccuracies introduced by approximations such as projections onto the tangent space and the manifold, which were typically used in prior work, hence facilitating high-fidelity and efficient generation. The resulting method achieves state-of-the-art performance on protein and RNA torsion angle generation and on sophisticated torus datasets. We also, arguably for the first time, tackle the generation of data on high-dimensional Special Orthogonal and Unitary groups, the latter essential for quantum problems. Code is available at https://github.com/yuchen-zhu-zyc/TDM.
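
To illustrate why a trivialized momentum plus a manifold-preserving update avoids projections, the Python sketch below (an illustration, not code from the TDM repository) takes one step on $\mathsf{U}(n)$ from a momentum $\xi$ in the Lie algebra $\mathfrak{u}(n)$ of skew-Hermitian matrices. The left-trivialized velocity $g\xi$ is automatically tangent at $g$. A naive ambient Euler step $g + hg\xi$ leaves the group by $O(h^2)$, since $(I + h\xi)^{\mathsf{H}}(I + h\xi) = I + h^2\xi^{\mathsf{H}}\xi$ for skew-Hermitian $\xi$, whereas the group-exponential step $g\exp(h\xi)$ stays unitary to round-off. The helper `random_u_n_algebra` is hypothetical.

```python
import numpy as np
from scipy.linalg import expm


def random_u_n_algebra(n, rng):
    """Random skew-Hermitian matrix, i.e. an element of the Lie algebra u(n)."""
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (z - z.conj().T) / 2.0


def unitarity_error(u):
    """Frobenius distance of u^H u from the identity."""
    return np.linalg.norm(u.conj().T @ u - np.eye(u.shape[0]))


rng = np.random.default_rng(0)
n, h = 8, 0.05
g = expm(random_u_n_algebra(n, rng))  # a point on U(n)
xi = random_u_n_algebra(n, rng)       # trivialized momentum: always in u(n)

# Left-trivialized velocity: g^H (g xi) = xi is skew-Hermitian, so g xi is tangent at g.
v = g @ xi
w = g.conj().T @ v
tangent_residual = np.linalg.norm(w + w.conj().T)

g_euler = g + h * v       # ambient Euler step: drifts off U(n) by about h^2 ||xi^H xi||
g_exp = g @ expm(h * xi)  # group exponential step: stays on U(n)

print(f"tangent residual:      {tangent_residual:.1e}")
print(f"Euler unitarity error: {unitarity_error(g_euler):.1e}")
print(f"exp unitarity error:   {unitarity_error(g_exp):.1e}")
```

The Euler iterate would need a projection back onto $\mathsf{U}(n)$ at every step, which is exactly the kind of approximation the trivialized design avoids.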

Paper Structure (26 sections, 13 theorems, 81 equations, 11 figures, 1 table, 4 algorithms)

Key Result

Theorem 1

Let $T \geq 0$ and let $W^{\mathfrak{g}}_t$ be a Brownian motion on the Lie algebra $\mathfrak{g}$. Let $\mathbf{X}_t = (g_t, \xi_t)$ be the trajectory of the forward dynamics (eq:forwarddyn_simple), with $\mathbf{X}_t$ admitting a smooth density $p_{t}(g_t, \xi_t)$ with respect to the Haar measure on $G$, and let $\mathbf{Y}_t$ satisfy $\mathbf{Y}_t \stackrel{d}{=} \mathbf{X}_{T-t}$ under the notation $\mathbf{Y}_t := (g_t, \ldots)$ …
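
Because the position $g$ carries no noise in the forward dynamics, the standard Euclidean time-reversal formula suggests that the backward process needs only the momentum score, which is consistent with the TL;DR. The theorem statement above is truncated, so the Python sketch below rests on assumptions: it takes one Euler-type step of the reversed dynamics on the torus $\mathbb{T}^d$ (where the group exponential reduces to addition modulo $2\pi$), assumes a constant $\gamma$, and uses the convention $\mathbf{Y}_s = \mathbf{X}_{T-s}$ without a momentum sign flip, i.e. $\mathrm{d}\theta = -\xi\,\mathrm{d}s$ and $\mathrm{d}\xi = [\gamma\xi + 2\gamma\nabla_\xi\log p_{T-s}]\,\mathrm{d}s + \sqrt{2\gamma}\,\mathrm{d}B_s$. The names `reverse_step_torus` and `score_fn` are hypothetical, with `score_fn` standing in for the learned network $s_\theta$.

```python
import numpy as np


def reverse_step_torus(theta, xi, t, h, score_fn, gamma, rng):
    """One step of the (assumed) reversed trivialized kinetic Langevin on T^d.

    t is the current *forward* time T - s; score_fn(theta, xi, t) approximates
    grad_xi log p_t(theta, xi). Euler-Maruyama for xi, then an exact angle update.
    """
    z = rng.standard_normal(xi.shape)
    drift = gamma * xi + 2.0 * gamma * score_fn(theta, xi, t)
    xi = xi + h * drift + np.sqrt(2.0 * gamma * h) * z
    # Group exponential on the torus: move by -h * xi and wrap back to [0, 2*pi).
    theta = np.mod(theta - h * xi, 2.0 * np.pi)
    return theta, xi


# Sanity check with the stationary score: for p = Uniform(T^d) x N(0, I) the
# xi-score is -xi, so the reversed xi-drift collapses to the forward drift -gamma*xi.
def stationary_score(theta, xi, t):
    return -xi


rng = np.random.default_rng(0)
d, T, h, gamma = 2, 1.0, 0.01, 1.0
theta = rng.uniform(0.0, 2.0 * np.pi, d)
xi = rng.standard_normal(d)
for k in range(int(round(T / h))):
    theta, xi = reverse_step_torus(theta, xi, T - k * h, h, stationary_score, gamma, rng)
```

In a real sampler `stationary_score` would be replaced by the trained network, and the first-order Euler step by a higher-accuracy splitting scheme.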

Figures (11)

  • Figure 1: Visualization of the algorithmic intuition behind TDM. Existing approaches such as RFM and RSGM often model an object that lies on tangent spaces that change as the position $g_t$ moves, which causes inaccuracies when simulating trajectories on complicated manifolds. In contrast, TDM only needs to learn the score in a simple Euclidean space. Thanks to the structure of trivialization, TDM guarantees that the induced momentum lies exactly in the tangent space, which improves generation quality and reduces sampling error.
  • Figure 2: Log likelihood $(\uparrow)$ vs. dimension.
  • Figure 3: Visualization of data generated by TDM on the $4 \times 4$ and $6 \times 6$ checkerboard datasets.
  • Figure 4: Visualization of the Pacman dataset and of data generated by TDM on $\mathbb{T}^2$. The Pacman maze corresponds to a random variable on $\mathbb{T}^2$ with a complicated distribution concentrated on the locations of the walls.
  • Figure 5: Log likelihood and visualization of generated data for $\mathsf{SO}(3)$ with $32$ mixture components.
  • ...and 6 more figures

Theorems & Definitions (21)

  • Theorem 1: Time Reversal of Trivialized Kinetic Langevin on Lie Group
  • Remark 1: Probability Flow ODE
  • Theorem 2: Conditional transition probability for Abelian Lie Group
  • Lemma 1
  • Proof of Lemma 1
  • Proof of Theorem 1
  • Corollary 1: Conditional transition probability for Abelian Lie Group
  • Lemma 2
  • Proof of Lemma 2
  • Corollary 2
  • ...and 11 more
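
For an Abelian group such as the torus, the forward dynamics are linear in unwrapped coordinates, so the transition from $(\theta_0, \xi_0)$ is a Gaussian in $(\theta_t, \xi_t)$ wrapped around $\mathbb{T}^d$, and its $\xi$-score can serve as a denoising score matching target. The Python sketch below assumes a constant $\gamma$, conditions on both $\theta_0$ and $\xi_0$, and truncates the wrapping sum to a few images; these are simplifying assumptions, and the paper's Theorem 2 and Corollary 1 may use a time-dependent $\gamma(t)$ or a different conditioning. All function names are hypothetical. With $a = e^{-\gamma t}$, the per-coordinate moments are $\mathbb{E}[\theta_t] = \theta_0 + \xi_0(1-a)/\gamma$, $\mathbb{E}[\xi_t] = a\xi_0$, $\operatorname{Var}(\xi_t) = 1-a^2$, $\operatorname{Cov}(\theta_t, \xi_t) = (1-a)^2/\gamma$ and $\operatorname{Var}(\theta_t) = \tfrac{2}{\gamma}\big(t - \tfrac{2(1-a)}{\gamma} + \tfrac{1-a^2}{2\gamma}\big)$.

```python
import numpy as np


def kinetic_ou_moments(theta0, xi0, t, gamma):
    """Per-coordinate mean and covariance of the UNWRAPPED (theta_t, xi_t) given
    (theta0, xi0) for d theta = xi dt, d xi = -gamma xi dt + sqrt(2 gamma) dW,
    with constant gamma > 0 and t > 0."""
    a = np.exp(-gamma * t)
    m_theta = theta0 + xi0 * (1.0 - a) / gamma
    m_xi = a * xi0
    v_theta = (2.0 / gamma) * (t - 2.0 * (1.0 - a) / gamma + (1.0 - a**2) / (2.0 * gamma))
    c = (1.0 - a) ** 2 / gamma
    v_xi = 1.0 - a**2
    return m_theta, m_xi, v_theta, c, v_xi


def _wrapped_terms(theta_t, xi_t, theta0, xi0, t, gamma, n_wrap):
    """Gaussian log-weights of the angular images theta_t + 2*pi*k, |k| <= n_wrap."""
    m_theta, m_xi, v_theta, c, v_xi = kinetic_ou_moments(theta0, xi0, t, gamma)
    det = v_theta * v_xi - c**2
    p_tt, p_tx, p_xx = v_xi / det, -c / det, v_theta / det  # 2x2 precision entries
    # Wrap the angular residual into [-pi, pi) so a few images around k = 0 suffice.
    base = (theta_t - m_theta + np.pi) % (2.0 * np.pi) - np.pi
    ks = np.arange(-n_wrap, n_wrap + 1)[:, None]
    dtheta = base[None, :] + 2.0 * np.pi * ks  # shape (2*n_wrap + 1, d)
    dxi = xi_t - m_xi                          # shape (d,)
    logits = -0.5 * (p_tt * dtheta**2 + 2.0 * p_tx * dtheta * dxi + p_xx * dxi**2)
    return logits, dtheta, dxi, p_tx, p_xx, det


def wrapped_log_density(theta_t, xi_t, theta0, xi0, t, gamma, n_wrap=4):
    """log p(theta_t, xi_t | theta0, xi0) on T^d x R^d (truncated wrapping sum)."""
    logits, _, _, _, _, det = _wrapped_terms(theta_t, xi_t, theta0, xi0, t, gamma, n_wrap)
    mx = logits.max(axis=0)
    lse = mx + np.log(np.exp(logits - mx).sum(axis=0))
    return float(np.sum(lse - np.log(2.0 * np.pi * np.sqrt(det))))


def wrapped_xi_score(theta_t, xi_t, theta0, xi0, t, gamma, n_wrap=4):
    """grad_xi log p(theta_t, xi_t | theta0, xi0): a DSM regression target."""
    logits, dtheta, dxi, p_tx, p_xx, _ = _wrapped_terms(
        theta_t, xi_t, theta0, xi0, t, gamma, n_wrap)
    w = np.exp(logits - logits.max(axis=0))
    w /= w.sum(axis=0)  # posterior weight of each angular image
    # Each Gaussian image contributes -(P_theta_xi * dtheta + P_xi_xi * dxi).
    return np.sum(w * -(p_tx * dtheta + p_xx * dxi), axis=0)
```

A DSM loss under these assumptions would regress $s_\theta(\theta_t, \xi_t, t)$ onto `wrapped_xi_score(...)` for pairs $(\theta_t, \xi_t)$ sampled from this transition.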