Diffusion Generative Modeling on Lie Group Representations
Marco Bertolini, Tuan Le, Djork-Arné Clevert
TL;DR
Diffusion models on non-Euclidean data often struggle to respect Lie group symmetries. This work introduces a generalized score-matching framework that runs diffusion directly in the representation of a Lie group $G$ acting on a data space $X$, yielding paired forward and reverse SDEs whose dynamics decompose along the Lie algebra $\mathfrak{g}$. The forward process applies infinitesimal group transformations, while the reverse process learns a generalized score $\boldsymbol{\mathcal{L}} \log p_t$ to generate samples; standard score matching is recovered when $G$ is the translation group. Empirically, the approach improves on Riemannian diffusion baselines for molecular conformer generation with $SO(3)\times\mathbb{R}_+$ and ligand docking with $SE(3)$, and choosing $G$ to match the structure of the data reduces the effective dimensionality of the trajectory space. The framework thus treats diffusion on curved manifolds and flat-space diffusion in a single setting, without manifold projections, and opens new avenues for symmetry-aware generative modeling in chemistry and physics.
Abstract
We introduce a novel class of score-based diffusion processes that operate directly in the representation space of Lie groups. Leveraging the framework of Generalized Score Matching, we derive a class of Langevin dynamics that decomposes as a direct sum of Lie algebra representations, enabling the modeling of any target distribution on any (non-Abelian) Lie group. Standard score matching emerges as a special case of our framework when the Lie group is the translation group. We prove that our generalized generative processes arise as solutions to a new class of paired stochastic differential equations (SDEs). We validate our approach through experiments on diverse data types, demonstrating its effectiveness in real-world applications such as SO(3)-guided molecular conformer generation and modeling ligand-specific global SE(3) transformations for molecular docking, where it improves on Riemannian diffusion on the group itself. We show that an appropriate choice of Lie group enhances learning efficiency by reducing the effective dimensionality of the trajectory space and enables the modeling of transitions between complex data distributions.
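To make the notion of a score taken along Lie algebra directions more concrete, the following NumPy sketch may help. It is an illustration under stated assumptions, not the authors' implementation: it assumes the group acts linearly on $\mathbb{R}^2$, so that each generator $A$ induces the vector field $x \mapsto Ax$, and it uses a standard normal target whose ordinary score is $-x$. The generator matrices and the function names `gaussian_score`, `generalized_score`, and `translation_score` are hypothetical choices made for this example.

```python
import numpy as np

# Illustrative generators for SO(2) x R_+ acting linearly on R^2 (an assumption
# made for this sketch; the paper's experiments use other groups and data).
ROTATION = np.array([[0.0, -1.0], [1.0, 0.0]])  # infinitesimal rotation
DILATION = np.eye(2)                            # infinitesimal scaling


def gaussian_score(x):
    """Ordinary score of a standard normal density: grad log p(x) = -x."""
    return -x


def generalized_score(x, generators, score_fn):
    """Derivative of log p along each vector field x -> A x, one per generator."""
    grad = score_fn(x)
    return np.array([(A @ x) @ grad for A in generators])


def translation_score(x, score_fn):
    """Translation group: the vector fields are the constant basis vectors."""
    basis = np.eye(len(x))
    return np.array([e @ score_fn(x) for e in basis])


x = np.array([1.0, 2.0])
gen_score = generalized_score(x, [ROTATION, DILATION], gaussian_score)
print(gen_score)                             # components along rotation, dilation
print(translation_score(x, gaussian_score))  # coincides with the ordinary score
```

In this toy setting the rotation component vanishes because the Gaussian is rotation invariant, while the dilation component equals $-\lVert x\rVert^2$. For the translation group the generalized score reduces to the ordinary gradient of $\log p$, consistent with the special case stated in the abstract.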
