Matching the Optimal Denoiser in Point Cloud Diffusion with (Improved) Rotational Alignment

Ameya Daigavane, YuQing Xie, Bodhi P. Vani, Saeed Saremi, Joseph Kleinhenz, Tess Smidt

TL;DR

This work addresses orientation ambiguity in diffusion-based generation of 3D point clouds by linking the optimal denoiser under rotational augmentation to a matrix Fisher distribution on $SO(3)$. The authors show that alignment to the orientation mode $R^*(y,x)$ is the zeroth-order small-noise approximation of the optimal conditional denoiser, and they derive higher-order corrections $D^*_1$ and $D^*_2$ via Laplace expansions with no extra computational cost. Empirical results indicate that rotation alignment is a robust and often sufficient approximation for the noise regimes used in training, while the proposed higher-order estimators can modestly reduce bias at larger noise levels. This work provides a principled framework for understanding and improving rotationally invariant diffusion on 3D point clouds, with practical implications for molecular conformation modeling and related geometric data tasks.

Abstract

Diffusion models are a popular class of generative models trained to reverse a noising process starting from a target data distribution. Training a diffusion model consists of learning how to denoise noisy samples at different noise levels. When training diffusion models for point clouds such as molecules and proteins, there is often no canonical orientation that can be assigned. To capture this symmetry, the true data samples are typically augmented by transforming them with random rotations sampled uniformly over $SO(3)$. The denoised predictions are then often rotationally aligned to the ground-truth samples via the Kabsch-Umeyama algorithm before computing the loss. However, the effect of this alignment step has not been well studied. Here, we show that the optimal denoiser can be expressed in terms of a matrix Fisher distribution over $SO(3)$. Alignment corresponds to taking the mode of this distribution, and turns out to be the zeroth-order approximation for small noise levels, explaining its effectiveness. We build on this perspective to derive better approximators to the optimal denoiser in the limit of small noise. Our experiments highlight that alignment is often a 'good enough' approximation for the noise levels that matter most for training diffusion models.
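The rotational alignment step mentioned in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes both point clouds are stored as $N \times 3$ arrays with corresponding rows, that they are already mean-centered (so only a rotation is fitted, with no translation or scale), and the function names `kabsch_rotation` and `align_to` are hypothetical.

```python
import numpy as np

def kabsch_rotation(y, x):
    """Rotation R in SO(3) minimizing ||y - x @ R.T||_F.

    Assumes y and x are (N, 3) arrays of corresponding, mean-centered points.
    """
    # 3x3 cross-covariance between the two clouds.
    H = y.T @ x
    U, _, Vt = np.linalg.svd(H)
    # Flip the last singular direction if needed so that det(R) = +1
    # (a proper rotation rather than a reflection).
    d = np.sign(np.linalg.det(U @ Vt))
    return U @ np.diag([1.0, 1.0, d]) @ Vt

def align_to(y, x):
    """Rotate x onto y using the optimal rotation (hypothetical helper)."""
    R = kabsch_rotation(y, x)
    return x @ R.T
```

In the language of the abstract, the rotation returned here plays the role of the mode of the orientation distribution, so `align_to(y, x)` would correspond to the zeroth-order (alignment-based) target under the assumptions stated above.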

Paper Structure

This paper contains 27 sections, 82 equations, 7 figures.

Figures (7)

  • Figure 1: Overview of the training process of a denoising diffusion model, represented by $D$. A sample point cloud $x$ is first noised to give $y$. $D$ denoises $y$ to give a new point cloud $D(y; \sigma)$, which gets matched to an estimator $D_{\text{est}}(y; x, \sigma)$ of the optimal denoiser $D^*$. The usual estimator is $D_{\text{est}}(y; x, \sigma) = x$. Here, we show that rotational alignment gives rise to better estimators of $D^*$.
  • Figure 2: A depiction of the unimodal matrix Fisher distribution $\operatorname{MF}(\mathbf{R}; F)$ over $SO(3)$, highlighting the mode $\mathbf{R}^*(y, x)$. As $\sigma$ decreases, the distribution becomes more peaked around $\mathbf{R}^*(y, x)$.
  • Figure 3: Mean-squared error relative to the optimal denoiser $D^*(y; x)$ as a function of $\sigma$. $x$ here is a randomly chosen conformation of the AEQN tetrapeptide from the Timewarp 4AA-Large dataset, and $y$ is sampled as $x + \sigma \eta$.
  • Figure 4: Training progress for the MLP, as measured by RMSD to ground-truth $x$ (sampled from all frames), when trained using $D_\text{aug}$, $D^*_0$, $D^*_1$ and $D^*_2$.
  • Figure 5: Training progress for the MLP, as measured by RMSD to ground-truth $x$ (fixed as the first frame), when trained using $D_\text{aug}$, $D^*_0$, $D^*_1$ and $D^*_2$.
  • ...and 2 more figures
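
The comparison in Figure 3 needs a reference value for the optimal denoiser $D^*(y; x)$. One way to approximate such a reference numerically is a self-normalized Monte Carlo average over uniformly sampled rotations, weighting each rotated copy of $x$ by its Gaussian likelihood given $y$. The sketch below is an illustration under stated assumptions, not the procedure used in the paper: it assumes $\eta$ is standard Gaussian noise, a single data point $x$, and $N \times 3$ arrays; the function names `random_rotations` and `mc_optimal_denoiser` are hypothetical.

```python
import numpy as np

def random_rotations(n, rng):
    """Sample n rotation matrices uniformly from SO(3).

    Uses normalized Gaussian 4-vectors as unit quaternions (w, a, b, c).
    """
    q = rng.normal(size=(n, 4))
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    w, a, b, c = q.T
    rows = [
        np.stack([1 - 2 * (b * b + c * c), 2 * (a * b - c * w), 2 * (a * c + b * w)], axis=-1),
        np.stack([2 * (a * b + c * w), 1 - 2 * (a * a + c * c), 2 * (b * c - a * w)], axis=-1),
        np.stack([2 * (a * c - b * w), 2 * (b * c + a * w), 1 - 2 * (a * a + b * b)], axis=-1),
    ]
    return np.stack(rows, axis=1)  # shape (n, 3, 3)

def mc_optimal_denoiser(y, x, sigma, n_samples, rng):
    """Monte Carlo estimate of E[R x | y] for y = R x + sigma * eta, R uniform."""
    Rs = random_rotations(n_samples, rng)
    # rotated[n, p] = Rs[n] @ x[p] for every sampled rotation and point.
    rotated = np.einsum("nij,pj->npi", Rs, x)
    # Gaussian log-likelihood of y under each rotated copy (up to a constant).
    sq_dist = np.sum((y[None] - rotated) ** 2, axis=(1, 2))
    log_w = -sq_dist / (2.0 * sigma**2)
    log_w -= log_w.max()  # stabilize the exponentials
    weights = np.exp(log_w)
    weights /= weights.sum()
    # Posterior-weighted average of the rotated point clouds.
    return np.einsum("n,npi->pi", weights, rotated)
```

At small $\sigma$ the weights concentrate on very few samples, so this estimate becomes noisy; that is the regime in which the orientation distribution is sharply peaked around $\mathbf{R}^*(y, x)$ and a mode-based approximation is the natural alternative.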