
Frame-based Equivariant Diffusion Models for 3D Molecular Generation

Mohan Guo, Cong Liu, Patrick Forré

TL;DR

Frame-based diffusion addresses the challenge of enforcing $\mathbb{E}(3)$ symmetry in molecular generation by decoupling symmetry handling from the backbone. It introduces Global Frame Diffusion, Local Frame Diffusion, and Invariant Frame Diffusion, combined with EdgeDiT backbones, achieving deterministic equivariance while preserving global geometry. On QM9, GFD with EdgeDiT achieves state-of-the-art negative log-likelihood and high atom/molecular stability, with nearly 2x faster sampling than EDM; LFD benefits from a frame-alignment constraint, while IFD sacrifices some diversity but gains efficiency. The work establishes frame-based diffusion as scalable and physically grounded, highlighting the importance of global structure preservation for effective molecular learning.

Abstract

Recent methods for molecular generation face a trade-off: they either enforce strict equivariance with costly architectures or relax it to gain scalability and flexibility. We propose a frame-based diffusion paradigm that achieves deterministic E(3)-equivariance while decoupling symmetry handling from the backbone. Building on this paradigm, we investigate three variants: Global Frame Diffusion (GFD), which assigns a shared molecular frame; Local Frame Diffusion (LFD), which constructs node-specific frames and benefits from additional alignment constraints; and Invariant Frame Diffusion (IFD), which relies on pre-canonicalized invariant representations. To enhance expressivity, we further use EdgeDiT, a Diffusion Transformer with edge-aware attention. On the QM9 dataset, GFD with EdgeDiT achieves state-of-the-art performance, with a test NLL of -137.97 at standard scale and -141.85 at double scale, alongside an atom stability of 98.98% and a molecular stability of 90.51%. These results surpass all equivariant baselines while maintaining high validity and uniqueness, and sampling is nearly 2x faster than with EDM. Altogether, our study establishes frame-based diffusion as a scalable, flexible, and physically grounded paradigm for molecular generation, highlighting the critical role of global structure preservation.
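The frame-based recipe behind GFD (Figure 2) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper builds the frame with a learned equivariant network, whereas the hypothetical `global_frame` below uses a hand-crafted stand-in (radius-weighted sums of centred coordinates, orthonormalised by Gram-Schmidt), and `backbone` is a deliberately non-equivariant placeholder for EdgeDiT. The point it shows is that equivariance comes from the frame alone: if the input becomes $Qx + t$ with $Q$ orthogonal, the frame becomes $QF$, the canonical coordinates $F^\top(x - c)$ are unchanged, and the output is mapped back to $Q\hat{x} + t$.

```python
import numpy as np

# Hypothetical stand-in for the learned equivariant frame network in the paper.
def global_frame(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return the centroid c (3,) and an orthogonal frame F (3, 3), columns = axes."""
    c = x.mean(axis=0)
    d = x - c                                      # centred offsets (translation-invariant)
    r = np.linalg.norm(d, axis=1, keepdims=True)   # radii, invariant under E(3)
    # Radius-weighted sums transform as v_k -> Q v_k under any orthogonal Q.
    vs = [(r ** k * d).sum(axis=0) for k in (1, 2, 3)]
    axes: list[np.ndarray] = []
    for v in vs:                                   # Gram-Schmidt orthonormalisation
        for e in axes:
            v = v - (v @ e) * e
        axes.append(v / np.linalg.norm(v))         # fails if the v_k are linearly dependent
    return c, np.stack(axes, axis=1)


def to_invariant(x, c, F):
    return (x - c) @ F                             # row i holds F^T (x_i - c)


def from_invariant(y, c, F):
    return y @ F.T + c                             # row i holds F y_i + c


def canonicalize(x):
    return to_invariant(x, *global_frame(x))


# Placeholder denoiser (EdgeDiT in the paper); deliberately NOT equivariant.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 16)), rng.normal(size=(16, 3))


def backbone(y, t):
    h = np.tanh(y @ W1 + t)                        # t: scalar diffusion time
    return y + 0.1 * (h - h.mean(axis=0)) @ W2


def gfd_denoise(x_noisy, t):
    c, F = global_frame(x_noisy)                   # frame from the noised input
    y_hat = backbone(to_invariant(x_noisy, c, F), t)
    return from_invariant(y_hat, c, F)             # inverse frame transformation


x_noisy = rng.normal(size=(9, 3))                  # 9 "atoms" with random coordinates
x_hat = gfd_denoise(x_noisy, t=0.5)
```

IFD, as described in Figure 3, would instead apply a `canonicalize`-style map once to the data before noising, so rotated and reflected copies of a molecule all map to the same invariant coordinates and the diffusion model works entirely in that space. This hand-crafted frame breaks down for planar or highly symmetric molecules, where the three weighted sums become linearly dependent; a learned frame network or a more careful construction would be needed there.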

Paper Structure

This paper contains 11 sections, 6 equations, 5 figures, 2 tables, and 7 algorithms.

Figures (5)

  • Figure 1: Local Frame-based Diffusion Model (LFD). The framework operates as follows: (1) The input molecule is noised and then processed by an equivariant model that constructs a local frame for each atom. (2) These local frames are used to derive representations that capture the local molecular geometry while being invariant under the $\mathbb{E}(3)$ group. (3) The backbone diffusion model takes the invariant representations as input and predicts the invariant denoised molecule. (4) The predicted molecule is obtained by applying the inverse local-frame transformations. The loss is computed between the input molecule and the predicted molecule.
  • Figure 2: Global Frame-based Diffusion (GFD): an equivariant module constructs a global molecular frame from the noised input, invariant features are derived from it and denoised by the backbone, and the final molecule is reconstructed via the inverse frame transformation, with the loss computed against the original molecule.
  • Figure 3: Invariant Frame-based Diffusion Model (IFD). The framework operates in two stages: (1) Pre-canonicalization: an equivariant model processes the input molecule to construct a global frame and derives rotation-invariant representations. (2) Diffusion process: the backbone model takes the noised invariant representation as input to predict the denoised molecular structure. The predicted output is the invariant denoised molecule, which should ideally match the rotated input molecule. The loss is computed between the invariant representation and the predicted invariant denoised molecule.
  • Figure 4: Training curves on QM9 comparing EDM, GFD + EdgeDiT, LFD + EdgeDiT, Aligned LFD + EdgeDiT, and SymDiff. Aligned LFD refers to LFD augmented with the proposed frame alignment loss. GFD and Aligned LFD achieve the best overall results, converging faster and attaining higher stability and lower validation loss than the baselines. These results show that vanilla LFD substantially underperforms GFD, but adding frame alignment recovers performance to the level of GFD, validating the hypothesis that preserving global Euclidean structure is essential for effective molecular generation.
  • Figure 5: Comparison of molecular generation performance across EDM, GFD with EdgeDiT, and IFD with EdgeDiT.
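The LFD pipeline of Figure 1, together with a frame alignment term of the kind mentioned in Figure 4, can be sketched as follows. This is a hedged NumPy illustration, not the authors' method: the learned equivariant frame model is replaced by hypothetical hand-crafted per-atom frames (Gaussian-weighted sums of neighbour offsets with assumed widths `SCALES`, orthonormalised by Gram-Schmidt), `local_backbone` is a placeholder for EdgeDiT, and because the excerpt does not define the alignment loss, `frame_alignment_penalty` (mean squared deviation of the relative rotations $F_i^\top F_j$ from the identity) is purely an assumed form.

```python
import numpy as np

SCALES = (0.5, 1.0, 2.0)   # hypothetical Gaussian widths for neighbour weighting


def local_frames(x: np.ndarray) -> np.ndarray:
    """Per-atom orthogonal frames, shape (N, 3, 3); F[i] has atom i's axes as columns.

    Hand-crafted stand-in for the learned equivariant frame model: each axis is a
    Gaussian-weighted sum of offsets x_j - x_i, orthonormalised by Gram-Schmidt.
    """
    rel = x[None, :, :] - x[:, None, :]            # rel[i, j] = x_j - x_i
    d2 = (rel ** 2).sum(axis=-1, keepdims=True)    # squared distances (invariant)
    frames = []
    for i in range(len(x)):
        axes: list[np.ndarray] = []
        for s in SCALES:
            v = (np.exp(-d2[i] / s) * rel[i]).sum(axis=0)   # transforms as v -> Q v
            for e in axes:
                v = v - (v @ e) * e
            axes.append(v / np.linalg.norm(v))
        frames.append(np.stack(axes, axis=1))
    return np.stack(frames)


def to_local(x, F):
    rel = x[None, :, :] - x[:, None, :]
    return np.einsum("iab,ija->ijb", F, rel)       # Y[i, j] = F_i^T (x_j - x_i)


# Placeholder for the backbone (EdgeDiT in the paper); not equivariant on its own.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 16)), rng.normal(size=(16, 3))


def local_backbone(Y, t):
    h = np.tanh(Y.mean(axis=1) @ W1 + t)           # pool each atom's local view, (N, 16)
    return 0.1 * h @ W2                            # per-atom update in its own frame, (N, 3)


def lfd_denoise(x_noisy, t):
    F = local_frames(x_noisy)
    delta = local_backbone(to_local(x_noisy, F), t)
    return x_noisy + np.einsum("iab,ib->ia", F, delta)   # x_i + F_i delta_i


def frame_alignment_penalty(F):
    """HYPOTHETICAL alignment term: mean over atom pairs of ||F_i^T F_j - I||_F^2."""
    rel_rot = np.einsum("iab,jac->ijbc", F, F)     # relative rotations F_i^T F_j
    return float(((rel_rot - np.eye(3)) ** 2).sum(axis=(-2, -1)).mean())


x_noisy = rng.normal(size=(9, 3))
x_hat = lfd_denoise(x_noisy, t=0.5)
penalty = frame_alignment_penalty(local_frames(x_noisy))
```

Because this penalty depends only on the relative rotations $F_i^\top F_j$, it is itself $\mathbb{E}(3)$-invariant, and it vanishes exactly when all atoms share one frame orientation, as in GFD. As with any hand-crafted frame, these local frames degenerate when an atom's weighted neighbour offsets become linearly dependent, for example in a planar molecule.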