VEDA: 3D Molecular Generation via Variance-Exploding Diffusion with Annealing

Peining Zhang, Jinbo Bi, Minghu Song

TL;DR

VEDA introduces a principled framework for 3D molecular generation by unifying variance-exploding diffusion with annealing in an SE(3)-equivariant setting. It provides a theoretically grounded preconditioning scheme for coordinate prediction and an arcsin-based scheduler to balance exploration and refinement, enabling fast sampling with high chemical validity. Across QM9 and GEOM-DRUGS, VEDA delivers state-of-the-art stability and accuracy while matching flow-based efficiency at around 100 steps, and it achieves substantially lower relaxation energy than baselines. The work demonstrates a clear path toward rapid, accurate, native-3D molecular generation and highlights directions for future improvements, including explicit velocity-field outputs and property-conditioned generation.

Abstract

Diffusion models show promise for 3D molecular generation, but face a fundamental trade-off between sampling efficiency and conformational accuracy. While flow-based models are fast, they often produce geometrically inaccurate structures, as they have difficulty capturing the multimodal distributions of molecular conformations. In contrast, denoising diffusion models are more accurate but suffer from slow sampling, a limitation attributed to sub-optimal integration between diffusion dynamics and SE(3)-equivariant architectures. To address this, we propose VEDA, a unified SE(3)-equivariant framework that combines variance-exploding diffusion with annealing to efficiently generate conformationally accurate 3D molecular structures. Specifically, our key technical contributions include: (1) a VE schedule that enables noise injection functionally analogous to simulated annealing, improving 3D accuracy and reducing relaxation energy; (2) a novel preconditioning scheme that reconciles the coordinate-predicting nature of SE(3)-equivariant networks with a residual-based diffusion objective; and (3) a new arcsin-based scheduler that concentrates sampling in critical intervals of the logarithmic signal-to-noise ratio. On the QM9 and GEOM-DRUGS datasets, VEDA matches the sampling efficiency of flow-based models, achieving state-of-the-art valency stability and validity with only 100 sampling steps. More importantly, VEDA's generated structures are remarkably stable, as measured by their relaxation energy during GFN2-xTB optimization. The median energy change is only 1.72 kcal/mol, significantly lower than the 32.3 kcal/mol from its architectural baseline, SemlaFlow. Our framework demonstrates that principled integration of VE diffusion with SE(3)-equivariant architectures can achieve both high chemical accuracy and computational efficiency.
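
To make the variance-exploding (VE) corruption mentioned in the abstract concrete, the following is a minimal sketch of the standard VE perturbation $\mathbf{x}_\sigma = \mathbf{x}_0 + \sigma\boldsymbol{\epsilon}$ applied to atom coordinates, together with the corresponding log signal-to-noise ratio. The function names (`ve_perturb`, `log_snr`), the centering of the noise, and the `sigma_data` parameter are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def ve_perturb(x0, sigma, rng):
    """Variance-exploding corruption: x_sigma = x0 + sigma * eps, eps ~ N(0, I).

    x0 is an (n_atoms, 3) array of coordinates. The data is never scaled
    down; only the noise level sigma grows.
    """
    eps = rng.standard_normal(x0.shape)
    # Assumption: remove the mean of the noise so the perturbation does not
    # translate the molecule (a common choice for equivariant models).
    eps = eps - eps.mean(axis=0, keepdims=True)
    return x0 + sigma * eps


def log_snr(sigma, sigma_data=1.0):
    """log(SNR) = log(sigma_data^2 / sigma^2) for a VE process.

    sigma_data (the scale of the clean coordinates) is a hypothetical
    parameter used only for this illustration.
    """
    return 2.0 * (np.log(sigma_data) - np.log(sigma))


rng = np.random.default_rng(0)
x0 = np.zeros((5, 3))                # toy "molecule" with 5 atoms at the origin
x_noisy = ve_perturb(x0, 2.0, rng)   # same shape as x0, center of mass stays at 0
```

Under this convention, log(SNR) is 0 when `sigma` equals `sigma_data`, positive at lower noise levels, and negative at higher ones.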

Paper Structure

This paper contains 60 sections, 41 equations, 7 figures, 7 tables, 2 algorithms.

Figures (7)

  • Figure 1: An overview of the VEDA framework, detailing its training and sampling processes. During Training (top), a clean molecule $\mathcal{M}_0$ is perturbed via Gaussian noise for coordinates ($\mathbf{x}_t$) and categorical corruption for features ($\mathbf{z}_t$), defined by Eq. \ref{eq:coodinate_definition} and Eq. \ref{eq:type_definition}. The equivariant network $f_\theta$ is then trained to predict the original molecule by minimizing the combined Mean Squared Error (MSE) and Cross-Entropy (CE) loss in Eq. \ref{eq:Loss_veda}. During Sampling (bottom), the process starts from a pure noise distribution (a Gaussian point cloud with uniform categorical features) and iteratively refines the sample over $K$ steps. Each integration step $i$ involves: (1) a noise injection from $(\mathbf{x}_i,\mathbf{z}_i)$ to $(\mathbf{\hat{x}}_i, \mathbf{\hat{z}}_i)$; (2) the network $f_\theta$ is applied to both coordinates and features, with preconditioning affecting only the coordinate predictions; it outputs $x_{\text{pred}}$ and the category probabilities $p^{(z)}_\theta$ (Eq. \ref{eq:updated_precondition}); and (3) an update combining a continuous Euler step and a Discrete Flow Matching sampler to obtain $\mathcal{M}_{i-1}$, formally given in Eq. \ref{eq:step_function_coordinated} and Eq. \ref{eq:transition_rate}. In the diagram, blue boxes represent the main network module, gray boxes are auxiliary operations, and arrows indicate the data flow.
  • Figure 2: The proposed arcsin sampling scheduler concentrates sampling steps in the middle of the trajectory, where log(SNR) is close to 0.
  • Figure 3: Trade-off between Generation Quality and Computational Cost. The figure compares our model, VEDA-S (red), against flow-based (green) and denoising-based (blue) models. The horizontal axis represents the computational cost, measured by the Number of Function Evaluations (NFE). (Left) Quality measured by molecule Validity and Connectivity, where higher values are better. (Right) Quality measured by the median energy difference ($\Delta E_{relax}$), where lower values are better.
  • Figure 4: Ablation study of key hyperparameters on GEOM-DRUGS: We report MMFF94 energy and validity when varying (left) arcsin noise factor $\rho$, (middle) noise injection level $\gamma$, and (right) sampling steps. Black circles indicate selected values ($\rho{=}2.5$, $\gamma{=}0.4$, 100 steps).
  • Figure 5: Quality–efficiency trade‑off for VEDA‑E on QM9 under different sampling strategies, compared with EDM (Hoogeboom et al., 2022). Even at low NFE, our model sustains high validity and structural stability.
  • ...and 2 more figures
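
The coordinate part of the sampling loop described in the Figure 1 caption (noise injection, network prediction, Euler update) can be sketched as follows. This is a minimal illustration in the style of the stochastic Euler sampler of Karras et al. (2022); whether VEDA parameterizes the noise injection level $\gamma$ in exactly this way is an assumption. The categorical features and the Discrete Flow Matching update are omitted, the geometric noise schedule is a placeholder rather than the arcsin scheduler, and `oracle` stands in for the trained network $f_\theta$.

```python
import numpy as np


def denoise_step(x, sigma, sigma_next, denoiser, gamma, rng):
    """One stochastic Euler step on coordinates only.

    (1) Inject noise: raise the noise level from sigma to sigma_hat.
    (2) Predict clean coordinates with the denoiser.
    (3) Take an Euler step from sigma_hat down to sigma_next.
    """
    sigma_hat = sigma * (1.0 + gamma)
    # Extra noise chosen so that the total noise level becomes sigma_hat.
    extra_std = np.sqrt(sigma_hat**2 - sigma**2)
    x_hat = x + extra_std * rng.standard_normal(x.shape)
    x_pred = denoiser(x_hat, sigma_hat)
    direction = (x_hat - x_pred) / sigma_hat
    return x_hat + (sigma_next - sigma_hat) * direction


def sample(denoiser, n_atoms, sigmas, gamma, rng):
    """Start from a Gaussian point cloud and refine over len(sigmas) - 1 steps."""
    x = sigmas[0] * rng.standard_normal((n_atoms, 3))
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x = denoise_step(x, sigma, sigma_next, denoiser, gamma, rng)
    return x


# Toy check: an "oracle" denoiser that always returns a fixed target geometry.
target = np.arange(12, dtype=float).reshape(4, 3)
oracle = lambda x_hat, sigma_hat: target
# Placeholder schedule: geometric decay, with a final step to sigma = 0.
sigmas = np.append(np.geomspace(10.0, 0.01, 20), 0.0)
out = sample(oracle, n_atoms=4, sigmas=sigmas, gamma=0.4, rng=np.random.default_rng(0))
```

With a perfect denoiser, the final step to `sigma_next = 0` lands on the predicted coordinates, so `out` matches `target` up to floating-point error. Setting `gamma = 0` removes the noise injection and reduces each step to a deterministic Euler update.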