Sampling 3D Molecular Conformers with Diffusion Transformers

J. Thorben Frank, Winfried Ripken, Gregor Lied, Klaus-Robert Müller, Oliver T. Unke, Stefan Chmiela

TL;DR

This work introduces DiTMC, a diffusion-transformer framework for sampling molecular conformers conditioned on molecular graphs. It couples graph-aware conditioning tokens with multiple positional embedding schemes, including an SO(3)-equivariant PE(3), to model a velocity field $v^\theta(\boldsymbol{x}, t, \mathcal{G})$ that guides SE(3)-invariant conformer generation. Empirical results on GEOM-QM9, GEOM-DRUGS, and GEOM-XL show state-of-the-art precision and physical validity, with equivariant variants offering higher fidelity at increased compute cost. Ablations reveal the importance of geodesic-based, all-pair conditioning and the impact of symmetry priors on sample quality and efficiency, suggesting scalable, symmetry-aware pathways for large-scale molecular conformer generation. The work highlights promising directions for fast equivariant attention and broader de-novo molecular design within diffusion-based frameworks.
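To make the role of the velocity field concrete, the following is a minimal sketch of sampling by integrating an ODE $\mathrm{d}\boldsymbol{x}/\mathrm{d}t = v(\boldsymbol{x}, t)$ with explicit Euler steps. It is not the DiTMC implementation: the names `euler_sample` and `toy_velocity` are hypothetical, the time convention (noise at $t=0$, data at $t=1$) is an assumption, and the trained, graph-conditioned network is replaced by a closed-form toy velocity that pulls every atom toward one fixed target geometry.

```python
import random


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with explicit Euler steps.

    x0 is a list of per-atom [x, y, z] coordinates drawn from the prior.
    velocity_fn(x, t) must return one 3D velocity per atom.
    """
    x = [list(atom) for atom in x0]
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = velocity_fn(x, t)
        x = [[xi + dt * vi for xi, vi in zip(atom, vel)]
             for atom, vel in zip(x, v)]
    return x


# Hypothetical stand-in for a trained model: a three-atom target geometry.
target = [[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [1.5, 1.0, 0.0]]


def toy_velocity(x, t):
    """Velocity of the straight-line path that reaches `target` at t=1."""
    return [[(ti - xi) / (1.0 - t) for xi, ti in zip(atom, tgt)]
            for atom, tgt in zip(x, target)]


rng = random.Random(0)
noise = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in target]
sample = euler_sample(toy_velocity, noise, num_steps=20)
```

In this toy setting every noise sample is transported to the same target, so `sample` matches `target` up to floating-point error. A learned velocity field conditioned on a molecular graph would instead map different noise draws to different conformers of the same molecule.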

Abstract

Diffusion Transformers (DiTs) have demonstrated strong performance in generative modeling, particularly in image synthesis, making them a compelling choice for molecular conformer generation. However, applying DiTs to molecules introduces novel challenges, such as integrating discrete molecular graph information with continuous 3D geometry, handling Euclidean symmetries, and designing conditioning mechanisms that generalize across molecules of varying sizes and structures. We propose DiTMC, a framework that adapts DiTs to address these challenges through a modular architecture that separates the processing of 3D coordinates from conditioning on atomic connectivity. To this end, we introduce two complementary graph-based conditioning strategies that integrate seamlessly with the DiT architecture. These are combined with different attention mechanisms, including both standard non-equivariant and SO(3)-equivariant formulations, enabling flexible control over the trade-off between accuracy and computational efficiency. Experiments on standard conformer generation benchmarks (GEOM-QM9, -DRUGS, -XL) demonstrate that DiTMC achieves state-of-the-art precision and physical validity. Our results highlight how architectural choices and symmetry priors affect sample quality and efficiency, suggesting promising directions for large-scale generative modeling of molecular structures. Code is available at https://github.com/ML4MolSim/dit_mc.

Paper Structure

This paper contains 52 sections, 49 equations, 16 figures, 16 tables, 2 algorithms.

Figures (16)

  • Figure 1: (A) Diffusion transformer for molecular conformer generation (DiTMC), with interchangeable self-attention blocks and positional embeddings (PEs); we evaluate various combinations as detailed in the main text. (B) DiTMC predicts a velocity per atom, used to model a probability flow ODE, which samples from the probability distribution $p(\boldsymbol{x} | \mathcal{G})$, where $\mathcal{G}$ is a molecular graph.
  • Figure 2: Analysis of SO(3)-equivariant (PE(3)) and non-equivariant (aPE, rPE) model formulations on GEOM-QM9. (A) Mean Coverage Recall (COV-R) versus root mean square deviation (RMSD) threshold $\delta$ to any reference conformer. (B) Histogram of the minimal RMSD per generated sample. (C) Loss as a function of latent time $t$ relative to PE(3) loss (see \ref{app:sec:loss-latent-time} for details).
  • Figure 3: (A) Coverage (COV) mean and Average Minimum RMSD (AMR) mean for DiTMC+aPE models of increasing model capacity on GEOM-DRUGS. (B) Training and inference time for different positional embedding (PE) and associated self-attention strategies. (C) Recall and precision AMR mean for the different DiTMC models on GEOM-DRUGS. For panels (B) and (C) we use results from the base ("B") variant of each model.
  • Figure 4: (A) Coverage (COV) mean as a function of the root mean square deviation (RMSD) threshold $\delta$ and (B) average minimum RMSD (AMR) mean vs. time per conformer for DiTMC+aPE and other state-of-the-art models. Per-model markers from left to right correspond to 5, 10, 20, and 50 sampler steps, following Refs. \cite{hassan2024etflow, wang2023swallowing}. Note that the original MCF paper reports results with two different samplers: benchmark results (\ref{tab:drugs}) are obtained with the DDPM sampler (1000 steps), whereas AMR vs. time results are reported for the DDIM sampler (5--50 steps). (C) Comparison of conformers generated by MCF, ET-Flow, and DiTMC against ground-truth reference conformers from GEOM-XL. Generated conformers are rotationally aligned with their corresponding reference conformer.
  • Figure A5: Loss as a function of time comparing different PE strategies. Results are averaged over 1000 samples randomly drawn from the GEOM-QM9 validation set. Left: loss relative to PE(3) as a baseline. In the important regime close to the data distribution, the PE(3) model has lower loss, yielding higher sample fidelity. Right: absolute loss values for all PEs. The loss decreases close to the data distribution for all models.
  • ...and 11 more figures
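
The coverage (COV) and AMR quantities named in the captions above can be sketched as follows. This is an illustration under stated assumptions, not the paper's evaluation code: it uses the definitions common in the conformer-generation literature, takes a precomputed matrix of RMSD values after alignment as input, and counts a conformer as covered when its minimal RMSD is strictly below the threshold $\delta$. The function name `coverage_and_amr` and the example numbers are hypothetical.

```python
def coverage_and_amr(rmsd, delta):
    """Compute recall and precision variants of COV and AMR for one molecule.

    rmsd[l][k] is the RMSD between reference conformer l and generated
    conformer k; delta is the coverage threshold in the same units.
    """
    n_ref = len(rmsd)
    n_gen = len(rmsd[0])
    # Recall view: how close does the best generated sample get to each reference?
    min_per_ref = [min(row) for row in rmsd]
    # Precision view: how close is each generated sample to its nearest reference?
    min_per_gen = [min(rmsd[l][k] for l in range(n_ref)) for k in range(n_gen)]
    return {
        "COV-R": sum(d < delta for d in min_per_ref) / n_ref,
        "AMR-R": sum(min_per_ref) / n_ref,
        "COV-P": sum(d < delta for d in min_per_gen) / n_gen,
        "AMR-P": sum(min_per_gen) / n_gen,
    }


# Made-up RMSD values: 2 reference conformers, 3 generated conformers.
example_rmsd = [
    [0.2, 0.9, 1.4],
    [1.1, 0.6, 0.3],
]
metrics = coverage_and_amr(example_rmsd, delta=0.5)
```

Here both references have a generated conformer within the threshold, so recall coverage is 1.0, while one of the three generated conformers is farther than the threshold from every reference, so precision coverage is 2/3. The list `min_per_gen` is the per-sample quantity that a histogram like the one in Figure 2(B) would bin, and sweeping `delta` gives a coverage-versus-threshold curve.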