All-atom Diffusion Transformers: Unified generative modelling of molecules and materials

Chaitanya K. Joshi, Xiang Fu, Yi-Lun Liao, Vahe Gharakhanyan, Benjamin Kurt Miller, Anuroop Sriram, Zachary W. Ulissi

TL;DR

ADiT presents a unified latent diffusion framework that jointly models molecules and materials. A Transformer-based VAE learns a shared all-atom latent space, and a Diffusion Transformer generates new latents that the VAE decodes into valid structures. The approach achieves state-of-the-art or competitive results on QM9 molecules and MP20 crystals, with substantial inference speedups over equivariant diffusion models and performance gains that scale with model size. Joint training on both domains enables transfer learning and improves sampling validity and stability, and results remain strong on the larger GEOM-DRUGS dataset. This work points toward broadly generalizable foundation models for generative chemistry, with practical implications for fast, cross-domain inverse design.
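To make the idea of a single all-atom input format concrete, here is a minimal sketch, assuming NumPy. The `AllAtomSystem` class, its 14-column per-atom layout, and the choice to zero out fractional coordinates and lattice parameters for molecules are hypothetical illustrations of how periodic and non-periodic systems could share one token format; they are not taken from the ADiT codebase and may differ from its actual representation.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


def lattice_parameters(lattice: np.ndarray) -> np.ndarray:
    """Lengths (a, b, c) in Angstrom and angles (alpha, beta, gamma) in degrees
    for a 3x3 lattice whose rows are the cell vectors."""
    a, b, c = np.linalg.norm(lattice, axis=1)

    def angle(u: np.ndarray, v: np.ndarray) -> float:
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

    alpha = angle(lattice[1], lattice[2])  # between b and c
    beta = angle(lattice[0], lattice[2])   # between a and c
    gamma = angle(lattice[0], lattice[1])  # between a and b
    return np.array([a, b, c, alpha, beta, gamma])


@dataclass
class AllAtomSystem:
    """Hypothetical container shared by molecules (lattice=None) and crystals."""
    atom_types: np.ndarray                # (N,) atomic numbers
    cart_coords: np.ndarray               # (N, 3) Cartesian positions in Angstrom
    lattice: Optional[np.ndarray] = None  # (3, 3) rows = cell vectors, crystals only

    @property
    def is_periodic(self) -> bool:
        return self.lattice is not None

    def to_tokens(self) -> np.ndarray:
        """One row per atom, 14 columns:
        [Z, x, y, z, f1, f2, f3, a, b, c, alpha, beta, gamma, periodic_flag].
        Molecules get zeros in the fractional, lattice and flag columns."""
        n = len(self.atom_types)
        if self.is_periodic:
            # Row-vector convention: cart = frac @ lattice, so frac = cart @ inv(lattice)
            frac = self.cart_coords @ np.linalg.inv(self.lattice)
            lat = np.tile(lattice_parameters(self.lattice), (n, 1))
            flag = np.ones((n, 1))
        else:
            frac = np.zeros((n, 3))
            lat = np.zeros((n, 6))
            flag = np.zeros((n, 1))
        z = self.atom_types.astype(float)[:, None]
        return np.hstack([z, self.cart_coords, frac, lat, flag])


# Usage: a water molecule and a two-atom cubic Na/Cl cell (toy geometries).
water = AllAtomSystem(
    atom_types=np.array([8, 1, 1]),
    cart_coords=np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]),
)
toy_crystal = AllAtomSystem(
    atom_types=np.array([11, 17]),
    cart_coords=np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]),
    lattice=4.0 * np.eye(3),
)
print(water.to_tokens().shape, toy_crystal.to_tokens().shape)  # (3, 14) (2, 14)
```

Fixing the row width lets one Transformer encoder read molecules and crystals without separate input branches; the periodic flag is just one simple way to tell the two system types apart.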

Abstract

Diffusion models are the standard toolkit for generative modelling of 3D atomic systems. However, for different types of atomic systems, such as molecules and materials, the generative processes are usually highly specific to the target system despite the underlying physics being the same. We introduce the All-atom Diffusion Transformer (ADiT), a unified latent diffusion framework for jointly generating both periodic materials and non-periodic molecular systems using the same model: (1) An autoencoder maps unified, all-atom representations of molecules and materials to a shared latent embedding space; and (2) A diffusion model is trained to generate new latent embeddings that the autoencoder can decode to sample new molecules or materials. Experiments on the MP20, QM9 and GEOM-DRUGS datasets demonstrate that jointly trained ADiT generates realistic and valid molecules as well as materials, obtaining state-of-the-art results on par with molecule- and crystal-specific models. ADiT uses standard Transformers with minimal inductive biases for both the autoencoder and diffusion model, resulting in significant speedups during training and inference compared to equivariant diffusion models. Scaling ADiT up to half a billion parameters predictably improves performance, representing a step towards broadly generalizable foundation models for generative chemistry. Open source code: https://github.com/facebookresearch/all-atom-diffusion-transformer
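Step (2) of the abstract, together with the classifier-free guidance mentioned in the Figure 1 caption, can be sketched as an ODE sampling loop over latents. This is a toy illustration under stated assumptions: `toy_velocity` is an analytic stand-in for the trained Diffusion Transformer (a straight-line path from Gaussian noise to a point-mass target), the labels 0 and 1 for molecule and crystal are hypothetical, and the explicit Euler solver is chosen only for simplicity. ADiT's actual denoiser parametrisation, conditioning and integrator may differ, and the VAE decoding step is omitted.

```python
import numpy as np

# Toy stand-ins for a trained Diffusion Transformer (hypothetical values).
# Label 0 = "molecule", label 1 = "crystal"; None = null label (unconditional).
CLASS_MEANS = {0: np.array([-2.0, 0.0]), 1: np.array([2.0, 1.0])}
NULL_MEAN = np.array([0.0, 0.5])


def toy_velocity(z, t, label):
    """Exact velocity of a straight-line path from noise at t=0 to a
    point-mass latent target at t=1; replaces the real denoiser."""
    target = NULL_MEAN if label is None else CLASS_MEANS[label]
    return (target - z) / (1.0 - t)


def sample_latents(label, num_samples, num_steps, guidance_scale, latent_dim=2, seed=0):
    """Euler-integrate the guided velocity field from Gaussian noise to latents."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((num_samples, latent_dim))  # t = 0: pure noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # stays below 1, so the toy velocity never divides by zero
        v_cond = toy_velocity(z, t, label)   # conditional prediction
        v_uncond = toy_velocity(z, t, None)  # prediction with the label dropped
        # Classifier-free guidance: move from the unconditional prediction
        # toward (or past, if guidance_scale > 1) the conditional one.
        v = v_uncond + guidance_scale * (v_cond - v_uncond)
        z = z + dt * v  # explicit Euler step
    return z  # in ADiT, sampled latents are then decoded by the VAE


crystal_latents = sample_latents(label=1, num_samples=4, num_steps=50, guidance_scale=2.0)
print(crystal_latents.shape)  # (4, 2)
```

Each step costs two denoiser evaluations (conditional and unconditional), so sampling time grows linearly with `num_steps`, the quantity varied in Figure 2. With `guidance_scale=1.0` the loop follows the conditional prediction alone; larger values extrapolate further toward the conditional target.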

Paper Structure

This paper contains 12 sections, 11 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: Unified generative modelling of molecules and materials with All-atom Diffusion Transformers. ADiT performs generative modelling of chemical systems in two stages: (1) A Variational Autoencoder (VAE) learns a shared latent space by reconstructing all-atom representations of both molecules (non-periodic) and crystals (periodic); and (2) A Diffusion Transformer (DiT) samples new latents from the shared distribution using classifier-free guidance, which are decoded to valid molecules or crystals using the VAE. Our unified latent diffusion framework enables transfer learning and avoids the complexity of multiple diffusion processes on categorical-continuous product manifolds used by equivariant diffusion models.
  • Figure 2: ADiTs are significantly faster than equivariant diffusion models. We plot the number of integration steps for ADiTs and equivariant diffusion models vs. time to generate 10,000 samples on a single V100 GPU. ADiTs scale significantly better with the number of integration steps compared to equivariant diffusion.
  • Figure 3: Scaling up ADiT improves performance. We show the effect of increasing the number of ADiT denoiser parameters on the training loss and generation validity rates. Left: training loss and validity rates vs. epochs. Right: Correlation plots for training loss and validity rates at epoch 2,000 vs. ADiT parameters (in Millions).
  • Figure 4: Histograms from DFT validation of 10,000 generated crystals. ADiT is more likely to generate stable crystals with DFT energy above hull $<$0.0 eV/atom compared to prior models. Samples from ADiT most closely follow the distributions for DFT formation energy and number of unique elements per crystal from MP20.
  • Figure 5: Histogram of spacegroups for 10,000 generated crystals. Diffusion-based ADiT and FlowMM tend to oversample crystals with the P1 spacegroup compared to the MP20 training distribution. FlowLLM, an autoregressive language model, tends to oversample crystals with Fm-3m, Pm-3m, and I4/mmm spacegroups.
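The stability comparison in Figure 4 reduces to thresholding each generated crystal's DFT energy above hull, and its third panel counts unique elements per crystal. The sketch below, assuming NumPy, computes those quantities and the histogram counts behind such a plot; the function names, bin width and energy range are hypothetical, and the DFT relaxation and convex-hull construction that produce the energies are outside its scope. The strict `< 0.0` eV/atom default follows the Figure 4 caption.

```python
import numpy as np


def stability_rate(e_above_hull, threshold=0.0):
    """Fraction of crystals whose energy above hull (eV/atom) is strictly below threshold."""
    e = np.asarray(e_above_hull, dtype=float)
    return float(np.mean(e < threshold))


def hull_histogram(e_above_hull, bin_width=0.05, max_energy=0.5):
    """Counts per bin over [-max_energy, max_energy]; values outside are dropped."""
    num_bins = int(round(2 * max_energy / bin_width))
    edges = np.linspace(-max_energy, max_energy, num_bins + 1)
    counts, _ = np.histogram(np.asarray(e_above_hull, dtype=float), bins=edges)
    return counts, edges


def num_unique_elements(atom_types):
    """Number of distinct elements in one crystal, e.g. 2 for NaCl."""
    return int(np.unique(np.asarray(atom_types)).size)


toy_e_hull = [-0.02, 0.0, 0.01, 0.3, -0.1]  # made-up values, not ADiT results
print(stability_rate(toy_e_hull))  # 0.4
```

Using a strict inequality means a crystal lying exactly on the reference hull is not counted as stable; swapping `<` for `<=` is a one-character change if a different convention is wanted.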