Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning
Shuxin Zheng, Jiyan He, Chang Liu, Yu Shi, Ziheng Lu, Weitao Feng, Fusong Ju, Jiaxi Wang, Jianwei Zhu, Yaosen Min, He Zhang, Shidi Tang, Hongxia Hao, Peiran Jin, Chi Chen, Frank Noé, Haiguang Liu, Tie-Yan Liu
TL;DR
This work addresses the challenge of predicting the equilibrium distribution of a molecular system rather than a single static structure. It introduces Distributional Graphormer (DiG), a diffusion-based framework with a Graphormer backbone that learns a reverse diffusion process conditioned on a molecular descriptor, so that it can generate diverse, thermodynamically plausible conformations and estimate state densities. DiG can be trained on data (from MD simulations or experiments) or with physics-informed diffusion pre-training that uses energy functions, which enables step-by-step supervision and density computation. The authors demonstrate DiG on protein conformations, ligand poses, catalyst-adsorbate distributions, and carbon polymorph design, achieving MD-like coverage with substantial speedups and enabling inverse design via property conditioning. These results point to a scalable path toward macroscopic thermodynamic insights and design capabilities across chemistry and materials science.
Abstract
Advances in deep learning have greatly improved structure prediction of molecules. However, many macroscopic observations that are important for real-world applications are not functions of a single molecular structure, but rather determined from the equilibrium distribution of structures. Traditional methods for obtaining these distributions, such as molecular dynamics simulation, are computationally expensive and often intractable. In this paper, we introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems. Inspired by the annealing process in thermodynamics, DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system, such as a chemical graph or a protein sequence. This framework enables efficient generation of diverse conformations and provides estimations of state densities. We demonstrate the performance of DiG on several molecular tasks, including protein conformation sampling, ligand structure sampling, catalyst-adsorbate sampling, and property-guided structure generation. DiG presents a significant advancement in methodology for statistically understanding molecular systems, opening up new research opportunities in molecular science.
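To make the annealing idea concrete, the following is a minimal sketch and not the DiG model itself. It assumes a toy one-dimensional double-well energy and replaces the learned neural network with the analytic force of that energy, using annealed overdamped Langevin dynamics to move samples from a simple Gaussian towards the Boltzmann distribution p(x) ∝ exp(-E(x)/kT). All names (`energy`, `force`, `annealed_langevin`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def energy(x):
    """Toy double-well energy (assumed stand-in for a molecular energy function)."""
    return (x**2 - 1.0) ** 2

def force(x):
    """Negative gradient of the toy energy."""
    return -4.0 * x * (x**2 - 1.0)

def annealed_langevin(n_samples=5000, n_steps=2000, step=1e-3,
                      kT_start=5.0, kT_end=1.0, seed=0):
    """Transform Gaussian samples towards exp(-E/kT_end) by gradual cooling.

    Each update is one Euler-Maruyama step of overdamped Langevin dynamics,
    dx = force(x) dt + sqrt(2 kT dt) * noise, at the current temperature.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)  # simple starting distribution
    for kT in np.geomspace(kT_start, kT_end, n_steps):
        noise = rng.standard_normal(n_samples)
        x = x + step * force(x) + np.sqrt(2.0 * kT * step) * noise
    return x

samples = annealed_langevin()

# A macroscopic observable is an average over the ensemble, which generally
# differs from its value at a single minimum-energy structure (x = 1 here).
ensemble_average = float(np.mean(samples**2))
single_structure_value = 1.0**2
fraction_right_well = float(np.mean(samples > 0))
```

In this sketch both wells end up populated, and the ensemble average of x² differs from the value at either energy minimum, which is the sense in which an observable depends on the distribution rather than on one structure. DiG differs in that the transformation is carried out by a trained network conditioned on a molecular descriptor, rather than by a known force.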
