MUDiff: Unified Diffusion for Complete Molecule Generation

Chenqing Hua, Sitao Luan, Minkai Xu, Rex Ying, Jie Fu, Stefano Ermon, Doina Precup

TL;DR

MUDiff addresses the challenge of generating a complete molecular representation by jointly modeling 2D graph structure and 3D coordinates through a diffusion process. It couples a continuous-discrete diffusion scheme with MUformer, an equivariant transformer that denoises atom features, edge types, and coordinates while preserving roto-translation symmetry. The approach yields more stable and unique molecules, remains effective with limited 3D data, and supports conditional generation and property prediction, demonstrating strong potential for drug discovery and material design. Overall, the work advances unified diffusion-based generation for molecules by integrating complete topological and geometric information via a dual-channel Transformer architecture.

Abstract

Molecule generation is an important practical problem with applications in drug discovery and material design, and AI methods promise to provide useful solutions. However, existing methods for molecule generation focus on either the 2D graph structure or the 3D geometric structure. Neither alone suffices to represent a complete molecule: a 2D graph mainly captures topology, while 3D geometry mainly captures the spatial arrangement of atoms. Combining these representations is essential to better represent a molecule. In this paper, we present a new model for generating a comprehensive representation of molecules, including atom features, 2D discrete molecule structures, and 3D continuous molecule coordinates, by combining discrete and continuous diffusion processes. The use of diffusion processes allows for capturing the probabilistic nature of molecular processes and exploring the effect of different factors on molecular structures. Additionally, we propose a novel graph transformer architecture to denoise the diffusion process. The transformer adheres to 3D roto-translation equivariance constraints, allowing it to learn invariant atom and edge representations while preserving the equivariance of atom coordinates. This transformer can be used to learn molecular representations that are robust to geometric transformations. We evaluate our model through experiments and comparisons with existing methods, showing that it generates more stable and valid molecules. Our model is a promising approach for designing stable and diverse molecules and can be applied to a wide range of tasks in molecular modeling.
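
As a rough illustration of how discrete and continuous diffusion can be combined, the sketch below applies one forward noising step to a toy molecule: Gaussian noise on centered 3D coordinates, and a uniform-transition corruption of edge types. The function names, the uniform transition kernel, and the single `alpha_bar` schedule value are assumptions made for illustration; the abstract does not specify these details, so this should not be read as the authors' implementation.

```python
import numpy as np

def noise_coordinates(x0, alpha_bar, rng):
    """Continuous forward step: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps.

    Both the coordinates and the noise are centered, so x_t keeps a zero
    center of mass (an assumed way to handle translations).
    """
    eps = rng.standard_normal(x0.shape)
    eps -= eps.mean(axis=0, keepdims=True)
    x0_centered = x0 - x0.mean(axis=0, keepdims=True)
    return np.sqrt(alpha_bar) * x0_centered + np.sqrt(1.0 - alpha_bar) * eps

def noise_edge_types(e0, alpha_bar, num_types, rng):
    """Discrete forward step with an assumed uniform transition kernel.

    Each edge label is kept with probability alpha_bar and otherwise
    resampled uniformly from the num_types categories.
    """
    n = e0.shape[0]
    keep = rng.random((n, n)) < alpha_bar
    resampled = rng.integers(0, num_types, size=(n, n))
    et = np.where(keep, e0, resampled)
    # Keep the graph undirected and without self-loops:
    # mirror the strict upper triangle onto the lower triangle.
    upper = np.triu(et, k=1)
    return upper + upper.T
```

With `alpha_bar` close to 1 the sample stays near the clean molecule, and with `alpha_bar` close to 0 the coordinates approach centered Gaussian noise while the edge types approach a uniform distribution. A denoising network would be trained to reverse both corruptions jointly.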

Paper Structure

This paper contains 60 sections, 57 equations, 3 figures, 7 tables, 2 algorithms.

Figures (3)

  • Figure 1: MUformer processing 2D and 3D molecular data. The Transformer backbone has two channels: purple for 2D data and brown for 3D data. The blue part encodes 2D molecular structures, the green part handles atom-level information, and the red part processes 3D geometric structures. When the 3D structure is missing, only the invariant (purple) channel is active; when the 2D structure is missing, only the equivariant (brown) channel is active. The invariant channel predicts atom and edge features, while the equivariant channel provides robustness to geometric transformations and predicts atom features and positions. When both channels are active, the model remains robust to geometric transformations and predicts a complete molecule; the final atom features are obtained by merging the outputs of both channels and passing the result through an output network.
  • Figure : Training MUDiff
  • Figure : Sampling from MUDiff
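
The roto-translation property described in the Figure 1 caption (invariant atom features, equivariant atom positions) can be checked numerically on a toy layer. The sketch below is not MUformer: `toy_layer` is a hypothetical EGNN-style update written only to show what such a check looks like, and `random_rotation` is an assumed helper.

```python
import numpy as np

def toy_layer(h, x):
    """Hypothetical layer: invariant feature update, equivariant position update.

    h: (n, d) atom features, x: (n, 3) atom coordinates.
    """
    n = h.shape[0]
    diff = x[:, None, :] - x[None, :, :]   # (n, n, 3) relative positions
    dist2 = (diff ** 2).sum(axis=-1)       # squared distances are invariant
    w = np.tanh(h @ h.T - dist2)           # invariant pairwise weights
    np.fill_diagonal(w, 0.0)               # no self-interaction
    h_new = h + (w @ h) / n
    # Positions move along relative vectors scaled by invariant weights,
    # so the update rotates and translates together with the input.
    x_new = x + (w[:, :, None] * diff).sum(axis=1) / n
    return h_new, x_new

def random_rotation(rng):
    """Sample a 3x3 rotation matrix (orthogonal, determinant +1) via QR."""
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
h = rng.standard_normal((5, 4))
x = rng.standard_normal((5, 3))
R, t = random_rotation(rng), rng.standard_normal(3)

h_a, x_a = toy_layer(h, x)
h_b, x_b = toy_layer(h, x @ R.T + t)

print("features invariant:", np.allclose(h_a, h_b))
print("positions equivariant:", np.allclose(x_b, x_a @ R.T + t))
```

The same test pattern (transform the input, run the model, compare against the transformed output) applies to any network that claims this symmetry.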