NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation

Zhiyuan Liu, Yanchen Luo, Han Huang, Enzhi Zhang, Sihang Li, Junfeng Fang, Yaorui Shi, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

TL;DR

NExT-Mol addresses the challenge of robust 3D molecule generation by fusing a large-scale 1D SELFIES language model (MoLlama) with a 3D diffusion model (DMT) and by bridging 1D and 3D representations through transfer learning. The approach leverages MoLlama's extensive 1D/2D molecular knowledge and preserves full 2D graph information in DMT, enabling high validity and accurate conformer prediction. The authors report leading performance in de novo 3D generation, conditional 3D generation, and 3D conformer prediction across GEOM-DRUGS, GEOM-QM9, and QM9-2014, including a 26% relative improvement in 3D Fréchet ChemNet Distance on GEOM-DRUGS and a 13% average gain for conditional generation. The work demonstrates the practicality of a foundation-model-style, two-stage 1D→3D framework for scalable, high-quality molecular design and suggests promising future directions for multi-input and structure-based molecule modeling.

Abstract

3D molecule generation is crucial for drug discovery and material design. While prior efforts focus on 3D diffusion models for their benefits in modeling continuous 3D conformers, they overlook the advantages of 1D SELFIES-based Language Models (LMs), which can generate 100% valid molecules and leverage billion-scale 1D molecule datasets. To combine these advantages for 3D molecule generation, we propose a foundation model -- NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation. NExT-Mol uses an extensively pretrained molecule LM for 1D molecule generation, and subsequently predicts the generated molecule's 3D conformers with a 3D diffusion model. We enhance NExT-Mol's performance by scaling up the LM's model size, refining the diffusion neural architecture, and applying 1D to 3D transfer learning. Notably, our 1D molecule LM significantly outperforms baselines in distributional similarity while ensuring validity, and our 3D diffusion model achieves leading performance in conformer prediction. Given these improvements in 1D and 3D modeling, NExT-Mol achieves a 26% relative improvement in 3D FCD for de novo 3D generation on GEOM-DRUGS, and a 13% average relative gain for conditional 3D generation on QM9-2014. Our code and pretrained checkpoints are available at https://github.com/acharkq/NExT-Mol.
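How a relative figure such as the one quoted for 3D FCD is read depends on the direction of the metric. FCD is a distance, so lower is better, and a relative improvement is conventionally computed as (baseline - new) / baseline. The sketch below illustrates that convention only; the helper name `relative_improvement` and the numbers are hypothetical and are not taken from the paper, and the abstract does not state the exact formula the authors used.

```python
def relative_improvement(baseline: float, new: float, lower_is_better: bool = True) -> float:
    """Return the relative improvement of `new` over `baseline` as a fraction.

    Hypothetical helper for illustration; assumes the common convention
    (baseline - new) / |baseline| for lower-is-better metrics such as FCD.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    delta = (baseline - new) if lower_is_better else (new - baseline)
    return delta / abs(baseline)


# Made-up values for illustration only; NOT results reported in the paper.
example = relative_improvement(baseline=1.00, new=0.74)
```

With these made-up values, a drop from 1.00 to 0.74 corresponds to a relative improvement of about 0.26, that is, 26%.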

Paper Structure

This paper contains 34 sections, 5 equations, 10 figures, 18 tables, 2 algorithms.

Figures (10)

  • Figure 1: Overview of our NExT-Mol foundation model for 3D molecule generation. NExT-Mol consists of three key components: (1) MoLlama, a large LM for generating 1D molecule sequences; (2) DMT, a diffusion model to predict 3D conformers from the 1D sequences; and (3) NExT-Mol leverages transfer learning to enhance DMT's 3D prediction with MoLlama's 1D representations.
  • Figure 2: Overview of DMT's neural architecture. (a) DMT is a diffusion model learning to denoise random Gaussian perturbations $\boldsymbol{\epsilon}$ applied on the 3D coordinates of atoms. (b) DMT relies on the RMHA module to iteratively update atom representations $\mathbf{H}$ and pair representations $\mathbf{E}$.
  • Figure 3: Transfer learning between MoLlama's 1D representations and DMT's 3D prediction. (a) A cross-modal projector bridges the gap between MoLlama and DMT. Grey H atoms have no corresponding SELFIES tokens, and are replaced by a learnable token. (b) Transfer learning's three training stages. Snowflake denotes frozen parameters while flame denotes trainable ones.
  • Figure 4: Visualization of 3D conformers. We select the predicted conformers with the least RMSD to the ground truth (GT).
  • Figure 5: Effect of sampling steps on AMR$\downarrow$ for 3D conformer prediction using DMT-B.
  • ...and 5 more figures
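The Figure 2 caption describes DMT as learning to denoise Gaussian perturbations $\boldsymbol{\epsilon}$ applied to atom coordinates. As a rough illustration of what such a perturbation step can look like, the sketch below uses the standard DDPM-style variance-preserving form $\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$. This parameterization, the function name `noise_coordinates`, and the `alpha_bar` argument are assumptions made for illustration; the excerpt above does not specify DMT's actual noise schedule or training objective.

```python
import math
import random


def noise_coordinates(coords, alpha_bar, rng=None):
    """Perturb 3D coordinates with Gaussian noise (DDPM-style, assumed form).

    coords    : list of [x, y, z] lists, one per atom
    alpha_bar : cumulative signal level in [0, 1]; 1 means no noise
    rng       : optional random.Random instance for reproducibility

    Returns (noisy_coords, eps), where eps is the sampled noise that a
    denoiser would typically be trained to predict.
    """
    if not 0.0 <= alpha_bar <= 1.0:
        raise ValueError("alpha_bar must lie in [0, 1]")
    if rng is None:
        rng = random.Random()

    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)

    # Sample one standard-normal value per coordinate.
    eps = [[rng.gauss(0.0, 1.0) for _ in atom] for atom in coords]

    # x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps
    noisy = [
        [signal_scale * x + noise_scale * e for x, e in zip(atom, eps_atom)]
        for atom, eps_atom in zip(coords, eps)
    ]
    return noisy, eps


# Toy two-atom example with hypothetical coordinates.
toy_coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
noisy_coords, sampled_eps = noise_coordinates(toy_coords, alpha_bar=0.5, rng=random.Random(0))
```

In this assumed setup, a model would receive `noisy_coords` (plus the molecular graph) and be trained to recover `sampled_eps`; sampling then runs the learned denoiser over a number of steps, which is the quantity varied in Figure 5.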