Text-Guided Molecule Generation with Diffusion Language Model

Haisong Gong, Qiang Liu, Shu Wu, Liang Wang

TL;DR

Autoregressive SMILES generation often struggles to enforce global constraints encoded in text descriptions. The authors propose TGM-DLM, a diffusion language model that updates all SMILES token embeddings together through a two-phase process: a text-guided phase that denoises embeddings from random noise, and a correction phase that repairs invalid strings without guidance. The model is trained with two objectives: denoising under text guidance, and recovering embeddings from deliberately corrupted inputs. On ChEBI-20, TGM-DLM outperforms autoregressive baselines such as MolT5-Base across multiple metrics and achieves higher validity after correction, without requiring extra data, which suggests it could be useful in drug-design workflows.

Abstract

Text-guided molecule generation is a task where molecules are generated to match specific textual descriptions. Most existing SMILES-based molecule generation methods rely on an autoregressive architecture. In this work, we propose the Text-Guided Molecule Generation with Diffusion Language Model (TGM-DLM), a novel approach that leverages diffusion models to address the limitations of autoregressive methods. TGM-DLM updates token embeddings within the SMILES string collectively and iteratively, using a two-phase diffusion generation process. The first phase optimizes embeddings from random noise, guided by the text description, while the second phase corrects invalid SMILES strings to form valid molecular representations. We demonstrate that TGM-DLM outperforms MolT5-Base, an autoregressive model, without the need for additional data resources. Our findings underscore the effectiveness of TGM-DLM in generating coherent and precise molecules with specific properties, opening new avenues in drug discovery and related scientific domains. Code will be released at: https://github.com/Deno-V/tgm-dlm.
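
The two-phase generation process described in the abstract can be sketched as a simple control loop. This is a minimal illustration, not the authors' implementation: `denoise_step`, `round_fn`, and `is_valid` are hypothetical callables standing in for the trained network, the embedding-to-SMILES rounding step, and a SMILES validity check, and the step counts `T` and `B` are assumed values chosen only for readability.

```python
import numpy as np

def two_phase_sample(denoise_step, round_fn, is_valid, text, shape,
                     T=10, B=3, seed=0):
    """Toy two-phase reverse process (a sketch under stated assumptions).

    denoise_step(x, t, guidance) -> embeddings at step t-1 (hypothetical)
    round_fn(x)                  -> SMILES string decoded from embeddings (hypothetical)
    is_valid(smiles)             -> True if the string parses as a molecule (hypothetical)
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)      # start from pure Gaussian noise

    # Phase one: text-guided denoising from step T down to step B.
    for t in range(T, B, -1):
        x = denoise_step(x, t, text)

    # Phase two: run only when phase one's output does not round to a
    # valid SMILES string; no text guidance is passed here.
    phase_two_ran = False
    if not is_valid(round_fn(x)):
        phase_two_ran = True
        for t in range(B, 0, -1):
            x = denoise_step(x, t, None)

    return round_fn(x), phase_two_ran
```

Passing `None` as the guidance argument in the second loop mirrors the description of the correction phase as unguided; how the real model disables text conditioning is not specified in this summary.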

Paper Structure

This paper contains 27 sections, 9 equations, 3 figures, 2 tables, 1 algorithm.

Figures (3)

  • Figure 1: (a) Depiction of a molecule along with its corresponding SMILES representation. The main chain and side chains are colored purple and blue, respectively, both in the molecule graph and SMILES string. Ring numbering is highlighted in red. (b) The fundamental framework of the diffusion model for language generation. SMILES is treated as a sequence of language tokens. Through embedding and forward processes, the sequence transforms into pure noise. The reverse and rounding processes reconstruct the SMILES string from pure noise.
  • Figure 2: (a) Illustration of TGM-DLM's two-phase diffusion process. Phase one starts from pure noise, denoising $\mathbf{x}_T$ to $\mathbf{x}_B$ under text guidance. Phase two, without guidance, corrects phase-one outputs that cannot be rounded to valid SMILES strings. (b) The two training objectives designed for TGM-DLM. The first is denoising under text guidance, which aligns the generated molecule with the text description. The second improves the model's ability to rectify invalid content by training it to recover embeddings from intentionally corrupted versions.
  • Figure 3: Examples of molecules generated by different models from the same input descriptions. Generated SMILES strings are converted to molecule graphs for better visualization.
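
The embed, forward, and rounding steps named in the Figure 1(b) caption can be illustrated with a small sketch. It assumes the standard Gaussian forward process $\mathbf{x}_t = \sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon}$ and nearest-embedding rounding, as commonly used in diffusion language models; the character-level vocabulary, the random embedding table, and the function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def forward_noise(x0, alpha_bar_t, rng):
    """Sample x_t from x_0 under a Gaussian forward process (assumed form)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def round_to_tokens(x, emb):
    """Map each position to the index of its nearest vocabulary embedding."""
    # Squared Euclidean distance between every position and every token.
    dist = ((x[:, None, :] - emb[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1)

# Hypothetical toy vocabulary and random embedding table.
vocab = ["C", "O", "N", "(", ")", "=", "1"]
rng = np.random.default_rng(0)
emb = rng.standard_normal((len(vocab), 16))

# Embed the SMILES string for acetic acid, one token per character.
smiles = "CC(=O)O"
ids = np.array([vocab.index(ch) for ch in smiles])
x0 = emb[ids]

# A lightly noised sample still rounds back to the original string;
# as alpha_bar_t approaches 0 the sample approaches pure noise.
x_light = forward_noise(x0, 0.9999, rng)
decoded = "".join(vocab[i] for i in round_to_tokens(x_light, emb))
x_noise = forward_noise(x0, 0.0, rng)
```

In a trained model the embedding table is learned and the reverse process, rather than a single noising call, produces the embeddings that are rounded; the sketch only shows why rounding recovers tokens when the embeddings are close to clean ones.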