Sculpting Molecules in Text-3D Space: A Flexible Substructure Aware Framework for Text-Oriented Molecular Optimization

Kaiwei Zhang; Yange Lin; Guangcheng Wu; Yuxiang Ren; Xuecang Zhang; Bo Wang; Xiaoyu Zhang; Weitao Du

TL;DR

3DToMolo is a text-driven diffusion framework that jointly models 2D molecular graphs and 3D conformers and aligns them with textual prompts via a contrastive, $SE(3)$-equivariant architecture. It defines text-structural optimization, develops a two-phase pretraining regime, and demonstrates zero-shot, flexible, and hard-constraint molecular optimization tasks, achieving higher hit ratios than baselines across diverse physicochemical objectives. The method decouples the structure generator from text guidance, which enables efficient multi-modality optimization and the incorporation of manifold constraints to preserve validity, and it supports substructure preservation through inpainting-like capabilities. The work highlights the potential of multi-modality diffusion for rapid, property-directed molecular design and outlines future directions toward adversarial alignment, bidirectional text-molecule models, and synthesis-aware generation.

Abstract

The integration of deep learning, particularly AI-Generated Content, with high-quality data derived from ab initio calculations has emerged as a promising avenue for transforming the landscape of scientific research. However, designing molecular drugs or materials that incorporate multi-modality prior knowledge remains a critical and complex undertaking. Specifically, a practical molecular design must not only meet diversity requirements but also satisfy structural and textual constraints, with various symmetries, outlined by domain experts. In this article, we present an innovative approach to this inverse design problem by formulating it as a multi-modality guidance optimization task. Our proposed solution, 3DToMolo, is a textual-structure alignment symmetric diffusion framework for molecular optimization tasks. 3DToMolo aims to harmonize diverse modalities, including textual description features and graph structural features, aligning them seamlessly to produce molecular structures that adhere to the symmetric structural and textual constraints specified by experts in the field. Experimental trials across three guidance optimization settings have shown superior hit optimization performance compared to state-of-the-art methodologies. Moreover, 3DToMolo demonstrates the capability to discover potential novel molecules that incorporate specified target substructures, without the need for prior knowledge. This work not only holds general significance for the advancement of deep learning methodologies but also paves the way for a transformative shift in molecular design strategies. 3DToMolo creates opportunities for a more nuanced and effective exploration of the vast chemical space, opening new frontiers in the development of molecular entities with tailored properties and functionalities.
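The text-structure alignment mentioned above is described as contrastive learning between a molecule encoding and its paired text embedding. As a rough illustration only, the sketch below computes a symmetric InfoNCE-style loss on toy vectors. The choice of InfoNCE, the `temperature` value, and all function names are assumptions made for this example and are not taken from the paper; the encoders themselves are omitted, and the inputs are assumed to be non-zero vectors.

```python
import math

def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def log_softmax_diagonal(scores):
    """For each row i of a square score matrix, return log-softmax(row i)[i]."""
    out = []
    for i, row in enumerate(scores):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        out.append(row[i] - log_z)
    return out

def contrastive_alignment_loss(mol_embs, text_embs, temperature=0.1):
    """Symmetric InfoNCE-style loss: pair i of mol_embs matches pair i of text_embs.

    Hypothetical helper for illustration; not the paper's training objective.
    """
    mol = [l2_normalize(v) for v in mol_embs]
    txt = [l2_normalize(v) for v in text_embs]
    # sim[i][j]: temperature-scaled cosine similarity of molecule i and text j
    sim = [[sum(a * b for a, b in zip(m, t)) / temperature for t in txt] for m in mol]
    sim_t = [list(col) for col in zip(*sim)]  # text-to-molecule direction
    mol_to_text = log_softmax_diagonal(sim)
    text_to_mol = log_softmax_diagonal(sim_t)
    n = len(mol)
    return -(sum(mol_to_text) + sum(text_to_mol)) / (2 * n)
```

Correctly paired embeddings should yield a lower loss than mismatched ones, which is the signal that pulls the two latent spaces together during pretraining.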

Paper Structure (22 sections, 17 equations, 6 figures, 4 tables)

Introduction
Results
Definition of text-structural optimization
Development of a text-structural diffusion model
Flexible molecule optimization under Physicochemical Property Prompts
Prompt-driven Molecule optimization with Structural constraints
Hard-coded molecule optimization on appointed sites
Discussion
Methods
Datasets
Training details
Structural optimization through diffusion
Denoising diffusion process
Stationary Distribution.
Joint Denoising Process.
...and 7 more sections

Figures (6)

Figure 1: Overview of 3DToMolo. (a) Alignment of textual descriptions with the chemical structures of molecules, realized through contrastive learning between two latent representations: the molecular structure encoding and its paired text embedding. (b) Conditional diffusion model. To keep molecule optimization aligned with the prompt, the conditional diffusion model incorporates the text prompt at each step of the backward optimization process. (c) Zero-shot prompt-driven molecule optimization, which modifies the input molecule in response to a text prompt related to physicochemical properties. 3DToMolo optimizes the 2D and 3D features of the molecule together, balancing alignment with the input molecule against alignment with the text prompt; this balance is achieved by the conditional diffusion model shown in (b). (d) Molecule optimization under structural constraints. This task further enhances similarity to the input molecule by retaining essential structural features. (e) Molecule optimization on appointed sites. Given a precise position within the input molecule, 3DToMolo optimizes the molecule by proposing modifications to the atoms and bonds connected to that site.
Figure 2: Exemplary prompt-driven molecule optimizations: (a) the highest occupied molecular orbital (HOMO) energy optimization, (b) the lowest unoccupied molecular orbital (LUMO) energy optimization, (c) the HOMO-LUMO energy gap optimization, (d) the water solubility optimization, (e) the polarity optimization, (f) the water solubility and polarity multi-objective optimization. The prompts shown in the figure are simplified; the exact prompts used in the experiments can be found in Table \ref{tab:overall performance}.
Figure 3: The visualization of binding-affinity-based molecule optimization. The text prompt is from ChEMBL 1613777 ("This molecule is tested positive in an assay that are inhibitors and substrates of an enzyme protein. It uses molecular oxygen inserting one oxygen atom into a substrate, and reducing the second into a water molecule." mendez2019chembl).
Figure 4: Exemplary optimizations involving spatial information in the polarity-related prompts. Above are the 2D graphs of molecules and below are their corresponding 3D conformations. (a) Under the prompt "This molecule has low polarity", a hydroxyl group is added to the molecule, which neutralizes the dipole of the neighboring hydroxyl group due to the opposite alignment, as illustrated by the arrows. Consequently, the dipole moment of the molecule is reduced from 1.898 Debye to 0.914 Debye. (b) Under the prompt "This molecule has high polarity", the output molecule discards two C-F bonds which counteract the dipole of the pyridine ring and hence do not contribute much to the polarity. The removal of C-F bonds and the introduction of an aligned hydroxyl group raise the dipole moment of the molecule from 0.467 Debye to 0.905 Debye.
Figure 5: Molecule optimization with structural constraints. (a) Redox-related prompt-driven molecule optimization with preserved skeletons. (b) A case study of optimization on an internal region. On the left is a reported spiro-linked $\pi$-conjugated molecule, tetraphenylsilane, used as the template. The central silicon atom (yellow-shaded area) is diffused while the four benzene rings remain fixed until the final steps of the denoising process. On the right are two selected output structures whose benzene rings remain non-coplanar, as desired. Hydrogen atoms are omitted for clarity.
...and 1 more figures
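
The Figure 5(b) caption describes diffusing one site while the surrounding rings stay fixed until the final denoising steps. The sketch below illustrates that inpainting-like idea on plain 3D coordinates by clamping the fixed atoms to their reference positions after every reverse step and releasing them only for the last few steps. Everything here is an assumption for illustration: `masked_denoise`, `release_steps`, and especially `toy_denoise_step`, which merely stands in for a learned, text-conditioned denoiser and does not reflect the paper's model.

```python
import random

def toy_denoise_step(coords, t):
    """Hypothetical stand-in for a learned denoiser: contract atoms toward the centroid."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[c + 0.9 * (x - c) for x, c in zip(p, centroid)] for p in coords]

def masked_denoise(ref_coords, fixed_mask, denoise_step,
                   num_steps=50, release_steps=5, seed=0):
    """Run a reverse process in which atoms flagged in fixed_mask are clamped.

    Fixed atoms are reset to their reference coordinates after every step
    until only `release_steps` steps remain; free atoms start from Gaussian noise.
    """
    rng = random.Random(seed)
    coords = [list(p) if fixed else [rng.gauss(0.0, 1.0) for _ in p]
              for p, fixed in zip(ref_coords, fixed_mask)]
    for t in range(num_steps, 0, -1):
        proposal = denoise_step(coords, t)
        clamp = t > release_steps  # stop clamping during the final steps
        coords = [list(ref) if (clamp and fixed) else list(new)
                  for ref, new, fixed in zip(ref_coords, proposal, fixed_mask)]
    return coords
```

With `release_steps=0` the fixed atoms are returned exactly at their reference positions; a small positive value lets them relax jointly with the regenerated site at the end, in the spirit of the caption.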
