DMol: A Highly Efficient and Chemical Motif-Preserving Molecule Generation Platform

Peizhi Niu, Yu-Hsiang Wang, Vishal Rana, Chetan Rupakheti, Abhishek Pandey, Olgica Milenkovic

TL;DR

DMol addresses the challenge of efficient, motif-preserving small-molecule generation by introducing a discrete graph-diffusion framework where node and edge noise are coupled through a deterministic, progressively expanding set of perturbed subgraphs. A motif-compression mechanism maps frequent ring motifs to supernodes, enabling diffusion on a smaller graph and straightforward decoding, while a revised loss couples perturbation counts to the forward process. Empirically, DMol achieves up to an order-of-magnitude reduction in diffusion steps and gains in SMILES validity, ChEMBL likeness, and QED over strong baselines like DiGress and DeFoG across QM9, MOSES, and GUACAMOL, with motif distributions faithfully preserved. The scaffold-informed conditioning and docking analyses illustrate practical utility for drug design, suggesting DMol’s approach can accelerate generation of chemically plausible candidates for synthesis and docking. Overall, DMol offers a principled, scalable framework that simultaneously improves molecular quality and generation efficiency, with clear avenues for conditional generation and broader motif-handling in future work.

Abstract

We introduce a new graph diffusion model for small molecule generation, DMol, which outperforms the state-of-the-art DiGress model in terms of validity by roughly 1.5% across all benchmarking datasets while reducing the number of diffusion steps at least 10-fold and the running time to roughly one half. The performance improvements result from a careful change in the objective function and a graph noise scheduling approach which, at each diffusion step, changes only a subset of nodes, of varying size, in the molecule graph. Another relevant property of the method is that it can be easily combined with junction-tree-like graph representations that arise by compressing a collection of relevant ring structures into supernodes. Unlike classical junction-tree techniques, which involve VAEs and require complicated reconstruction steps, compressed DMol performs graph diffusion directly on a graph in which only a carefully selected set of frequent carbon rings is compressed into supernodes, which results in straightforward sample generation. Compressed DMol improves validity over generic DMol by roughly a further 2%, increases novelty, and further reduces the running time because the graph is smaller.
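
To make the noise-scheduling idea concrete, here is a minimal Python sketch of a forward step that perturbs only a subset of nodes whose size grows with the step index. It is an illustration under stated assumptions, not the authors' implementation: the linear schedule in `num_perturbed`, the fixed node ordering `order`, the uniform resampling of labels, and the names `ATOM_TYPES`, `BOND_TYPES`, and `forward_step` are all hypothetical. Edges are resampled only between perturbed nodes, which is one simple way to couple node noise and edge noise.

```python
import math
import random

ATOM_TYPES = ["C", "N", "O", "F"]   # assumed label set, for illustration only
BOND_TYPES = [0, 1, 2, 3]           # 0 means "no bond"

def num_perturbed(t, T, n):
    """Assumed linear schedule: how many of the n nodes are perturbed at step t of T."""
    return min(n, math.ceil(n * t / T))

def forward_step(atoms, bonds, t, T, order, rng):
    """Return a noisy copy of the graph at step t.

    atoms: list of atom labels; bonds: dict {(i, j): bond_type} with i < j.
    order: fixed permutation of node indices, so subsets are nested across steps.
    """
    k = num_perturbed(t, T, len(atoms))
    subset = sorted(order[:k])
    noisy_atoms = list(atoms)
    noisy_bonds = dict(bonds)
    # Node noise: resample labels of the selected nodes only.
    for i in subset:
        noisy_atoms[i] = rng.choice(ATOM_TYPES)
    # Edge noise: resample only edges with both endpoints in the subset.
    for a in range(len(subset)):
        for b in range(a + 1, len(subset)):
            i, j = subset[a], subset[b]
            bond = rng.choice(BOND_TYPES)
            if bond == 0:
                noisy_bonds.pop((i, j), None)
            else:
                noisy_bonds[(i, j)] = bond
    return noisy_atoms, noisy_bonds, subset

# Usage: a 4-atom chain, perturbed at steps 1..4 of T = 4.
rng = random.Random(0)
atoms = ["C", "C", "O", "N"]
bonds = {(0, 1): 1, (1, 2): 2, (2, 3): 1}
order = [2, 0, 3, 1]
for t in range(1, 5):
    _, _, subset = forward_step(atoms, bonds, t, 4, order, rng)
    print(t, subset)
```

Because the subset at step t is a prefix of a fixed ordering, early steps leave most of the molecule untouched, and every node and edge outside the subset is copied through unchanged.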

Paper Structure

This paper contains 34 sections, 25 equations, 15 figures, 11 tables, 2 algorithms.

Figures (15)

  • Figure 1: (a) The forward process; (b) DMol illustration.
  • Figure 2: Motifs (i.e., substructures occurring with high frequencies or frequencies higher than predicted by random models) that also allow for unique scaffold integration are compressed into supernodes with their own labels. During diffusion, supernodes are either converted into other classes of supernodes or into atomic nodes, and vice versa. During sampling, the supernode is decoded back into its corresponding motif.
  • Figure 3: The $3$ selected motifs for QM9.
  • Figure 4: The $15$ selected motifs for MOSES.
  • Figure 5: The $15$ selected motifs for GUACAMOL.
  • ...and 10 more figures
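
The supernode compression described in the Figure 2 caption can be sketched in a few lines of Python. This is a hedged illustration, not the paper's code: the graph encoding (a label list plus a bond dictionary), the function name `compress_motif`, and the supernode label `"[benzene]"` are assumptions made for the example. Bonds inside the motif are dropped because the supernode label is taken to imply them; bonds leaving the motif are re-attached to the supernode.

```python
def compress_motif(atoms, bonds, motif_nodes, label):
    """Collapse the nodes in `motif_nodes` into a single supernode labelled `label`.

    atoms: list of atom labels; bonds: dict {(i, j): bond_type} with i < j.
    Returns the compressed (atoms, bonds) pair with re-indexed nodes.
    """
    motif = set(motif_nodes)
    keep = [i for i in range(len(atoms)) if i not in motif]
    new_index = {old: new for new, old in enumerate(keep)}
    super_idx = len(keep)  # the supernode is appended as the last node
    new_atoms = [atoms[i] for i in keep] + [label]
    new_bonds = {}
    for (i, j), bond in bonds.items():
        if i in motif and j in motif:
            continue  # internal motif bond, implied by the supernode label
        a = super_idx if i in motif else new_index[i]
        b = super_idx if j in motif else new_index[j]
        new_bonds[(min(a, b), max(a, b))] = bond
    return new_atoms, new_bonds

# Usage: toluene as a six-membered ring (nodes 0-5) plus a methyl carbon (node 6).
atoms = ["C"] * 7
bonds = {(0, 1): 2, (1, 2): 1, (2, 3): 2, (3, 4): 1, (4, 5): 2, (0, 5): 1, (0, 6): 1}
print(compress_motif(atoms, bonds, range(6), "[benzene]"))
# prints (['C', '[benzene]'], {(0, 1): 1})
```

Decoding a supernode back into its motif additionally needs a lookup from label to substructure and a rule for attachment points, which this sketch leaves out.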