
Training-Free Guidance for Discrete Diffusion Models for Molecular Generation

Thomas J. Kerby, Kevin R. Moon

TL;DR

The paper addresses the challenge of conditioning discrete diffusion models for molecular graph generation without retraining. It proposes a training-free guidance framework for discrete diffusion by leveraging a learned reverse distribution $p_\theta(x_0|x_t)$ to compute guided updates, adapting the multinomial forward process used in DiGress. Key contributions include a concrete methodology for applying training-free guidance to discrete data and empirical demonstrations guiding node-type composition and heavy-atom molecular weight, achieving high target fidelity while maintaining molecule validity. This work enables plug-and-play conditioning of discrete foundation diffusion models and suggests broader applicability to other discrete-generation tasks, including discrete text generation.

Abstract

Training-free guidance methods for continuous data have seen an explosion of interest because they enable foundation diffusion models to be paired with interchangeable guidance models. Currently, equivalent guidance methods for discrete diffusion models are unknown. We present a framework for applying training-free guidance to discrete data and demonstrate its utility on molecular graph generation tasks using the discrete diffusion model architecture of DiGress. We pair this model with guidance functions that return the proportion of heavy atoms that are a specific atom type and the molecular weight of the heavy atoms, and we demonstrate our method's ability to guide the data generation.
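
To make the two guidance targets in the abstract concrete, the following is a minimal sketch of how such functions could be evaluated on a predicted per-node distribution over atom types. It is not the authors' implementation: the atom vocabulary `ATOM_TYPES`, the function names, the `node_probs` array layout (one row per heavy atom, one column per atom type), and the squared-error loss are all assumptions made for illustration. Only the atomic weights are standard values.

```python
import numpy as np

# Hypothetical heavy-atom vocabulary (an assumption, not taken from the paper).
ATOM_TYPES = ["C", "N", "O", "F"]
# Standard atomic weights in g/mol, in the same order as ATOM_TYPES.
ATOMIC_WEIGHTS = np.array([12.011, 14.007, 15.999, 18.998])


def expected_proportion(node_probs: np.ndarray, atom: str) -> float:
    """Expected fraction of heavy atoms that are of type `atom`.

    node_probs has shape (num_nodes, num_atom_types); each row is a
    categorical distribution over atom types for one node.
    """
    idx = ATOM_TYPES.index(atom)
    return float(node_probs[:, idx].mean())


def expected_heavy_atom_weight(node_probs: np.ndarray) -> float:
    """Expected total weight of the heavy atoms under node_probs."""
    return float((node_probs @ ATOMIC_WEIGHTS).sum())


def squared_error(value: float, target: float) -> float:
    """Assumed guidance loss: squared distance to the target value."""
    return (value - target) ** 2


# Example: a one-hot "prediction" for a molecule with three carbons and one oxygen.
node_probs = np.eye(len(ATOM_TYPES))[[0, 0, 0, 2]]

carbon_fraction = expected_proportion(node_probs, "C")
heavy_weight = expected_heavy_atom_weight(node_probs)

print(carbon_fraction, squared_error(carbon_fraction, target=1.0))
print(heavy_weight, squared_error(heavy_weight, target=105.0))
```

Because both functions are plain expectations over the categorical rows, they accept either hard one-hot assignments or soft probabilities such as a model's estimate of the clean graph. How the resulting loss is turned into a guided sampling update is specific to the paper and is not reproduced here.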

Paper Structure

This paper contains 9 sections, 14 equations, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: Examples of generated molecules using attribute guidance. The top row shows 5 uncurated samples from the 99 valid molecular graphs generated where the target proportion of heavy atoms that are carbon is $0.0$ and $\lambda=100,000$. The bottom row shows 5 uncurated samples from the 1,022 valid molecular graphs generated where the target proportion of heavy atoms that are carbon is $1.0$ and $\lambda=100,000$. At this high value of $\lambda$, the generated molecules match the target proportions exactly. However, for a target proportion of $0.0$, the validity of the generated molecules decreases as $\lambda$ increases, since pushing the carbon proportion to this extreme drives the molecules off the data manifold.
  • Figure 2: Examples of generated molecules using molecular weight guidance. The top row shows 5 uncurated samples from the 1,018 valid molecular graphs generated when the target weight of the heavy atoms is $105$ and $\lambda = 0.2$. The bottom row shows 5 uncurated samples from the 856 valid molecular graphs generated when the target weight of the heavy atoms is $135$ and $\lambda=0.2$.