Classifier-free graph diffusion for molecular property targeting

Matteo Ninniri, Marco Podda, Davide Bacciu

TL;DR

FreeGress is a discrete diffusion model for property targeting in molecular graphs: it uses classifier-free (CF) guidance to generate graphs conditioned on target properties without an auxiliary regressor. The forward diffusion process $q(G^t|G^{t-1})$ corrupts the graph through node and edge transitions, while the reverse process $p_\theta(G^{t-1}|G^t,\mathbf{y})$ blends conditioned and unconditioned predictions at inference time. Key innovations include a learnable node-count prior $p_\varepsilon(n|\mathbf{y})$, a conditional dropout mechanism, and a graph-transformer architecture that enables effective conditioning on discrete graph structures. Empirically, FreeGress consistently outperforms DiGress on QM9 and ZINC-250k across multiple targets, while using fewer parameters and improving chemical validity. Future work includes conditional transition matrices and fragment-based graph representations for broader graph generation tasks.
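The blending of conditioned and unconditioned predictions can be illustrated with a minimal sketch. It assumes the standard classifier-free guidance rule applied in log space to categorical distributions (for example, per-node atom-type probabilities). The function name `cf_guided_probs`, the guidance scale `s`, and the toy probabilities are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cf_guided_probs(p_cond, p_uncond, s=2.0, eps=1e-12):
    """Blend conditional and unconditional categorical predictions.

    Assumed rule (standard classifier-free guidance in log space):
        log p_guided = log p_uncond + s * (log p_cond - log p_uncond) + const
    s = 0 recovers the unconditional prediction, s = 1 the conditional one,
    and s > 1 pushes the distribution further toward the condition.
    """
    log_c = np.log(np.asarray(p_cond, dtype=float) + eps)
    log_u = np.log(np.asarray(p_uncond, dtype=float) + eps)
    logits = log_u + s * (log_c - log_u)
    # Subtract the row maximum for numerical stability, then renormalize.
    logits -= logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

# Toy example: one node, three hypothetical atom types.
p_cond = np.array([[0.7, 0.2, 0.1]])    # prediction given the target property
p_uncond = np.array([[0.4, 0.4, 0.2]])  # prediction with the property dropped
guided = cf_guided_probs(p_cond, p_uncond, s=2.0)
```

With `s > 1`, classes that the conditional prediction favors over the unconditional one gain probability mass, which is the sense in which the guidance sharpens the output toward the target property.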

Abstract

This work focuses on the task of property targeting: that is, generating molecules conditioned on target chemical properties to expedite candidate screening for novel drug and materials development. DiGress is a recent diffusion model for molecular graphs whose distinctive feature is allowing property targeting through classifier-based (CB) guidance. While CB guidance may work to generate molecule-like graphs, we argue that its assumptions apply poorly to the chemical domain. Based on this insight, we propose a classifier-free DiGress (FreeGress), which works by directly injecting the conditioning information into the training process. Classifier-free (CF) guidance is convenient because its assumptions are less stringent and because it does not require training an auxiliary property regressor, thus halving the number of trainable parameters in the model. We empirically show that our model yields up to 79% improvement in Mean Absolute Error with respect to DiGress on property targeting tasks on the QM9 and ZINC-250k benchmarks. As an additional contribution, we propose a simple yet powerful approach to improve the chemical validity of generated samples, based on the observation that certain chemical properties, such as molecular weight, correlate with the number of atoms in molecules.
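Injecting the conditioning information into training is commonly paired with randomly dropping it, so that a single network learns both the conditional and the unconditional prediction. The sketch below shows one way this could look. The helper `drop_conditioning`, the drop probability `p_drop`, and the use of a constant `null_value` as the "no condition" placeholder are assumptions made for illustration; the source does not specify these details.

```python
import numpy as np

def drop_conditioning(y, p_drop, rng, null_value=0.0):
    """Randomly replace conditioning vectors with a null placeholder.

    y       : array of shape (batch, d) holding target properties.
    p_drop  : probability of dropping the condition for each example.
    Returns the (possibly masked) conditioning and a boolean keep-mask.
    """
    y = np.asarray(y, dtype=float)
    # One Bernoulli draw per example: True means the condition is kept.
    keep = rng.random(y.shape[0]) >= p_drop
    y_out = np.where(keep[:, None], y, null_value)
    return y_out, keep

rng = np.random.default_rng(0)
y = np.array([[1.5], [2.0], [0.3], [4.2]])  # hypothetical property targets
y_train, keep = drop_conditioning(y, p_drop=0.1, rng=rng)
```

At sampling time, the same network can then be queried twice, once with the real target and once with the placeholder, to obtain the two predictions that classifier-free guidance combines.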

Paper Structure (31 sections, 13 equations, 11 figures, 3 tables)

Figures (11)

  • Figure 1: A depiction of FreeGress. The forward process, which gradually corrupts a molecule into a random graph, goes from left to right. The reverse process, which recovers the original graph by denoising, goes from right to left. Note that the reverse process takes a conditioning vector $\bm{y}$ and a number of nodes $n$, the latter sampled from a trained neural network $p_{\varepsilon}$.
  • Figure 2: (a) The overall architecture of FreeGress. Notice that the guide $\bm{y}$ is an input to the model. (b) The graph transformer layer. $\bm{V}$ represents the triple $(\bm{X}, \bm{\textsf{E}}, \bm{u})$. (c) The self-attention layer within the graph transformer layer. Notice that we add a prime ($^\prime$) to vectors to indicate that they are network/layer outputs.
  • Figure: Input $\mu$: 0.0603; Est. $\mu$: 0.0463
  • Figure: Input $\mu$: 0.0603; Est. $\mu$: 0.0463
  • Figure: Input $\mu$: 4.2338; Est. $\mu$: 4.1238
  • ...and 6 more figures