Classifier-free graph diffusion for molecular property targeting
Matteo Ninniri, Marco Podda, Davide Bacciu
TL;DR
Property targeting in molecular graphs is addressed via FreeGress, a discrete diffusion model that uses classifier-free (CF) guidance to generate graphs conditioned on target properties without an auxiliary regressor. The forward diffusion process $q(G^t|G^{t-1})$ applies node and edge transitions, while the reverse process $p_\theta(G^{t-1}|G^t,\mathbf{y})$ blends conditioned and unconditioned predictions at inference time. Key innovations include a learnable node-count prior $p_\varepsilon(n|\mathbf{y})$, a conditional dropout mechanism, and a graph-transformer architecture that enables effective conditioning on discrete graph structures. Empirically, FreeGress consistently outperforms DiGress on QM9 and ZINC-250k across multiple targets, while using fewer parameters and improving chemical validity. Future work includes conditional transition matrices and fragment-based graph representations for broader graph generation tasks.
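To make the blending step concrete, the following is a minimal sketch of how classifier-free guidance is commonly applied to categorical predictions, together with a simple form of conditional dropout. It is an illustration under stated assumptions, not the authors' implementation: the log-space weighting with a guidance scale `s`, the function names `cf_guided_probs` and `maybe_drop_condition`, and the use of `None` as a "no condition" placeholder are all hypothetical choices that this section does not specify.

```python
import math
import random


def cf_guided_probs(p_cond, p_uncond, s, eps=1e-12):
    """Blend conditioned and unconditioned categorical predictions.

    Assumed (common) classifier-free form, applied in log space:
        log p_guided  ∝  log p_uncond + s * (log p_cond - log p_uncond)
    s = 0 recovers the unconditioned prediction, s = 1 the conditioned one,
    and s > 1 pushes further toward the condition.
    """
    logits = [
        math.log(pu + eps) + s * (math.log(pc + eps) - math.log(pu + eps))
        for pc, pu in zip(p_cond, p_uncond)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def maybe_drop_condition(y, p_drop, rng, null_token=None):
    """Conditional dropout (sketch): with probability p_drop, replace the
    target property y with a placeholder so the same network also learns
    the unconditioned prediction during training."""
    return null_token if rng.random() < p_drop else y


# Example: class probabilities for a single node (e.g. over atom types).
p_cond = [0.7, 0.2, 0.1]    # prediction given the target property y
p_uncond = [0.4, 0.4, 0.2]  # prediction with the condition dropped
guided = cf_guided_probs(p_cond, p_uncond, s=2.0)
```

With `s` greater than 1, the guided distribution places more mass on the classes that the conditioned prediction favors relative to the unconditioned one. Because one network produces both predictions, no separate property regressor is needed.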
Abstract
This work focuses on the task of property targeting: that is, generating molecules conditioned on target chemical properties to expedite candidate screening for novel drug and materials development. DiGress is a recent diffusion model for molecular graphs whose distinctive feature is allowing property targeting through classifier-based (CB) guidance. While CB guidance may work to generate molecule-like graphs, we suggest that its assumptions apply poorly to the chemical domain. Based on this insight, we propose a classifier-free (CF) version of DiGress, named FreeGress, which works by directly injecting the conditioning information into the training process. CF guidance is convenient because its assumptions are less stringent and because it does not require training an auxiliary property regressor, thus halving the number of trainable parameters in the model. We empirically show that our model yields up to 79% improvement in Mean Absolute Error with respect to DiGress on property targeting tasks on the QM9 and ZINC-250k benchmarks. As an additional contribution, we propose a simple yet powerful approach to improve the chemical validity of generated samples, based on the observation that certain chemical properties, such as molecular weight, correlate with the number of atoms in molecules.
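The observation about atom counts can be illustrated with a small sketch of sampling the number of nodes $n$ from a property-dependent distribution $p(n|y)$ before generation starts. Everything specific here is an assumption made for illustration: the abstract does not state how this distribution is parameterized, so a hypothetical per-count linear scorer (`weights`, `biases`) stands in for whatever learned model is actually used, and the function names are invented.

```python
import math
import random


def node_count_distribution(y, weights, biases):
    """Softmax over candidate node counts.

    Hypothetical parameterization: logit_k = weights[k] * y + biases[k],
    a stand-in for a learned model of p(n | y).
    """
    logits = [w * y + b for w, b in zip(weights, biases)]
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sample_node_count(y, weights, biases, counts, rng):
    """Draw one node count from the property-dependent distribution."""
    probs = node_count_distribution(y, weights, biases)
    return rng.choices(counts, weights=probs, k=1)[0]


# Toy setup: three candidate sizes; larger y favors larger molecules.
counts = [5, 10, 15]
weights = [0.0, 1.0, 2.0]
biases = [0.0, 0.0, 0.0]
n = sample_node_count(5.0, weights, biases, counts, random.Random(0))
```

Compared with drawing $n$ from the unconditional training distribution, a property-dependent draw avoids asking the model for, say, a high molecular weight with too few atoms to reach it.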
