Graph Diffusion that can Insert and Delete

Matteo Ninniri, Marco Podda, Davide Bacciu

TL;DR

GrIDDD is a size-adaptive discrete graph diffusion model that can monotonically insert or delete nodes during generation, addressing the fixed graph-size limitation of prior graph DDPMs. It models forward deletions with DEL/DEL* states and forward insertions with marginal-label sampling, and adds a logistic-based timing schedule for insertion/deletion steps and an auxiliary predictor for reinsertions. During training, GrIDDD learns to predict node/edge types, activation times, and the number of DEL* reinforcements; during sampling, it alternates insertions and deletions to produce a valid target graph G^0 from a latent G^T. Empirically, GrIDDD matches or surpasses the state of the art in property targeting on QM9 and ZINC-250k and is competitive in property optimization, particularly when molecular weight (MW) is a target, which underscores the practical value of size-adaptive graph diffusion for molecular design.
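To make the idea of a monotonic forward deletion process with a logistic timing schedule more concrete, here is a minimal toy sketch. It is not the authors' implementation: the function names, the `steepness` parameter, the specific logistic form of `zeta`, and the inverse-CDF sampling of deletion times are all assumptions made for illustration, and the DEL* state, edges, and insertions are deliberately left out.

```python
import math
import random


def zeta(t, T, steepness=10.0):
    """Hypothetical logistic schedule (assumed form, not taken from the paper)."""
    x = steepness * (t / T - 0.5)
    return 1.0 / (1.0 + math.exp(-x))


def sample_deletion_time(T, rng, steepness=10.0):
    """Draw a step in 1..T by inverting the schedule rescaled to [0, 1]."""
    lo, hi = zeta(0, T, steepness), zeta(T, T, steepness)
    u = rng.random()
    for t in range(1, T + 1):
        if (zeta(t, T, steepness) - lo) / (hi - lo) >= u:
            return t
    return T


def forward_deletions(node_labels, num_to_delete, T, seed=0):
    """Return the node states at steps 0..T of a toy forward deletion process.

    Each node chosen for deletion receives one deletion time; from that step
    on it stays in the DEL state, so the process is monotonic.
    """
    rng = random.Random(seed)
    victims = rng.sample(range(len(node_labels)), num_to_delete)
    times = {v: sample_deletion_time(T, rng) for v in victims}
    trajectory = []
    for t in range(T + 1):
        state = [
            "DEL" if i in times and times[i] <= t else label
            for i, label in enumerate(node_labels)
        ]
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    for step, state in enumerate(forward_deletions(["C", "C", "O", "N"], 2, T=10)):
        print(step, state)
```

In this sketch the schedule only decides when a node is deleted; a node that has entered DEL never returns to an atom label, which is the monotonicity property the summary refers to. A reverse process would read the same trajectory backwards as a sequence of insertions.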

Abstract

Generative models of graphs based on discrete Denoising Diffusion Probabilistic Models (DDPMs) offer a principled approach to molecular generation by systematically removing structural noise through iterative atom and bond adjustments. However, existing formulations are fundamentally limited by their inability to adapt the graph size (that is, the number of atoms) during the diffusion process, severely restricting their effectiveness in conditional generation scenarios such as property-driven molecular design, where the targeted property often correlates with the molecular size. In this paper, we reformulate the noising and denoising processes to support monotonic insertion and deletion of nodes. The resulting model, which we call GrIDDD, dynamically grows or shrinks the chemical graph during generation. GrIDDD matches or exceeds the performance of existing graph diffusion models on molecular property targeting despite being trained on a more difficult problem. Furthermore, when applied to molecular optimization, GrIDDD exhibits competitive performance compared to specialized optimization models. This work paves the way for size-adaptive molecular generation with graph diffusion.

Paper Structure

This paper contains 52 sections, 16 equations, 8 figures, 7 tables, 2 algorithms.

Figures (8)

  • Figure 1: Two qualitative examples of the proposed model (GrIDDD) generating molecules from the QM9 dataset. The top row shows, from left to right, a subset of the denoising steps that generate a molecule from a latent graph with two atoms (a size that is extremely rare in QM9). GrIDDD successfully inserts six more nodes to obtain a sample resembling the training set's distribution. The bottom row starts instead from a latent graph with 18 atoms (a size not present in QM9). GrIDDD deletes nine atoms and obtains a valid molecule. Notice that, unlike in current DDPMs for graphs, the graph size changes dynamically during denoising.
  • Figure 2: The matrices employed in the computation of $\bm{Q}^{*t}$ and $\overline{\bm{Q}}^{*t|s}$. For spacing reasons, the states DEL and DEL$^{*}$ have been shortened as D and D$^{*}$.
  • Figure 3: Validity results of out-of-distribution sampling on QM9.
  • Figure 4: $\zeta(t)$
  • Figure 5: $\zeta'(t) = \frac{\mathrm{d}\zeta(t)}{\mathrm{d}t}$
  • ...and 3 more figures