Chemistry-Inspired Diffusion with Non-Differentiable Guidance

Yuchen Shen, Chenhao Zhang, Sijie Fu, Chenghui Zhou, Newell Washburn, Barnabás Póczos

TL;DR

ChemGuide uses a non-differentiable quantum-chemistry oracle to guide diffusion-based molecular generation, sidestepping the labeled-data bottleneck of conventional conditional diffusion. It estimates gradients with zeroth-order methods (SPSA-like perturbations) and applies them to a latent 3D diffusion model built on a VAE-EGNN backbone, steering sampling toward geometries with near-zero net forces, i.e., close to the ground state. The method substantially improves molecular stability (lower forces, energies closer to the ground state) and generalizes to other property optimization tasks, especially when combined with neural guidance in a bilevel framework. The guidance is physics-grounded and gradient-free, and it is compatible with existing conditional diffusion strategies; its main cost is the computational expense of the physics-based oracle evaluations.
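
To make the zeroth-order idea concrete, here is a minimal sketch of an SPSA-style gradient estimate for a black-box scalar function, followed by one guidance step. It is an illustration under stated assumptions, not the authors' implementation: `toy_oracle` is a hypothetical quadratic stand-in for a quantum-chemistry calculation, and the names `spsa_gradient`, `guided_step`, `c`, and `num_samples` are invented for this example.

```python
import numpy as np


def spsa_gradient(oracle, x, c=1e-2, num_samples=8, rng=None):
    """Estimate the gradient of a black-box scalar `oracle` at `x`.

    Each sample perturbs all coordinates at once along a random +/-1
    direction and needs only two oracle calls, whatever the dimension.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for _ in range(num_samples):
        # Rademacher perturbation: every entry is +1 or -1.
        delta = rng.choice([-1.0, 1.0], size=x.shape)
        diff = oracle(x + c * delta) - oracle(x - c * delta)
        # For +/-1 entries, dividing by delta equals multiplying by it.
        grad += diff / (2.0 * c) * delta
    return grad / num_samples


def toy_oracle(x):
    # Hypothetical stand-in for a non-differentiable chemistry oracle:
    # a quadratic "energy" whose true gradient is 2 * x.
    return float(np.sum(np.asarray(x) ** 2))


def guided_step(x, grad, scale=1e-4):
    # Assumed form of a guidance update: move against the estimated
    # gradient so that the oracle value decreases.
    return np.asarray(x, dtype=float) - scale * grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.array([1.0, -2.0, 0.5])
    g = spsa_gradient(toy_oracle, x, num_samples=4000, rng=rng)
    print("estimated gradient:", g)
    print("true gradient:     ", 2 * x)
```

The estimate is noisy for small `num_samples`, so in practice the number of oracle calls per step trades accuracy against the cost of each physics-based evaluation.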

Abstract

Recent advances in diffusion models have shown remarkable potential in the conditional generation of novel molecules. These models can be guided in two ways: (i) explicitly, through additional features representing the condition, or (ii) implicitly, using a property predictor. However, training property predictors or conditional diffusion models requires an abundance of labeled data and is inherently challenging in real-world applications. We propose a novel approach that attenuates the limitations of acquiring large labeled datasets by leveraging domain knowledge from quantum chemistry as a non-differentiable oracle to guide an unconditional diffusion model. Instead of relying on neural networks, the oracle provides accurate guidance in the form of estimated gradients, allowing the diffusion process to sample from a conditional distribution specified by quantum chemistry. We show that this results in more precise conditional generation of novel and stable molecular structures. Our experiments demonstrate that our method: (1) significantly reduces atomic forces, enhancing the validity of generated molecules when used for stability optimization; (2) is compatible with both explicit and implicit guidance in diffusion models, enabling joint optimization of molecular properties and stability; and (3) generalizes effectively to molecular optimization tasks beyond stability optimization.

Paper Structure

This paper contains 60 sections, 19 equations, 16 figures, 20 tables, 3 algorithms.

Figures (16)

  • Figure 1: Overview of ChemGuide. Left: the space of all molecules, depicted (roughly) as a unimodal distribution, where the red region indicates training molecules and the blue region indicates novel molecules generated by the diffusion model. Middle: ChemGuide derives non-differentiable guidance from quantum chemistry to steer the diffusion process towards a conditional distribution (e.g., minimized forces). Right: the average forces of 3 sets of molecules generated by GeoLDM (Xu et al., 2023) trained on QM9 (Ramakrishnan et al., 2014; left two) and GEOM (Axelrod and Gómez-Bombarelli, 2022; rightmost one), without (above) and with (below) ChemGuide.
  • Figure 2: Histograms and distributions of force RMS and energy change of 500 generated molecules from QM9 and GEOM using GeoLDM with ChemGuide under scale=0.0001. Energy change refers to the energy of our generated molecule above its ground state energy.
  • Figure 3: Metrics of 500 generated molecules from QM9 using GeoLDM with ChemGuide using scale=0.0001, calculated at the DFT/B3LYP/6-31G(2df,p) level of theory, which is more precise but more computationally expensive than GFN2-xTB. Strict validity is defined as generated geometries whose energies lie within 50 kcal/mol (0.07968 Eh) of those of the optimized geometries. * and bold denote the overall best result and our best result, respectively. Percentage changes between our results and GeoLDM are shown in parentheses.
  • Figure 4: Force RMS and energy change of 200 generated molecules using explicit bilevel optimization, implicit bilevel optimization with noisy neural guidance, C-EDM, and C-GeoLDM. Energy change refers to energy above ground state.
  • Figure 5: MAE trajectories of GeoLDM, C-GeoLDM, and our bilevel optimization on $C_v$.
  • ...and 11 more figures
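
The strict-validity threshold quoted in the Figure 3 caption (50 kcal/mol, given as 0.07968 Eh) can be checked with a short unit conversion. The sketch below is illustrative only: the function names are hypothetical, the energies in the usage lines are made-up numbers rather than results from the paper, and reading "within" as an absolute energy difference is an assumption.

```python
# Standard conversion factor: 1 Hartree (Eh) is about 627.509474 kcal/mol.
KCAL_PER_MOL_PER_HARTREE = 627.509474


def kcal_per_mol_to_hartree(energy_kcal_per_mol):
    """Convert an energy from kcal/mol to Hartree."""
    return energy_kcal_per_mol / KCAL_PER_MOL_PER_HARTREE


def is_strictly_valid(e_generated_eh, e_optimized_eh, threshold_kcal=50.0):
    """Check whether a generated geometry's energy lies within the threshold
    of its optimized geometry's energy (both energies in Hartree).

    Assumption: "within" is read as an absolute energy difference.
    """
    threshold_eh = kcal_per_mol_to_hartree(threshold_kcal)
    return abs(e_generated_eh - e_optimized_eh) <= threshold_eh


if __name__ == "__main__":
    # Threshold in Hartree; should agree with the caption's 0.07968 Eh.
    print(round(kcal_per_mol_to_hartree(50.0), 5))
    # Made-up energies, for illustration only.
    print(is_strictly_valid(-40.40, -40.45))
    print(is_strictly_valid(-40.30, -40.45))
```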