Property-Guided Molecular Generation and Optimization via Latent Flows

Alexander Arjun Lobo, Urvi Awasthi, Leonid Zhukov

Abstract

Molecular discovery is increasingly framed as an inverse design problem: identifying molecular structures that satisfy desired property profiles under feasibility constraints. While recent generative models provide continuous latent representations of chemical space, targeted optimization within these representations often leads to degraded validity, loss of structural fidelity, or unstable behavior. We introduce MoltenFlow, a modular framework that combines property-organized latent representations with flow-matching generative priors and gradient-based guidance. This formulation supports both conditioned generation and local optimization within a single latent-space framework. We show that guided latent flows enable efficient multi-objective molecular optimization under fixed oracle budgets with controllable trade-offs, while a learned flow prior improves unconditional generation quality.

Paper Structure

This paper contains 75 sections, 18 equations, 11 figures, and 6 tables.

Figures (11)

  • Figure 1: MoltenFlow overview. A VAE maps discrete molecular inputs $x$ to continuous latent representations $z$ and reconstructs molecules via a decoder. The latent space is organized through auxiliary property prediction: a differentiable property surrogate $f_\psi(z)$ induces an objective $\mathcal{J}(z; c)$ (e.g., target values or directional goals specified by a guidance vector $c$), and provides a guidance signal $g=\nabla_z \mathcal{J}(z_t; c)$ during inference. In parallel, a flow-matching model learns a time-dependent vector field $v_\omega(z,t)$ that transports samples from a simple base distribution (random noise) to the empirical distribution of valid latent representations. The flow-based latent transport visualization is adapted from [sabour2025alignflowscalingcontinuoustime], and the property-guidance schematic is inspired by ChemFlow [wei2024chemflow].
  • Figure 2: Hypervolume improvement (HVI) as a function of oracle calls under a fixed budget. Curves show mean performance across random seeds, with 90% bootstrap confidence intervals.
  • Figure 3: Densities of Pareto fronts after budgeted optimization. MoltenFlow yields dense, stable fronts, while BO baselines yield sparser, more variable fronts.
  • Figure 4: Qualitative effect of guidance strength on optimization. As the guidance strength $\gamma$ increases, MoltenFlow produces progressively larger structural changes. Small $\gamma$ yields minimal edits dominated by the latent flow prior, intermediate $\gamma$ induces coherent improvements in QED with reasonable SA, and large $\gamma$ leads to aggressive modifications indicative of over-optimization.
  • Figure 5: Effect of guidance strength on optimization and distributional properties. Each panel shows a metric as the guidance strength $\gamma$ increases (log scale). From left to right: hypervolume improvement (HVI), validity, Fréchet distance (FD-FP), and scaffold diversity. HVI increases sharply in an intermediate regime, indicating effective Pareto-front advancement, while excessive guidance leads to rising Fréchet distance and collapsing diversity, signaling over-optimization. Starred points denote the $\gamma$ achieving the best value for each metric.
  • ...and 6 more figures
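The guidance mechanism summarized in Figure 1 can be sketched as a simple Euler integration of the learned vector field plus a scaled objective gradient. The sketch below is illustrative only: `v_omega`, `grad_J`, and `guided_flow_sample` are hypothetical names standing in for the paper's flow-matching field $v_\omega(z,t)$, the surrogate gradient $\nabla_z \mathcal{J}(z_t; c)$, and the guidance strength $\gamma$; it is not the authors' implementation.

```python
import numpy as np

def guided_flow_sample(v_omega, grad_J, z0, c, gamma=1.0, n_steps=100):
    """Transport a base sample z0 toward the latent data distribution
    while steering with the guidance signal g = grad_z J(z_t; c).

    v_omega : callable (z, t) -> drift from the flow-matching prior
    grad_J  : callable (z, c) -> gradient of the surrogate objective
    gamma   : guidance strength trading off prior vs. objective
    """
    z = np.asarray(z0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        # Prior dynamics from the flow model, plus gradient guidance.
        z = z + dt * (v_omega(z, t) + gamma * grad_J(z, c))
    return z

if __name__ == "__main__":
    # Toy check with a zero vector field and a quadratic objective
    # J(z; c) = -||z - c||^2, whose gradient pulls z toward the target c.
    target = np.array([1.0, -1.0])
    z_final = guided_flow_sample(
        v_omega=lambda z, t: np.zeros_like(z),
        grad_J=lambda z, c: -2.0 * (z - c),
        z0=np.zeros(2), c=target, gamma=2.0, n_steps=200,
    )
    print(z_final)  # approaches the target as gamma or n_steps grows
```

In this toy setting the update reduces to an exponential pull toward $c$, which mirrors the qualitative behavior in Figures 4-5: small $\gamma$ leaves samples near the prior, larger $\gamma$ drives them toward the objective, and excessive $\gamma$ would dominate the prior term entirely.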