Sample Efficient Generative Molecular Optimization with Joint Self-Improvement

Serra Korkmaz; Adam Izdebski; Jonathan Pirnay; Rasmus Møller-Larsen; Michal Kmicikiewicz; Pankhil Gawade; Dominik G. Grimm; Ewa Szczurek

TL;DR

The paper tackles the challenge of sample-efficient generative molecular optimization (GMO) under tight evaluation budgets and distribution shift between generator and predictor. It introduces Joint Self-Improvement, a unified framework that (i) trains a joint model with shared parameters for generation and prediction by minimizing a joint negative log-likelihood loss, and (ii) employs a joint self-improving sampling scheme that biases the generator at inference time using global advantages derived from the predictive component. The approach builds on a Hyformer backbone and uses stochastic beam search to produce optimized molecules efficiently, achieving state-of-the-art performance in both offline and online settings across multiple protein targets with restricted oracle access. Ablation studies show that both joint modeling and self-improving sampling contribute to performance, and the method maintains strong results even at very low molecule-evaluation budgets. Overall, Joint Self-Improvement offers a principled alternative to reinforcement learning-based fine-tuning, reducing variance and distribution-shift issues while delivering practical gains for drug-discovery workflows.
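A minimal sketch of what a joint log-likelihood loss can look like for a single (molecule, property) pair, under assumptions that are ours rather than the paper's: the generative part is autoregressive and supplies per-token log-probabilities of the molecule string, and the predictive part outputs the mean of a Gaussian with a fixed standard deviation over a scalar property. Because generation and prediction share parameters, minimizing this one scalar trains both parts. The function names and the weight `lam` are hypothetical.

```python
import math
from typing import Sequence


def generative_nll(token_logprobs: Sequence[float]) -> float:
    """-log p(x) for an autoregressive model: minus the sum of per-token log-probs."""
    return -sum(token_logprobs)


def gaussian_nll(y: float, mean: float, std: float = 1.0) -> float:
    """-log N(y | mean, std^2) for a scalar property (fixed std is an assumption)."""
    return 0.5 * math.log(2 * math.pi * std ** 2) + (y - mean) ** 2 / (2 * std ** 2)


def joint_nll(token_logprobs: Sequence[float], y: float, predicted_mean: float,
              std: float = 1.0, lam: float = 1.0) -> float:
    """Hypothetical joint loss: -log p(x) - lam * log p(y | x).

    With lam = 1 this equals -log p(x, y) under p(x, y) = p(x) p(y | x).
    """
    return generative_nll(token_logprobs) + lam * gaussian_nll(y, predicted_mean, std)
```

With `lam=1`, the loss is the negative log of the joint density p(x, y) = p(x) p(y | x), which is one concrete sense in which the generator and the predictor are fit to the same data with one objective.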

Abstract

Generative molecular optimization aims to design molecules with properties surpassing those of existing compounds. However, such candidates are rare and expensive to evaluate, making sample efficiency essential. Additionally, surrogate models, introduced to predict molecule evaluations, suffer from distribution shift as optimization drives candidates increasingly out-of-distribution. To address these challenges, we introduce Joint Self-Improvement, which benefits from (i) a joint generative-predictive model and (ii) a self-improving sampling scheme. The former aligns the generator with the surrogate, alleviating distribution shift, while the latter biases the generative part of the joint model using the predictive one to efficiently generate optimized molecules at inference time. Experiments across offline and online molecular optimization benchmarks demonstrate that Joint Self-Improvement outperforms state-of-the-art methods under limited evaluation budgets.
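A back-of-the-envelope illustration of why rarity makes sample efficiency essential. This is a generic Monte Carlo fact, not an analysis from the paper: if a fraction p of sampled molecules are "optimized", the empirical hit rate from n evaluations has relative standard error sqrt((1 - p) / (n p)), so for small p the number of evaluations needed for a fixed precision grows roughly like 1/p. The function names are illustrative.

```python
import math


def relative_std_error(p: float, n: int) -> float:
    """Relative standard error, sqrt(p(1-p)/n) / p, of the empirical hit rate
    estimated from n independent evaluations with true hit rate p."""
    if not 0.0 < p <= 1.0 or n <= 0:
        raise ValueError("need 0 < p <= 1 and n > 0")
    return math.sqrt((1.0 - p) / (n * p))


def evaluations_needed(p: float, target_rel_error: float) -> int:
    """Approximate smallest n whose relative standard error reaches the target."""
    if not 0.0 < p <= 1.0 or target_rel_error <= 0:
        raise ValueError("need 0 < p <= 1 and target_rel_error > 0")
    return math.ceil((1.0 - p) / (p * target_rel_error ** 2))
```

For a 10% target relative error, a hit rate of p = 0.1 needs roughly 900 evaluations, while p = 0.01 needs roughly 9,900, which is why naive sampling-and-evaluating scales poorly when optimized molecules are rare.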

Paper Structure (40 sections, 13 equations, 5 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Generative Modeling for GMO
Surrogate Modeling for GMO
Background
Molecular Optimization
Reinforcement Learning-based Molecular Optimization
Limitations of Reinforcement Learning Fine-Tuning
Surrogate Modeling
Joint Self-Improvement
Molecular Optimization as Conditional Sampling
Joint Model as a Unified Backbone
Sampling with Joint Self-Improvement
Practical Updates with Global Advantage
Experiments

Figures (5)

Figure 1: RL-based GMO typically starts from a pretrained generative model that unconditionally samples molecules for evaluation (a). However, optimized molecules are rare, resulting in high-variance updates (b). Surrogate models, used as plug-in estimators for molecule evaluation (c), aim to provide additional learning signal (d). However, decoupled generative model updates introduce a distribution shift, as optimized molecules become increasingly out-of-distribution for the surrogate (e).
Figure 2: Offline and online GMO as compared to the Joint Self-Improvement framework. (a) In the offline setting, the objective $f$ is not available, and a surrogate is used to predict its values. Typically, predictor and generative model training is decoupled. (b) In the online setting, the objective $f$ is available and is used for generative model updates. (c) Joint Self-Improvement can operate in both settings: it either uses the offline dataset to train the joint model and sample new molecules, or updates based on new objective evaluations of generated molecules.
Figure 3: Example logit update (the practical perturbed-model equation) in Joint Self-Improvement with beam width $K{=}2$ and step size $\sigma{=}1$. Next-token probabilities are shown on the arrows. Starting from <BOS>, the model expands a sampling tree using stochastic beam search (SBS). The expected prediction score $\mu$ is used to compute a per-sampled-sequence global advantage, which is then used to obtain the perturbed distribution $p_{\theta'}(\mathbf{x})$. Panel (a) shows the initial sampling tree, while panel (b) shows the resulting normalized probabilities after removing sampled mass (white) and applying the global advantage.
Figure 4: Binding poses of the best molecules generated by Joint Self-Improvement in the offline optimization setting, across all target proteins. Yellow lines indicate hydrogen-bond interactions between the ligands and the target proteins.
Figure 5: True versus predicted docking scores against the BRAF protein target for test (blue) and generated (yellow) molecules. Joint Self-Improvement is used for both prediction and generation.
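The logit update described in the Figure 3 caption can be illustrated on a toy depth-2 sampling tree. The sketch below is one hypothetical reading of that caption, not the paper's practical perturbed-model equation: it assumes $\mu$ is the probability-weighted mean predicted score of the sampled sequences, uses the advantage $\hat{y}(\mathbf{x}) - \mu$, adds $\sigma$ times that advantage to the logit of each sampled sequence's first token (so unsampled siblings that share a high-scoring prefix gain mass), and then zeroes the mass of already-sampled sequences before renormalizing. All function and variable names are illustrative.

```python
import math
from typing import Dict, List, Tuple

Seq = Tuple[str, str]  # toy two-token sequence: (first token, second token)


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over a dict of logits."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def perturbed_distribution(
    first_logits: Dict[str, float],
    second_logits: Dict[str, Dict[str, float]],
    sampled: List[Seq],
    scores: Dict[Seq, float],
    sigma: float = 1.0,
) -> Dict[Seq, float]:
    """Hypothetical global-advantage update on a depth-2 tree.

    Assumes every first token has an entry in second_logits and that at least
    one leaf remains unsampled (otherwise renormalization divides by zero).
    """
    p1 = softmax(first_logits)
    p2 = {t: softmax(l) for t, l in second_logits.items()}
    # Assumed mu: probability-weighted mean predicted score of sampled sequences.
    weights = {s: p1[s[0]] * p2[s[0]][s[1]] for s in sampled}
    mu = sum(weights[s] * scores[s] for s in sampled) / sum(weights.values())
    # Shift the shared-prefix (first-token) logit by sigma * advantage.
    new_first = dict(first_logits)
    for s in sampled:
        new_first[s[0]] += sigma * (scores[s] - mu)
    q1 = softmax(new_first)
    # Remove sampled mass (sampling without replacement), then renormalize.
    leaves = {(a, b): q1[a] * p2[a][b] for a in p2 for b in p2[a]}
    for s in sampled:
        leaves[s] = 0.0
    z = sum(leaves.values())
    return {s: v / z for s, v in leaves.items()}
```

For example, with two equally likely first tokens `A` and `B`, one sampled child under each, and a higher predicted score on the `A` branch, the unsampled sibling `("A", "b")` rises from probability 0.5 (sampled mass removed, no advantage) to about 0.73 with `sigma=1.0`.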
