Sample Efficient Generative Molecular Optimization with Joint Self-Improvement
Serra Korkmaz, Adam Izdebski, Jonathan Pirnay, Rasmus Møller-Larsen, Michal Kmicikiewicz, Pankhil Gawade, Dominik G. Grimm, Ewa Szczurek
TL;DR
The paper tackles sample-efficient generative molecular optimization (GMO) under tight evaluation budgets and distribution shift between generator and predictor. It introduces Joint Self-Improvement, a unified framework that (i) trains a joint model with shared parameters for generation and prediction by minimizing a joint negative log-likelihood loss, and (ii) employs a joint self-improving sampling scheme that biases the generator at inference time using global advantages derived from the predictive component. The approach uses a Hyformer backbone and stochastic beam search to produce optimized molecules efficiently, achieving state-of-the-art performance in both offline and online settings across multiple protein targets with restricted oracle access. Ablation studies show that both joint modeling and self-improving sampling contribute to performance, and that the method maintains strong results even at very low molecule-evaluation budgets. Overall, Joint Self-Improvement offers a principled alternative to reinforcement-learning-based fine-tuning, reducing variance and distribution-shift issues while delivering practical gains for drug-discovery workflows.
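To make the idea of a joint loss over shared parameters concrete, here is a minimal NumPy sketch. It is not the paper's Hyformer implementation: the shared features, the linear heads `w_gen` and `w_pred`, the Gaussian likelihood for the property, the mean pooling, and the `weight` term are all illustrative assumptions.

```python
import numpy as np

def token_nll(logits, tokens):
    """Mean negative log-likelihood of `tokens` under per-position `logits`."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(tokens)), tokens].mean())

def gaussian_nll(pred, target, sigma=1.0):
    """Negative log-likelihood of `target` under N(pred, sigma^2)."""
    return float(0.5 * np.log(2 * np.pi * sigma**2)
                 + 0.5 * ((pred - target) / sigma) ** 2)

def joint_nll(shared, w_gen, w_pred, tokens, target, weight=1.0):
    """Generative NLL plus weighted predictive NLL, both read off `shared`.

    `shared` stands in for the output of a shared trunk (assumption);
    both heads consume it, so gradients from either term would update
    the same trunk parameters in a real model.
    """
    logits = shared @ w_gen                     # (T, V) generation head
    pred = float(shared.mean(axis=0) @ w_pred)  # pooled scalar prediction head
    return token_nll(logits, tokens) + weight * gaussian_nll(pred, target)

rng = np.random.default_rng(0)
shared = rng.normal(size=(5, 8))        # 5 tokens, 8 hidden units (toy sizes)
w_gen = rng.normal(size=(8, 10))        # toy vocabulary of 10 tokens
w_pred = rng.normal(size=8)
tokens = rng.integers(0, 10, size=5)
loss = joint_nll(shared, w_gen, w_pred, tokens, target=0.7)
```

In this toy version each row of logits scores the token at the same position; a real autoregressive model would shift the targets by one position.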
Abstract
Generative molecular optimization aims to design molecules with properties surpassing those of existing compounds. However, such candidates are rare and expensive to evaluate, making sample efficiency essential. Additionally, surrogate models introduced to predict molecule evaluations suffer from distribution shift as optimization drives candidates increasingly out-of-distribution. To address these challenges, we introduce Joint Self-Improvement, which benefits from (i) a joint generative-predictive model and (ii) a self-improving sampling scheme. The former aligns the generator with the surrogate, alleviating distribution shift, while the latter biases the generative part of the joint model using the predictive one to efficiently generate optimized molecules at inference time. Experiments across offline and online molecular optimization benchmarks demonstrate that Joint Self-Improvement outperforms state-of-the-art methods under limited evaluation budgets.
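The sampling idea, biasing the generator with the predictor at inference time, can be illustrated with a toy sketch. This is not the authors' algorithm: the advantage definition (predicted score minus the batch mean), the `temperature` parameter, and the function name are assumptions made for illustration. Sampling without replacement uses the Gumbel-top-k trick, the mechanism that stochastic beam search builds on.

```python
import numpy as np

def advantage_biased_sample(log_probs, predicted_scores, k,
                            temperature=1.0, rng=None):
    """Draw k distinct candidates, favoring those the predictor rates highly.

    log_probs:        generator log-probabilities of each candidate
    predicted_scores: predictor outputs for the same candidates
    Advantage = score minus batch mean (an assumed baseline).
    """
    rng = np.random.default_rng() if rng is None else rng
    log_probs = np.asarray(log_probs, dtype=float)
    scores = np.asarray(predicted_scores, dtype=float)
    advantages = scores - scores.mean()
    biased = log_probs + advantages / temperature  # tilt generator toward high scores
    # Gumbel-top-k: perturb with Gumbel noise and keep the k largest,
    # which samples without replacement from softmax(biased).
    perturbed = biased + rng.gumbel(size=biased.shape)
    return np.argsort(-perturbed)[:k]

rng = np.random.default_rng(0)
log_probs = np.log([0.5, 0.3, 0.2])
scores = [0.1, 0.9, 0.5]
picked = advantage_biased_sample(log_probs, scores, k=2, rng=rng)
```

A lower `temperature` makes the predictor dominate the choice, while a very high one recovers plain sampling from the generator.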
