Amortized Molecular Optimization via Group Relative Policy Optimization
Muhammad bin Javaid, Hasham Hussain, Ashima Khanna, Berke Kisin, Jonathan Pirnay, Alexander Mitsos, Dominik G. Grimm, Martin Grohe
TL;DR
This work addresses the scalability gap between instance optimization and amortized molecular design. The authors identify high variance across starting structures as a key obstacle to amortization and counter it with Group Relative Policy Optimization (GRPO). They introduce GRXForm, a Graph Transformer policy that constructs molecules through sequential atom-and-bond additions under structural constraints, with training cost amortized across tasks. GRPO normalizes rewards within groups of trajectories that share a starting structure, which stabilizes learning and enables fast, generalizable optimization without inference-time oracle calls. Across kinase scaffold decoration, prodrug transfer, and PMO benchmarks, GRXForm is competitive with top instance optimizers and substantially more efficient at scale. The approach offers a scalable alternative to iterative search for high-throughput molecular design while preserving multi-objective performance.
Abstract
Molecular design encompasses tasks ranging from de novo design to structural alteration of given molecules or fragments. For the latter, state-of-the-art methods predominantly function as "Instance Optimizers", expending significant compute by restarting the search for every input structure. While model-based approaches theoretically offer amortized efficiency by learning a policy that transfers to unseen structures, existing methods struggle to generalize. We identify a key failure mode: the high variance arising from the heterogeneous difficulty of distinct starting structures. To address this, we introduce GRXForm, which adapts a pre-trained Graph Transformer model that optimizes molecules via sequential atom-and-bond additions. We employ Group Relative Policy Optimization (GRPO) for goal-directed fine-tuning, mitigating variance by normalizing rewards relative to the starting structure. Empirically, GRXForm generalizes to out-of-distribution molecular scaffolds without inference-time oracle calls or refinement, achieving multi-objective optimization scores competitive with leading instance optimizers.
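
The variance-reduction idea can be illustrated with a minimal sketch of group-relative reward normalization. The abstract does not specify the exact formula used in GRXForm, so this assumes the common GRPO form: each trajectory's advantage is its reward minus the group mean, divided by the group standard deviation, where a group is the set of trajectories sampled from one starting structure. The function name, the scaffold labels, the reward values, and the `eps` stabilizer are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards_by_start, eps=1e-8):
    """Normalize rewards within each group of trajectories.

    rewards_by_start maps a starting-structure label to the rewards of the
    trajectories sampled from that structure. This is an assumed, generic
    GRPO-style normalization, not necessarily the paper's exact formula.
    """
    advantages = {}
    for start, rewards in rewards_by_start.items():
        mu = mean(rewards)
        sigma = pstdev(rewards)  # population standard deviation of the group
        # eps avoids division by zero when all rewards in a group are equal.
        advantages[start] = [(r - mu) / (sigma + eps) for r in rewards]
    return advantages


# Hypothetical rewards: one starting structure is easy, the other is hard.
rewards = {
    "easy_scaffold": [0.80, 0.90, 0.85, 0.95],
    "hard_scaffold": [0.05, 0.15, 0.10, 0.20],
}
adv = group_relative_advantages(rewards)
for start, values in adv.items():
    print(start, [round(v, 3) for v in values])
```

In this toy data the raw rewards of the two groups differ by a large constant offset, yet the normalized advantages are nearly identical. The learning signal therefore reflects how a trajectory compares with others from the same starting structure, not how difficult that structure is in absolute terms.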
