MolLIBRA: Genetic Molecular Optimization with Multi-Fingerprint Surrogates and Text-Molecule Aligned Critic
Masahi Okada, Kazuki Sakai, Hiroaki Yoshida, Masaki Okoshi, Tadahiro Taniguchi
TL;DR
MolLIBRA tackles sample-efficient molecular optimization under a limited oracle budget by integrating a multimodal pre-evaluation step into a genetic algorithm. Before any costly oracle call, candidates are ranked by an ensemble of Gaussian process surrogates operating on multiple fingerprints together with a text-molecule aligned CLAMP critic, which supplies a zero-shot scoring signal. The method adaptively gates among critics and draws on both structural fingerprints and language descriptions. On PMO-1K, the MolLIBRA-L variant achieves the best Top-10 AUC on 14/22 tasks and the highest total Top-10 AUC across tasks. The results underscore the value of representation-robust, language-informed priors in low-data regimes for drug design and point to future work on richer critics and broader fingerprints.
Abstract
We study sample-efficient molecular optimization under a limited budget of oracle evaluations. We propose MolLIBRA (MultimOdaLity and Language Integrated Bayesian and evolutionaRy optimizAtion), a genetic-algorithm-based framework that pre-ranks candidate molecules using multiple critics before oracle calls: (i) an ensemble of Gaussian process (GP) surrogates defined over multiple molecular fingerprints and (ii) a pretrained text-molecule aligned encoder, CLAMP. The GP ensemble enables adaptive selection of task-appropriate fingerprints, while CLAMP provides a zero-shot scoring signal from task descriptions by measuring the similarity between molecular and text embeddings. On the Practical Molecular Optimization (PMO) benchmark with a budget of 1,000 evaluations (PMO-1K), MolLIBRA-L, our variant with a language-model-based candidate generator, attains the best Top-10 AUC on 14/22 tasks and the highest overall sum of Top-10 AUC across tasks compared with prior methods.
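To make the zero-shot pre-ranking idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the similarity between text and molecule embeddings is cosine similarity and that candidates are shortlisted by taking the top k scores; the abstract specifies neither. The function names (`zero_shot_scores`, `pre_rank`) and the three-dimensional embeddings are hypothetical placeholders for the output of a pretrained encoder such as CLAMP.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def zero_shot_scores(text_emb, mol_embs):
    """Score each molecule by its similarity to the task-description embedding.

    Assumption: cosine similarity stands in for the critic's similarity measure.
    """
    return {smiles: cosine(text_emb, emb) for smiles, emb in mol_embs.items()}


def pre_rank(scores, k):
    """Return the k highest-scoring candidates, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy, hand-made embeddings (hypothetical); a real text-molecule encoder
# would produce these from the task description and the candidate molecules.
task_embedding = [1.0, 0.0, 1.0]
candidate_embeddings = {
    "CCO": [0.9, 0.1, 0.8],
    "c1ccccc1": [0.0, 1.0, 0.1],
    "CC(=O)O": [0.5, 0.5, 0.5],
}

scores = zero_shot_scores(task_embedding, candidate_embeddings)

# Only the shortlisted candidates would be passed on to the costly oracle.
shortlist = pre_rank(scores, k=2)
print(shortlist)
```

In this sketch, only the molecules in `shortlist` would consume oracle budget. How MolLIBRA combines this signal with the GP surrogate ensemble is not described in the abstract, so it is left out here.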
