MolLIBRA: Genetic Molecular Optimization with Multi-Fingerprint Surrogates and Text-Molecule Aligned Critic

Masahi Okada; Kazuki Sakai; Hiroaki Yoshida; Masaki Okoshi; Tadahiro Taniguchi

TL;DR

MolLIBRA tackles sample-efficient molecular optimization under a limited oracle budget by adding a multimodal pre-evaluation step to a genetic algorithm. Before any costly oracle call, candidates are ranked by one of several critics: an ensemble of Gaussian process surrogates, each operating on a different molecular fingerprint, and a text-molecule aligned CLAMP critic that supplies a zero-shot score from the task description. The method adaptively selects among these critics, so it draws on both structural fingerprints and language descriptions. The MolLIBRA-L variant achieves the best Top-10 AUC on 14 of 22 PMO-1K tasks and the highest summed Top-10 AUC across tasks. The results underscore the value of representation-robust, language-informed priors in low-data regimes for drug design and point to future work on richer critics and broader fingerprints.

Abstract

We study sample-efficient molecular optimization under a limited budget of oracle evaluations. We propose MolLIBRA (MultimOdaLity and Language Integrated Bayesian and evolutionaRy optimizAtion), a genetic-algorithm-based framework that pre-ranks candidate molecules using multiple critics before oracle calls: (i) an ensemble of Gaussian process (GP) surrogates defined over multiple molecular fingerprints and (ii) CLAMP, a pretrained text-molecule aligned encoder. The GP ensemble enables adaptive selection of task-appropriate fingerprints, while CLAMP provides a zero-shot scoring signal from task descriptions by measuring the similarity between molecular and text embeddings. On the Practical Molecular Optimization (PMO) benchmark with a budget of 1,000 evaluations (PMO-1K), MolLIBRA-L, our variant with a language-model-based candidate generator, attains the best Top-10 AUC on 14 of 22 tasks and a higher overall sum of Top-10 AUC across tasks than prior methods.
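
The zero-shot scoring idea in the abstract can be illustrated with a small sketch. It assumes cosine similarity as the similarity measure and uses hand-written toy vectors in place of real CLAMP embeddings; the function names `cosine_similarity` and `rank_by_text_similarity` and the molecule labels are hypothetical and are not taken from the paper or from the CLAMP codebase.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_text_similarity(text_embedding, molecule_embeddings):
    """Return molecule names sorted from most to least similar to the task text.

    Assumption: embeddings are plain lists of floats that a text-molecule
    aligned encoder would produce; here they are toy values.
    """
    scores = {
        name: cosine_similarity(text_embedding, emb)
        for name, emb in molecule_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy stand-ins for the embedding of a task description and three candidates.
text_vec = [1.0, 0.0, 1.0]
candidates = {
    "mol_a": [1.0, 0.0, 1.0],
    "mol_b": [0.0, 1.0, 0.0],
    "mol_c": [1.0, 1.0, 0.0],
}
ranking = rank_by_text_similarity(text_vec, candidates)
print(ranking)
```

No oracle call is needed to produce this ranking, which is what makes such a critic usable before any evaluation budget is spent.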

Paper Structure (37 sections, 5 equations, 5 figures, 9 tables, 5 algorithms)

Introduction
Preliminaries
Black-box Optimization of Molecular Structures
Molecular Optimization via Genetic Algorithm
Gaussian Process Models
Fingerprints and Their Similarity Measure
Ensemble of GP Models
CLAMP
Related Work
Method
Definition of Multiple Critics
Selection of Multiple Critics' Results
Implementation Details
Fingerprints, kernels and GP models
Candidate generation
...and 22 more sections

Figures (5)

Figure 1: A conceptual illustration of MolLIBRA, a GA-based molecular optimization framework with multi-fingerprint surrogates and a text-molecule aligned critic. MolLIBRA integrates two modalities for pre-evaluation: molecular fingerprints and natural-language task descriptions. The critics consist of learnable Gaussian process (GP) models defined over multiple fingerprints and a zero-shot critic based on a pretrained and frozen CLAMP model [Ramsauer2023CLAMP]. A critic is probabilistically selected for candidate ranking, and the selection probabilities are updated using the newly observed oracle scores.
Figure 2: Heatmap visualizing the contribution of critics (structured-space GPs and the CLAMP critic) in MolLIBRA-$\mathcal{L}$'s optimization process. The color intensity indicates the accumulated step-wise improvement in oracle scores realized by each critic. The contributions are normalized so that the total contribution of all critics sums to 100%. Similar results for MolLIBRA-$\mathcal{G}$ are provided in Appendix Figure 4.
Figure 3: Temporal evolution of contributions in four tasks (results from a single seed run). The cumulative score improvement realized by each critic is shown as an area chart.
Figure 4: Heatmap of critic contributions during the optimization process of MolLIBRA-$\mathcal{G}$. Compared to MolLIBRA-$\mathcal{L}$ shown in Figure 2, the dominant critics for each task are generally consistent.
Figure 5: Heatmaps showing the critic contributions for different seeds. While amlodipine_mpo shows consistent critic contributions across seeds, fexofenadine_mpo exhibits large variation depending on the seed.
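
The captions of Figures 1 and 2 describe two mechanisms: a critic is drawn probabilistically and its selection probability is updated from new oracle scores, and per-critic contributions are normalized to sum to 100%. The sketch below illustrates both under stated assumptions. The additive weight update, the critic names, and the fixed toy improvements are all hypothetical; the paper's actual update rule is not given in this section.

```python
import random


def select_critic(weights, rng):
    """Sample one critic name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def update_weight(weights, critic, improvement, step=1.0):
    """Hypothetical rule: reward a critic by its non-negative score improvement."""
    weights[critic] += step * max(improvement, 0.0)


def normalize_contributions(contributions):
    """Rescale accumulated improvements so that they sum to 100 (percent)."""
    total = sum(contributions.values())
    if total == 0:
        return {name: 0.0 for name in contributions}
    return {name: 100.0 * value / total for name, value in contributions.items()}


rng = random.Random(0)
# Hypothetical critics: two fingerprint GPs and the CLAMP critic.
weights = {"gp_fingerprint_a": 1.0, "gp_fingerprint_b": 1.0, "clamp": 1.0}
contributions = {name: 0.0 for name in weights}
# Toy stand-in for the oracle: a fixed improvement per selected critic.
toy_improvement = {"gp_fingerprint_a": 0.02, "gp_fingerprint_b": 0.0, "clamp": 0.01}

for _ in range(50):
    critic = select_critic(weights, rng)
    gain = toy_improvement[critic]
    update_weight(weights, critic, gain)
    contributions[critic] += gain

shares = normalize_contributions(contributions)
print(shares)
```

In this toy setting a critic that never improves the score keeps its initial weight, so it is still sampled occasionally but accumulates no contribution share.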
