MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization

Tianfan Fu, Cao Xiao, Xinhao Li, Lucas M. Glass, Jimeng Sun

TL;DR

MIMOSA reframes molecule optimization as sampling from a target distribution that jointly encodes similarity to an input molecule and multiple drug-property constraints. It pretrains two GNNs to guide substructure edits (add, replace, delete) and uses an MCMC-based Gibbs sampler to select promising candidates, ensuring unbiased, ergodic sampling. The method achieves substantial performance gains over baselines in multi-property and single-property settings while maintaining molecular validity and scaffold similarity; it also demonstrates reasonable computational efficiency (~10–20 minutes per molecule). The approach provides a flexible, theoretically grounded framework for multi-constraint molecule optimization, with an open-source implementation available.

Abstract

Molecule optimization is a fundamental task for accelerating drug discovery, with the goal of generating new valid molecules that maximize multiple drug properties while maintaining similarity to the input molecule. Existing generative models and reinforcement learning approaches have achieved initial success, but still struggle to optimize multiple drug properties simultaneously. To address this challenge, we propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution. MIMOSA first pretrains two property-agnostic graph neural networks (GNNs) for molecule topology and substructure-type prediction, where a substructure can be either an atom or a single ring. In each iteration, MIMOSA uses the GNNs' predictions and employs three basic substructure operations (add, replace, delete) to generate new molecules and associated weights. The weights can encode multiple constraints, including similarity and drug-property constraints, based on which we select promising molecules for the next iteration. MIMOSA enables flexible encoding of multiple property and similarity constraints, efficiently generates new molecules that satisfy various property constraints, and achieves up to 49.6% relative improvement over the best baseline in terms of success rate. The code repository (including a readme file, data preprocessing, model construction, and evaluation) is available at https://github.com/futianfan/MIMOSA.
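
The abstract describes weights that jointly encode similarity and property constraints. The exact functional form of the target distribution $p_X(Y)$ is given in the paper's target-distribution equation and is not reproduced on this page, so the sketch below simply assumes a log-linear (Boltzmann-style) combination, $p_X(Y) \propto \exp\big(\eta\,\mathrm{sim}(X,Y) + \sum_k w_k f_k(Y)\big)$. The function names, the weights $\eta$ and $w_k$, and the toy scorers are hypothetical placeholders, not MIMOSA's code or real QED/PLogP predictors.

```python
import math
from typing import Callable, Dict, List, Sequence

# Molecules are represented by SMILES strings here; scorers are hypothetical.
SimilarityFn = Callable[[str, str], float]
PropertyFn = Callable[[str], float]


def unnormalized_target(
    candidate: str,
    source: str,
    similarity: SimilarityFn,
    properties: Dict[str, PropertyFn],
    property_weights: Dict[str, float],
    sim_weight: float = 1.0,
) -> float:
    """Assumed log-linear form: exp(sim_weight * sim(X, Y) + sum_k w_k * f_k(Y))."""
    log_score = sim_weight * similarity(source, candidate)
    for name, scorer in properties.items():
        log_score += property_weights[name] * scorer(candidate)
    return math.exp(log_score)


def normalize(scores: Sequence[float]) -> List[float]:
    """Turn unnormalized scores over a finite candidate set into probabilities."""
    total = sum(scores)
    return [s / total for s in scores]


# Toy usage with made-up scorers (NOT real similarity or drug-property models).
def toy_sim(x: str, y: str) -> float:
    return 1.0 if x == y else 0.5


toy_props = {"toy_len": lambda y: len(y) / 10}
cands = ["CCO", "CCCO", "CCN"]
probs = normalize(
    [unnormalized_target(c, "CCO", toy_sim, toy_props, {"toy_len": 1.0}) for c in cands]
)
```

Because only ratios of these scores matter for MCMC acceptance, the normalizing constant never has to be computed over the full chemical space; `normalize` is only meaningful over a small, explicit candidate set like the one above.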

Paper Structure

This paper contains 11 sections, 3 theorems, 24 equations, 2 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

Suppose $\{Y_1,Y_2,\cdots, Y_n\}$ is the chain of molecules sampled via MCMC using the transition kernel defined in the proposal equation (eqn:proposal), with initial state $X$. Then the Markov chain is ergodic with stationary distribution $p_X(Y)$ given by the target-distribution equation (eqn:target_distribution). That is, the empirical estimate (time averag…
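
Theorem 1 says that time averages along the sampled chain converge to expectations under $p_X(Y)$. The toy below illustrates that property on a tiny discrete state space. It is a generic Metropolis-Hastings sampler with a symmetric ring proposal, swapped in for illustration; it is not MIMOSA's GNN-guided Gibbs sampler, and the state space and weights are made up.

```python
import random
from collections import Counter
from typing import List


def metropolis_hastings_ring(
    weights: List[float], n_steps: int, start: int = 0, seed: int = 0
) -> List[float]:
    """Run Metropolis-Hastings on states 0..K-1 arranged in a ring.

    Proposal: step to the left or right neighbour with probability 1/2 each.
    Because the proposal is symmetric, the acceptance probability reduces to
    min(1, w[y] / w[x]). Returns the empirical visit frequency of each state,
    i.e. the time average of the state indicators.
    """
    rng = random.Random(seed)
    k = len(weights)
    state = start
    counts: Counter = Counter()
    for _ in range(n_steps):
        proposal = (state + rng.choice((-1, 1))) % k
        accept_prob = min(1.0, weights[proposal] / weights[state])
        if rng.random() < accept_prob:
            state = proposal
        counts[state] += 1  # a rejected move counts the current state again
    return [counts[i] / n_steps for i in range(k)]


w = [1.0, 2.0, 3.0, 2.0, 1.0]           # unnormalized target weights
freqs = metropolis_hastings_ring(w, 200_000)
target = [x / sum(w) for x in w]        # stationary distribution: w / sum(w)
```

The ring is irreducible and aperiodic, so the visit frequencies approach `target` as `n_steps` grows, which is the discrete analogue of the ergodicity statement above.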

Figures (2)

  • Figure 1: The Multi-constraint Molecule Sampling for Molecule Optimization (MIMOSA) framework, illustrated using a single molecule. In Step I (Pretrain GNN), MIMOSA pretrains two property-agnostic GNNs for molecule topology and substructure-type prediction. Then, in Step II (Candidate Generation), MIMOSA uses the predictions and employs three basic substructure operations (ADD, REPLACE and DELETE) to generate new molecule candidates. In Step III (Candidate Selection), MIMOSA assigns weights to the new molecules. The weights can encode multiple constraints, including similarity and drug-property constraints, based on which we accept promising molecules for the next iteration. MIMOSA iteratively edits the molecule and can efficiently draw molecule samples.
  • Figure 2: Exp 3. Examples of "QED & PLogP" optimization. (Upper) The imidazole ring in the input molecule (a) is replaced by the less polar rings thiazole (b and c) and thiadiazole (d). Since higher polarity corresponds to lower PLogP, the output molecules increase PLogP while maintaining the molecular scaffold. (Lower) The PLogP of the input molecule (e) is increased by neutralizing the ionized amine (g) or by replacing it with less electronegative substructures (f and h). These changes improve QED.
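
The Step II / Step III loop in Figure 1 can be sketched on a toy representation. Here a "molecule" is just a list of substructure tokens, candidates come from exhaustively enumerating ADD, REPLACE and DELETE edits over a small hypothetical vocabulary (MIMOSA instead uses the pretrained GNNs' predictions), and selection samples one candidate in proportion to a hypothetical weight function. All names are illustrative assumptions; no chemical validity is checked.

```python
import math
import random
from typing import Callable, List, Sequence

Molecule = List[str]  # toy stand-in: a molecule as a list of substructure tokens


def generate_candidates(mol: Molecule, vocab: Sequence[str]) -> List[Molecule]:
    """Step II (toy): enumerate ADD, REPLACE and DELETE edits."""
    candidates: List[Molecule] = []
    for token in vocab:  # ADD: append a new substructure
        candidates.append(mol + [token])
    for i, old in enumerate(mol):
        for token in vocab:  # REPLACE substructure i with a different token
            if token != old:
                candidates.append(mol[:i] + [token] + mol[i + 1:])
        if len(mol) > 1:  # DELETE substructure i, never emptying the molecule
            candidates.append(mol[:i] + mol[i + 1:])
    return candidates


def select(
    candidates: List[Molecule],
    weight_fn: Callable[[Molecule], float],
    rng: random.Random,
) -> Molecule:
    """Step III (toy): pick one candidate with probability proportional to its weight."""
    weights = [weight_fn(c) for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def optimize(
    start: Molecule,
    vocab: Sequence[str],
    weight_fn: Callable[[Molecule], float],
    n_iters: int = 10,
    seed: int = 0,
) -> Molecule:
    """Alternate candidate generation and weighted selection for n_iters rounds."""
    rng = random.Random(seed)
    mol = list(start)
    for _ in range(n_iters):
        mol = select(generate_candidates(mol, vocab), weight_fn, rng)
    return mol


def toy_weight(mol: Molecule) -> float:
    # Hypothetical weight: favour "N" tokens, mildly penalise size.
    return math.exp(mol.count("N") - 0.5 * len(mol))


result = optimize(["C", "C", "O"], ["C", "N", "O"], toy_weight, n_iters=20)
```

In a real pipeline, `toy_weight` would be replaced by the constraint-encoding weights described in the caption, and the candidate set would be pruned by the GNN predictions rather than enumerated exhaustively.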

Theorems & Definitions (6)

  • Definition 1: Tanimoto Similarity of Molecules
  • Theorem 1
  • Lemma 1
  • Lemma 2
  • Proof
  • Proof
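
Definition 1 is only named on this page. The standard Tanimoto similarity between two binary fingerprints $A$ and $B$ is $|A \cap B| / |A \cup B|$, and the sketch below computes it on fingerprints given as sets of on-bit indices. Which fingerprint type MIMOSA uses is not stated in this section; in practice fingerprints (for example Morgan/ECFP bit vectors) are usually computed with a toolkit such as RDKit, which also provides `DataStructs.TanimotoSimilarity`. The empty-fingerprint convention below is an assumption.

```python
from typing import AbstractSet


def tanimoto(fp_a: AbstractSet[int], fp_b: AbstractSet[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # assumed convention: two empty fingerprints score 0
    return len(fp_a & fp_b) / len(union)


# Two fingerprints sharing 2 of 4 distinct on-bits have similarity 2 / 4.
example = tanimoto({1, 2, 3}, {2, 3, 4})
```

A similarity constraint such as $\mathrm{sim}(X, Y) \ge \delta$ can then be checked directly on the returned value.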