MFBind: a Multi-Fidelity Approach for Evaluating Drug Compounds in Practical Generative Modeling

Peter Eckmann; Dongxia Wu; Germano Heinzelmann; Michael K Gilson; Rose Yu

TL;DR

MFBind is a practical multi-fidelity framework for evaluating drug compounds that fuses AutoDock4 docking, experimental activity data, and absolute binding free energy (ABFE) calculations from molecular dynamics. A deep surrogate with a shared encoder and fidelity-specific linear heads is pretrained on the cheaper fidelities and refined with active learning to predict ABFE efficiently. Under budget constraints, the surrogate outperforms multiple baselines, and when used as a reward in a generative model, it yields compounds with substantially stronger predicted binding affinities than single-fidelity methods. The work shows that combining lower-cost signals with expensive ABFE data can improve both predictive accuracy and the quality of generated candidates, suggesting a viable path toward more practical generative drug discovery. Limitations include the restricted set of fidelities and the synthesizability of generated compounds; future work aims to add fidelities and improve the acquisition strategy.

Abstract

Current generative models for drug discovery primarily use molecular docking to evaluate the quality of generated compounds. However, such models are often not useful in practice because even compounds with high docking scores do not consistently show experimental activity. More accurate methods for activity prediction exist, such as molecular dynamics based binding free energy calculations, but they are too computationally expensive to use in a generative model. We propose a multi-fidelity approach, Multi-Fidelity Bind (MFBind), to achieve the optimal trade-off between accuracy and computational cost. MFBind integrates docking and binding free energy simulators to train a multi-fidelity deep surrogate model with active learning. Our deep surrogate model utilizes a pretraining technique and linear prediction heads to efficiently fit small amounts of high-fidelity data. We perform extensive experiments and show that MFBind (1) outperforms other state-of-the-art single and multi-fidelity baselines in surrogate modeling, and (2) boosts the performance of generative models with markedly higher quality compounds.
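As a rough illustration of the surrogate described above, the following minimal sketch shows a shared encoder feeding one linear prediction head per fidelity. The fidelity labels, layer sizes, random weights, toy fingerprint, and the single ReLU layer standing in for the deep encoder are all assumptions made for illustration. This is not the authors' implementation, and no pretraining or fitting is performed.

```python
import math
import random

# Illustrative fidelity labels (assumed names, not taken from the paper's code).
FIDELITIES = ["docking", "experimental", "abfe"]


class MultiFidelitySurrogate:
    """Shared nonlinear encoder followed by one linear head per fidelity."""

    def __init__(self, n_bits, latent_dim, fidelities, seed=0):
        rng = random.Random(seed)
        enc_scale = 1.0 / math.sqrt(n_bits)
        head_scale = 1.0 / math.sqrt(latent_dim)
        # Encoder weights (latent_dim x n_bits), shared by every fidelity.
        self.enc_w = [
            [rng.gauss(0.0, enc_scale) for _ in range(n_bits)]
            for _ in range(latent_dim)
        ]
        self.enc_b = [0.0] * latent_dim
        # One (weights, bias) pair per fidelity: a purely linear head.
        self.heads = {
            name: ([rng.gauss(0.0, head_scale) for _ in range(latent_dim)], 0.0)
            for name in fidelities
        }

    def encode(self, fingerprint):
        # A single ReLU layer stands in for the deep encoder (assumption).
        return [
            max(0.0, sum(w * x for w, x in zip(row, fingerprint)) + b)
            for row, b in zip(self.enc_w, self.enc_b)
        ]

    def predict(self, fingerprint):
        # Every head reads the same latent vector and emits one scalar.
        z = self.encode(fingerprint)
        return {
            name: sum(w * zi for w, zi in zip(weights, z)) + bias
            for name, (weights, bias) in self.heads.items()
        }


# Toy 16-bit vector standing in for a Morgan fingerprint.
toy_fingerprint = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1]
model = MultiFidelitySurrogate(n_bits=16, latent_dim=4, fidelities=FIDELITIES)
predictions = model.predict(toy_fingerprint)
print(predictions)
```

Because the heads are linear and share one latent representation, adapting to a small amount of high-fidelity data only requires fitting a few head parameters on top of an encoder that has already seen the cheaper fidelities.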

Paper Structure (42 sections, 1 equation, 8 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Molecular generative models
Multi-fidelity modeling
MFBind
Multi-fidelity binding affinity environment
Multi-fidelity deep surrogate model
Active learning to train multi-fidelity surrogate
Experimental results
Multi-fidelity surrogate modeling
Setup
Baselines
Results
Surrogate model performance.
Ablation study.
...and 27 more sections

Figures (8)

Figure 1: Overview of MFBind. We train a multi-fidelity surrogate model to predict the outputs of all fidelity simulators. We then use the model to evaluate the acquisition function and pick the next molecule and fidelity level at which to query the simulators. The result is added to the training dataset, and the process is repeated. A generative model uses the trained multi-fidelity surrogate model to evaluate its candidate compounds.
Figure 2: ROC curve of each simulator on the BRD4(2) test set. Two curves are shown for AutoDock4, one that uses the total binding energy output only, and one that uses a linear surrogate that takes all 16 outputs from AutoDock4 and outputs a prediction of the ABFE score.
Figure 3: Regression of ABFE scores in an active learning setting. The y-axis shows the mean squared error (MSE), in kcal/mol, of each method on the held-out test set. The x-axis shows the cumulative active learning query cost in days (wall-clock time on a 9-core, 8-GPU server). Each line is an average over 20 runs with different random seeds (ABFE results are cached to reduce running times), and the shaded region indicates the standard deviation across runs.
Figure 4: Selected generated compounds from LIMO + MFBind. The top compounds for BRD4(2) and c-MET are shown. See the appendix section on LIMO compounds for more compounds.
Figure 5: Diagram of the MFBind surrogate model. An input molecule, represented as a Morgan fingerprint, is fed through the deep encoder to produce a latent representation. That representation is then passed to linear fidelity-specific prediction heads to produce a prediction for each fidelity level.
...and 3 more figures
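
The query loop described in the Figure 1 caption can be sketched roughly as follows. This is a toy under stated assumptions, not the paper's procedure: the two simulators, their costs, the 1-D "molecule" descriptors, and the budget are all hypothetical, and the surrogate-based acquisition function is replaced by a simple distance heuristic divided by query cost.

```python
import math


def cheap_sim(x):
    # Hypothetical fast, biased simulator (stands in for a docking-like method).
    return math.sin(x) + 0.3 * math.cos(5 * x)


def expensive_sim(x):
    # Hypothetical slow, accurate simulator (stands in for an ABFE-like method).
    return math.sin(x)


# Fidelity name -> (simulator, assumed cost per query in arbitrary units).
SIMULATORS = {"low": (cheap_sim, 1.0), "high": (expensive_sim, 20.0)}


def acquisition(x, fidelity, dataset):
    """Placeholder score: distance to the nearest labeled point at this
    fidelity, per unit cost. A real system would use the surrogate here."""
    labeled = [xi for (xi, f, _) in dataset if f == fidelity]
    novelty = min((abs(x - xi) for xi in labeled), default=10.0)
    return novelty / SIMULATORS[fidelity][1]


def active_learning(pool, budget):
    dataset, spent = [], 0.0
    while True:
        queried = {(xi, f) for (xi, f, _) in dataset}
        # Unqueried (molecule, fidelity) pairs that still fit in the budget.
        candidates = [
            (x, f)
            for x in pool
            for f in SIMULATORS
            if (x, f) not in queried and spent + SIMULATORS[f][1] <= budget
        ]
        if not candidates:
            break
        # Pick the pair with the highest acquisition score.
        x, f = max(candidates, key=lambda c: acquisition(c[0], c[1], dataset))
        simulator, cost = SIMULATORS[f]
        dataset.append((x, f, simulator(x)))  # query and store the result
        spent += cost
        # A full loop would refit the surrogate on `dataset` at this point.
    return dataset, spent


pool = [i * 0.5 for i in range(10)]
dataset, spent = active_learning(pool, budget=50.0)
print(len(dataset), "queries, total cost", spent)
```

Dividing the score by cost is one simple way to let cheap queries dominate early while still admitting expensive ones once the cheap fidelity is well covered; the acquisition strategy actually used by MFBind may differ.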
