MFBind: a Multi-Fidelity Approach for Evaluating Drug Compounds in Practical Generative Modeling
Peter Eckmann, Dongxia Wu, Germano Heinzelmann, Michael K Gilson, Rose Yu
TL;DR
MFBind is a practical multi-fidelity framework for evaluating drug compounds that combines three signals: AutoDock4 docking scores, experimental activity data, and absolute binding free energy (ABFE) calculations from molecular dynamics. A deep surrogate with a shared encoder and fidelity-specific linear heads is pretrained on the cheaper fidelities and then refined with active learning to predict ABFE efficiently. Under budget constraints, the surrogate outperforms multiple baselines, and when used as the reward in a generative model it yields compounds with substantially stronger predicted binding affinities than single-fidelity methods. These results show that combining lower-cost signals with expensive ABFE data can improve both predictive accuracy and the quality of generated candidates, suggesting a viable path toward more practical generative drug discovery. Limitations include the restricted set of fidelities and synthesis considerations for generated compounds; future work aims to add fidelities and improve the acquisition strategy.
Abstract
Current generative models for drug discovery primarily use molecular docking to evaluate the quality of generated compounds. However, such models are often not useful in practice because even compounds with high docking scores do not consistently show experimental activity. More accurate methods for activity prediction exist, such as molecular-dynamics-based binding free energy calculations, but they are too computationally expensive to use in a generative model. We propose a multi-fidelity approach, Multi-Fidelity Bind (MFBind), to achieve the optimal trade-off between accuracy and computational cost. MFBind integrates docking and binding free energy simulators to train a multi-fidelity deep surrogate model with active learning. Our deep surrogate model uses a pretraining technique and linear prediction heads to efficiently fit small amounts of high-fidelity data. We perform extensive experiments and show that MFBind (1) outperforms other state-of-the-art single- and multi-fidelity baselines in surrogate modeling, and (2) boosts the performance of generative models, producing markedly higher-quality compounds.
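To make the surrogate design concrete, the following is a minimal sketch of a shared encoder with one linear head per fidelity, pretrained on plentiful low-fidelity labels and then adapted to a handful of high-fidelity labels by fitting only the linear head. It is an illustration under stated assumptions, not the authors' implementation: the class `ToySurrogate`, the single tanh layer standing in for a molecular encoder, the 3-dimensional inputs, the learning rates, and the synthetic labels are all hypothetical, and the active learning loop is omitted.

```python
import math
import random

FIDELITIES = ("docking", "experimental", "abfe")  # fidelity names taken from the text


class ToySurrogate:
    """Hypothetical toy model: one shared tanh layer plus a linear head per fidelity."""

    def __init__(self, in_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        # Shared encoder weights (stand-in for a real molecular encoder).
        self.W = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
        # Each head stores hidden_dim weights followed by one bias term.
        self.heads = {f: [0.0] * (hidden_dim + 1) for f in FIDELITIES}

    def encode(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.W]

    def predict(self, x, fidelity):
        head, h = self.heads[fidelity], self.encode(x)
        return sum(w * hi for w, hi in zip(head, h)) + head[-1]

    def fit(self, data, fidelity, lr, epochs, train_encoder):
        """Per-sample SGD on squared error; the encoder moves only if train_encoder is True."""
        head = self.heads[fidelity]
        for _ in range(epochs):
            for x, y in data:
                h = self.encode(x)
                err = sum(w * hi for w, hi in zip(head, h)) + head[-1] - y
                if train_encoder:
                    for j, row in enumerate(self.W):
                        g = err * head[j] * (1.0 - h[j] ** 2)  # backprop through tanh
                        for i in range(len(row)):
                            row[i] -= lr * g * x[i]
                for j, hj in enumerate(h):
                    head[j] -= lr * err * hj
                head[-1] -= lr * err


def mse(model, data, fidelity):
    return sum((model.predict(x, fidelity) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(1)


def sample(n, label_fn):
    # Synthetic toy inputs and labels, NOT real docking or ABFE values.
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
    return [(x, label_fn(x)) for x in xs]


docking_data = sample(40, lambda x: 0.8 * x[0] - 0.5 * x[1])    # cheap, plentiful
abfe_data = sample(5, lambda x: 1.0 * x[0] - 0.4 * x[1] + 0.3)  # expensive, scarce

model = ToySurrogate(in_dim=3, hidden_dim=8)

# Step 1: pretrain the shared encoder and the docking head on low-fidelity data.
model.fit(docking_data, "docking", lr=0.02, epochs=100, train_encoder=True)

# Step 2: freeze the encoder and fit only the ABFE linear head on a few points.
encoder_snapshot = [row[:] for row in model.W]
mse_before = mse(model, abfe_data, "abfe")
model.fit(abfe_data, "abfe", lr=0.05, epochs=200, train_encoder=False)
mse_after = mse(model, abfe_data, "abfe")
print("ABFE training MSE before/after head fit:", mse_before, mse_after)
```

Because only a small linear head is fit in the second step, the number of parameters estimated from the scarce high-fidelity data stays small, which is the intuition behind pairing pretraining with linear prediction heads. In this sketch the reported error is measured on the training points only, so it says nothing about generalization.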
