Predicting solvation free energies with an implicit solvent machine learning potential
Sebastien Röcken, Anton F. Burnet, Julija Zavadlav
TL;DR
This paper introduces ReSolv (Solvation Free Energy Path Reweighting), a two-stage framework for training an implicit-solvent ML potential. A vacuum ML potential is first fitted to DFT data and then refined against experimental hydration free energies using differentiable trajectory reweighting (DiffTRe) together with BAR/FEP-based free-energy accumulation. On the FreeSolv dataset, ReSolv predicts hydration free energies with a mean absolute error close to the average experimental uncertainty while running about four orders of magnitude faster than an explicit-solvent ML potential. The approach also generalizes robustly to unseen functional groups, and its error analysis yields correlations that can inform data curation. Overall, ReSolv offers a scalable route to accurate, efficient implicit-solvent ML models that could accelerate drug design and related solvation studies.
Abstract
Machine learning (ML) potentials are a powerful tool in molecular modeling, enabling ab initio accuracy at comparatively small computational cost. Nevertheless, all-atom simulations employing the best-performing graph neural network architectures are still too expensive for applications requiring extensive sampling, such as free energy computations. Implicit solvent models could provide the necessary speed-up due to reduced degrees of freedom and faster dynamics. Here, we introduce a Solvation Free Energy Path Reweighting (ReSolv) framework to parametrize an implicit solvent ML potential for small organic molecules that accurately predicts the hydration free energy, an essential parameter in drug design and pollutant modeling. With a combination of top-down (experimental hydration free energy data) and bottom-up (ab initio data of molecules in a vacuum) learning, ReSolv bypasses the need for intractable ab initio data of molecules in explicit bulk solvent and does not have to resort to less accurate data-generating models. On the FreeSolv dataset, ReSolv achieves a mean absolute error close to the average experimental uncertainty, significantly outperforming standard explicit solvent force fields. Compared to an explicit solvent ML potential, ReSolv offers a computational speedup of four orders of magnitude and attains closer agreement with experiments. The presented framework paves the way toward deep molecular models that are more accurate yet computationally cheaper than classical atomistic models.
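The reweighting idea in the framework's name can be illustrated with the textbook free energy perturbation (Zwanzig) estimator: configurations sampled under a reference potential are reweighted to estimate the free energy difference to a perturbed potential, without running a new simulation. The following is a minimal sketch of that general technique, not the authors' implementation. The function names, the energy units (kcal/mol), and the value of `KT` are assumptions chosen for illustration.

```python
import math

# Thermal energy k_B * T in kcal/mol at roughly 298 K (assumed units).
KT = 0.593


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def reweighting_weights(u_ref, u_new, kT=KT):
    """Normalized weights w_i proportional to exp(-(U_new_i - U_ref_i) / kT).

    u_ref, u_new: per-configuration energies of the same sampled
    configurations under the reference and the perturbed potential.
    """
    log_w = [-(n - r) / kT for r, n in zip(u_ref, u_new)]
    norm = log_sum_exp(log_w)
    return [math.exp(lw - norm) for lw in log_w]


def fep_free_energy(u_ref, u_new, kT=KT):
    """Zwanzig estimate: dF = -kT * ln < exp(-(U_new - U_ref) / kT) >_ref."""
    log_w = [-(n - r) / kT for r, n in zip(u_ref, u_new)]
    return -kT * (log_sum_exp(log_w) - math.log(len(log_w)))


def effective_sample_size(weights):
    """Kish effective sample size; small values signal unreliable reweighting."""
    return 1.0 / sum(w * w for w in weights)


if __name__ == "__main__":
    # Toy energies: the perturbed potential is the reference shifted by 0.5.
    u_ref = [0.0, 1.0, 2.0]
    u_new = [0.5, 1.5, 2.5]
    print(fep_free_energy(u_ref, u_new))
    print(effective_sample_size(reweighting_weights(u_ref, u_new)))
```

For a constant energy shift the estimate recovers the shift and the weights stay uniform, which makes a convenient sanity check. In a differentiable setting such as DiffTRe, the same weights would be written in an autodiff framework so that gradients with respect to the potential's parameters flow through them. The effective sample size is a common diagnostic for deciding when the reference trajectory should be regenerated.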
