Zero-shot protein stability prediction by inverse folding models: a free energy interpretation
Jes Frellsen, Maher M. Kassem, Tone Bengtsen, Lars Olsen, Kresten Lindorff-Larsen, Jesper Ferkinghoff-Borg, Wouter Boomsma
TL;DR
This work interprets zero-shot protein stability predictions from inverse folding models through a thermodynamic lens. It derives a formal link between changes in thermodynamic stability, $\beta\Delta\Delta G$, and inverse-folding posteriors, showing that the common log-odds approach emerges under specific approximations. The authors propose several refinements, including explicit unfolded-state modeling and multi-structure sampling, and demonstrate that these simple modifications can yield measurable gains across several benchmark datasets. They also present scalable strategies, such as using BioEmu, to approximate structural ensembles without expensive simulations. Overall, the paper provides a principled framework for improving zero-shot stability prediction by integrating unfolded-state contributions and ensemble information, with implications for protein design and variant interpretation.
Abstract
Inverse folding models have proven to be highly effective zero-shot predictors of protein stability. Despite this success, the link between the amino acid preferences of an inverse folding model and the free-energy considerations underlying thermodynamic stability remains incompletely understood. A better understanding would not only be of theoretical interest but could also provide the basis for stronger zero-shot stability prediction. In this paper, we take steps to clarify the free-energy foundations of inverse folding models. Our derivation reveals the standard practice of likelihood ratios as a simplistic approximation and suggests several paths towards better estimates of the relative stability. We empirically assess these approaches and demonstrate that considerable gains in zero-shot performance can be achieved with fairly simple means.
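To make the "standard practice of likelihood ratios" concrete, the following is a minimal sketch of a log-odds score for a single substitution. It assumes that an inverse folding model has already produced a probability distribution over the 20 amino acids at the mutated position, conditioned on the wild-type backbone. The function name `log_odds_score`, the dictionary layout, and the example probabilities are hypothetical and are not taken from the paper. The sign and scale relating this score to $\beta\Delta\Delta G$ depend on convention and on the approximations made, so the score should be read only as a relative ranking signal.

```python
import math

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_score(probs, wt, mut):
    """Log-odds of the mutant versus the wild-type residue at one position.

    probs: mapping from one-letter amino acid code to the probability that an
           inverse folding model assigns to it at this position
           (hypothetical input format, assumed for illustration).
    wt:    wild-type residue, one-letter code.
    mut:   mutant residue, one-letter code.

    Returns log p(mut | structure) - log p(wt | structure).
    A positive value means the model prefers the mutant residue.
    """
    p_wt = probs[wt]
    p_mut = probs[mut]
    if p_wt <= 0.0 or p_mut <= 0.0:
        raise ValueError("probabilities must be strictly positive")
    return math.log(p_mut) - math.log(p_wt)


# Made-up distribution for a buried position that favours leucine.
probs = {aa: 0.02 for aa in AMINO_ACIDS}
probs["L"] = 0.50
probs["A"] = 0.14

score = log_odds_score(probs, wt="L", mut="A")
print(f"log-odds score for L->A: {score:.3f}")
```

In this made-up example the score is negative because the model assigns more probability to the wild-type leucine than to alanine. Under the usual zero-shot reading, that would be taken as a prediction that the substitution is destabilizing.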
