Applied causality to infer protein dynamics and kinetics
Akashnathan Aranganathan, Eric R. Beyerle
TL;DR
This work addresses the lack of timescale information in conformational ensembles produced by generative molecular models. It couples AlphaFold2-derived ensembles to a causal, memoryless overdamped Langevin framework (LE4PD) by parameterizing a potential of mean force from the ensemble covariance, enabling residue-level kinetics via autocorrelation functions and the integrated timescale $\tau_{\mathrm{avg}}$. Across six HIV-1 protease variants, the study shows an inverse relationship between AlphaFold2 MSA depth and ensemble timescales: shallower MSAs probe longer timescales and deeper MSAs probe shorter ones, generally at or below those seen in microsecond-long MD simulations. Reweighting out-of-set structures further refines the predictions. The approach generalizes to other generative ML models and to AFmultimer for dimer dynamics, offering a route toward causally meaningful, fast-generation predictions of protein kinetics, with potential integration into ML training and reweighting schemes for improved physical fidelity.
Abstract
The use of generative machine learning models, trained on the experimentally resolved structures deposited in the Protein Data Bank, is an attractive approach to sampling conformational ensembles of proteins. However, the ensembles generated by these models lack timescale or causal information. We use the structural ensembles generated from AlphaFold2 at a range of multiple sequence alignment (MSA) depths to parameterize the potential of mean force of an overdamped, memory-free, coarse-grained Langevin equation. This approach couples the AlphaFold2 ensembles to a causal model, allowing us to estimate the timescales spanned by the ensembles generated at each MSA depth. Performing this analysis on six variants of HIV-1 protease, we confirm an inverse relationship between MSA depth and the timescale of an ensemble's conformational fluctuations. The MSA depth essentially serves as a conformational restraint, and AlphaFold2 is generally able to probe timescales at or below those seen in microsecond-long, unbiased molecular dynamics simulations. We conclude by generalizing this approach to other generative structural ensemble-prediction methods as well as to co-folding models, here applied to the biologically functional HIV-1 protease dimer.
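To make the covariance-to-timescale step concrete, the following is a minimal sketch, not the authors' LE4PD implementation. It assumes a harmonic potential of mean force whose force-constant matrix is $k_BT\,C^{-1}$ for ensemble covariance $C$, a single uniform friction coefficient, and no hydrodynamic coupling; the full LE4PD treatment includes details omitted here. Under these assumptions each covariance eigenmode with variance $\lambda_a$ relaxes exponentially with $\tau_a = \zeta\lambda_a / k_BT$, and the integrated timescale of a coordinate is the fluctuation-weighted average of the $\tau_a$. The function name `integrated_timescales` and its parameters are hypothetical.

```python
import numpy as np

def integrated_timescales(coords, kT=1.0, friction=1.0):
    """Estimate per-coordinate integrated timescales from a structural ensemble.

    coords   : (n_frames, n_dof) array of aligned coordinates (assumed input).
    kT       : thermal energy, arbitrary units (assumption).
    friction : uniform friction coefficient (assumption; no hydrodynamics).
    """
    fluct = coords - coords.mean(axis=0)
    cov = fluct.T @ fluct / len(coords)           # ensemble covariance C

    # Eigenmodes of C; for a harmonic PMF the force constants are kT / lambda_a.
    evals, evecs = np.linalg.eigh(cov)
    keep = evals > 1e-10 * evals.max()            # drop numerically null modes
    evals, evecs = evals[keep], evecs[:, keep]

    # Overdamped relaxation time of each mode: tau_a = friction * lambda_a / kT.
    tau_modes = friction * evals / kT

    # Normalized autocorrelation of coordinate i is
    #   sum_a w_ia exp(-t / tau_a) / sum_a w_ia,  with w_ia = Q_ia^2 * lambda_a,
    # so its time integral is the w-weighted mean of tau_a.
    # Coordinates with zero variance would give a division by zero here.
    weights = evecs**2 * evals
    tau_avg = (weights * tau_modes).sum(axis=1) / weights.sum(axis=1)
    return tau_modes, tau_avg

# Toy ensemble: two uncoupled coordinates with variances 0.5 and 2.0.
toy = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
tau_modes, tau_avg = integrated_timescales(toy)
print(tau_modes, tau_avg)
```

In this picture a broader ensemble (larger variances) maps directly onto longer timescales, which is one way to read the reported inverse relationship between MSA depth and ensemble timescale: a deeper MSA restrains the ensemble, shrinking its covariance and hence its estimated relaxation times.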
