Protein Structure-Function Relationship: A Kernel-PCA Approach for Reaction Coordinate Identification
Parisa Mollaei, Amir Barati Farimani
TL;DR
Proteins encode function through conformation, but extracting structure–function links from high-dimensional MD trajectories is challenging. The authors introduce a Kernel-PCA pipeline with an angular kernel $K(\lambda_1,\lambda_2,\lambda_3)$ that maps atomic coordinates into a feature space, followed by PCA to a 2D representation and selection by the correlation ratio $C_r$ to identify and rank reaction coordinates (RCs) via their relation to a protein property. The method recovers known activation coordinates in the β2 adrenergic receptor and reveals RCs driving folding in small proteins, with network-like interactions among top RCs; using CB atoms often preserves information while drastically reducing feature size. This framework offers a generalizable, efficient tool for MD-based structure–function analysis and RC interpretation, with potential implications for drug design and protein engineering.
Abstract
In this study, we propose a Kernel-PCA model designed to capture structure-function relationships in a protein. The model also enables ranking of reaction coordinates according to their impact on protein properties. By combining a kernel mapping with principal component analysis (PCA), the model uncovers meaningful patterns in high-dimensional protein data obtained from molecular dynamics (MD) simulations. We demonstrate its effectiveness in accurately identifying reaction coordinates by applying it to a G protein-coupled receptor. Furthermore, the model uses a network-based approach to uncover correlations in the dynamic behavior of residues associated with a specific protein property. These findings underscore the potential of our model as a powerful tool for protein structure-function analysis and visualization.
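
To make the ranking step more concrete, the following is a minimal sketch of how candidate coordinates could be scored against a categorical protein property. It assumes the textbook correlation ratio (the square root of between-group variance over total variance); the exact form of $C_r$ used by the authors is not given in this section and may differ. The sketch also skips the kernel mapping and PCA projection and scores raw one-dimensional coordinates directly. The function names, the state labels, and the toy values are all hypothetical.

```python
from collections import defaultdict


def correlation_ratio(labels, values):
    """Textbook correlation ratio (eta) between a categorical label and a
    numeric variable: sqrt(between-group sum of squares / total sum of squares).
    Assumption: this stands in for the paper's C_r, which may be defined differently.
    """
    groups = defaultdict(list)
    for label, value in zip(labels, values):
        groups[label].append(value)

    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        # A constant coordinate carries no information about the label.
        return 0.0

    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return (ss_between / ss_total) ** 0.5


def rank_coordinates(candidates, labels):
    """Sort candidate coordinates by descending correlation ratio.
    `candidates` maps a (hypothetical) coordinate name to its per-frame values.
    """
    scores = {
        name: correlation_ratio(labels, vals) for name, vals in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (made up): four frames labeled by a hypothetical protein state.
states = ["inactive", "inactive", "active", "active"]
candidates = {
    "coord_A": [0.0, 0.0, 1.0, 1.0],  # separates the two states cleanly
    "coord_B": [0.0, 1.0, 0.0, 1.0],  # same mean in both states
}
ranking = rank_coordinates(candidates, states)
```

A coordinate whose values separate the labeled states scores near 1, while one whose group means coincide scores near 0, so sorting by this score gives a simple ordering of candidates by relevance to the property.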
