Recent advances in interpretable machine learning using structure-based protein representations
Luiz Felipe Vecchietti, Minji Lee, Begench Hangeldiyev, Hyunkyu Jung, Hahnbeom Park, Tae-Kyun Kim, Meeyoung Cha, Ho Min Kim
TL;DR
This paper surveys how structure-based protein representations enable interpretable ML across three core tasks: structure prediction, function prediction, and protein–protein interaction prediction. It emphasizes interpretable confidence signals such as pLDDT and PAE in AlphaFold2-like models, and reviews methods that explain predictions through per-residue or per-edge patterns, including Grad-CAM on graph embeddings and decision-tree paths. The discussion highlights surface- and graph-based representations (MaSIF, dMaSIF) and their interpretability benefits, while cautioning about the limitations of post-hoc explanations and the need for inherently interpretable architectures. In practical terms, the work argues that improved visualization and interpretable metrics will accelerate protein design, drug discovery, and knowledge discovery in structural biology, and can guide future methodological and visualization developments.
Abstract
Recent advancements in machine learning (ML) are transforming the field of structural biology. For example, AlphaFold, a groundbreaking neural network for protein structure prediction, has been widely adopted by researchers. The availability of easy-to-use interfaces and of interpretable outputs from the neural network architecture, such as the confidence scores used to color the predicted structures, has made AlphaFold accessible even to non-ML experts. In this paper, we present various methods for representing protein 3D structures, from low to high resolution, and show how interpretable ML methods can support tasks such as predicting protein structures, protein function, and protein-protein interactions. This survey also emphasizes the significance of interpreting and visualizing ML-based inference for structure-based protein representations, which enhances interpretability and knowledge discovery. Developing such interpretable approaches promises to further accelerate fields including drug development and protein design.
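As an illustration of the confidence scores mentioned above, the following minimal Python sketch reads per-residue pLDDT values from PDB-format text and maps them to the four confidence bands used to color structures in the AlphaFold Protein Structure Database. It assumes the common convention that AlphaFold writes pLDDT into the B-factor column of its PDB output. The function names (`make_atom_line`, `per_residue_plddt`, `plddt_band`), the demo records, and the treatment of values exactly on a band boundary are assumptions made for this example and do not come from the survey.

```python
# Confidence bands used for coloring in the AlphaFold Protein Structure Database.
# Treating the thresholds as inclusive lower bounds is an assumption of this sketch.
BANDS = [(90.0, "very high"), (70.0, "confident"), (50.0, "low")]


def plddt_band(score):
    """Map a pLDDT score (0-100) to a confidence band label."""
    for threshold, label in BANDS:
        if score >= threshold:
            return label
    return "very low"


def per_residue_plddt(pdb_text):
    """Return {(chain, residue_number): pLDDT} from PDB-format ATOM records.

    Assumes pLDDT is stored in the B-factor field (columns 61-66) and is the
    same for every atom of a residue, so the first atom seen is used.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        chain = line[21]               # column 22: chain identifier
        resnum = int(line[22:26])      # columns 23-26: residue sequence number
        bfactor = float(line[60:66])   # columns 61-66: B-factor (pLDDT here)
        scores.setdefault((chain, resnum), bfactor)
    return scores


def make_atom_line(serial, name, resname, chain, resnum, plddt):
    """Hypothetical helper: build a fixed-width PDB ATOM record for the demo."""
    return (
        f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resnum:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


# Made-up demo records, not taken from any real prediction.
demo_pdb = "\n".join([
    make_atom_line(1, "N", "MET", "A", 1, 95.12),
    make_atom_line(2, "CA", "MET", "A", 1, 95.12),
    make_atom_line(3, "N", "GLY", "A", 2, 62.50),
    make_atom_line(4, "N", "SER", "A", 3, 41.00),
])

demo_scores = per_residue_plddt(demo_pdb)
demo_bands = {key: plddt_band(value) for key, value in demo_scores.items()}
```

A viewer could then color each residue by its entry in `demo_bands`, which is the kind of per-residue, visually interpretable output the survey highlights.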
