Fast and Interpretable Protein Substructure Alignment via Optimal Transport
Zhiyu Wang, Bingxin Zhou, Jing Wang, Yang Tan, Weishu Zhao, Pietro Liò, Liang Hong
TL;DR
PLASMA introduces an entropy-regularized optimal transport framework for residue-level protein substructure alignment, delivering interpretable alignment matrices and a normalized similarity score via differentiable Sinkhorn iterations. It supports both trainable (PLASMA) and training-free (PLASMA-PF) variants, with a Label Match Loss to guide localization when annotations exist. Across interpolation and extrapolation tasks on diverse backbone representations, PLASMA achieves superior accuracy and efficiency (≈10 ms per protein pair) relative to global and local baselines, while preserving interpretability of alignments. The method enables robust detection and localization of functional motifs across proteins with varying sequences and folds, offering practical value for functional annotation, evolution studies, and structure-guided design.
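For reference, the standard entropy-regularized optimal transport problem solved by Sinkhorn iterations can be written as follows, for a query with n residues and a target with m residues. This is generic notation, not taken from the paper. The cost matrix C, the marginals a and b, and the weight ε are placeholders. PLASMA's specific choices, such as how C is derived from residue embeddings, how the marginals are set, and how the normalized similarity score is computed, are not stated in this summary.

```latex
\begin{aligned}
P^\star &= \arg\min_{P \in U(a,b)} \; \langle P, C \rangle - \varepsilon H(P), \\
U(a,b) &= \{ P \in \mathbb{R}_{+}^{n \times m} : P\mathbf{1}_m = a,\; P^\top \mathbf{1}_n = b \}, \\
H(P) &= -\sum_{i,j} P_{ij} \left( \log P_{ij} - 1 \right), \\
P^\star &= \operatorname{diag}(u)\, K \, \operatorname{diag}(v), \qquad K = \exp(-C/\varepsilon), \\
u &\leftarrow a \oslash (K v), \qquad v \leftarrow b \oslash (K^\top u).
\end{aligned}
```

Each Sinkhorn update is a matrix-vector product followed by an elementwise division (⊘). The iterations are therefore cheap and differentiable with respect to C, and the entries of P* can be read directly as soft residue-to-residue correspondences.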
Abstract
Proteins are essential biological macromolecules that carry out the functions of living organisms. Local structural motifs, such as active sites, are the most critical components linking structure to function, and they are key to understanding protein evolution and to enabling protein engineering. Existing computational methods struggle to identify and compare these local structures, which leaves a significant gap in our ability to understand protein structures and harness their functions. This study presents PLASMA, the first deep learning framework for efficient and interpretable residue-level protein substructure alignment. We reformulate the problem as a regularized optimal transport task and solve it with differentiable Sinkhorn iterations. Given a pair of input protein structures, PLASMA outputs a clear residue-level alignment matrix together with an interpretable overall similarity score. Through extensive quantitative evaluations and three biological case studies, we demonstrate that PLASMA achieves accurate, lightweight, and interpretable residue-level alignment. We also introduce PLASMA-PF, a training-free variant that offers a practical alternative when training data are unavailable. Our method addresses a critical gap in protein structure analysis tools and opens new opportunities for functional annotation, evolutionary studies, and structure-based drug design. Our official implementation is available at https://github.com/ZW471/PLASMA-Protein-Local-Alignment.git to support reproducibility.
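The abstract casts alignment as a regularized optimal transport problem solved with Sinkhorn iterations. The minimal sketch below shows generic log-domain Sinkhorn on a toy residue-residue cost matrix, under stated assumptions. It is not PLASMA's implementation. The helper names `sinkhorn_plan` and `transport_cost`, the uniform marginals, the squared-difference cost over made-up 1D "residue features", and ε = 0.05 are all hypothetical choices made for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn_plan(cost, eps=0.05, n_iters=200):
    """Hypothetical helper: entropy-regularized OT plan via log-domain Sinkhorn.

    cost[i][j] is the dissimilarity between query residue i and target residue j.
    Uniform marginals are an assumption of this sketch, not taken from PLASMA.
    """
    n, m = len(cost), len(cost[0])
    log_a = [-math.log(n)] * n  # assumed uniform mass on query residues
    log_b = [-math.log(m)] * m  # assumed uniform mass on target residues
    f, g = [0.0] * n, [0.0] * m  # dual potentials (f = eps*log u, g = eps*log v)
    for _ in range(n_iters):
        # Row update: rescale so each row of the plan sums to a_i.
        f = [eps * (log_a[i] - logsumexp([(g[j] - cost[i][j]) / eps for j in range(m)]))
             for i in range(n)]
        # Column update: rescale so each column of the plan sums to b_j.
        g = [eps * (log_b[j] - logsumexp([(f[i] - cost[i][j]) / eps for i in range(n)]))
             for j in range(m)]
    # Recover the plan P_ij = exp((f_i + g_j - C_ij) / eps).
    return [[math.exp((f[i] + g[j] - cost[i][j]) / eps) for j in range(m)]
            for i in range(n)]


def transport_cost(plan, cost):
    """Total cost <P, C>; lower means the two residue sets match more closely."""
    return sum(p * c for prow, crow in zip(plan, cost) for p, c in zip(prow, crow))


# Toy example with made-up 1D residue features (illustrative only).
query = [0.0, 1.0, 2.0]
target = [0.1, 0.9, 2.2]
cost = [[(q - t) ** 2 for t in target] for q in query]
plan = sinkhorn_plan(cost)
# Hard alignment: for each query residue, the target residue receiving the most mass.
alignment = [max(range(len(row)), key=row.__getitem__) for row in plan]
print(alignment)                              # [0, 1, 2]
print(round(transport_cost(plan, cost), 3))   # 0.02
```

The log-domain form avoids the underflow that `exp(-C/eps)` causes when ε is small. Written in an autodiff framework such as PyTorch or JAX, the same unrolled loop is differentiable with respect to the cost matrix, so a learned residue-embedding cost can be trained end to end. The pure-Python version here only illustrates the arithmetic.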
