Mathematical Insights into Protein Architecture: Persistent Homology and Machine Learning Applied to the Flagellar Motor
Zakaria Lamine, Abdelatif Hafid, Mohamed Rahouti
TL;DR
This work develops a framework that uses persistent homology to extract multiscale topological features from protein structures and combines them with biochemical descriptors to classify bacterial flagellar motors as rotated or stalled. The authors formalize the algebraic topology underpinning persistence, implement a filtration-based pipeline on PDB data, and fuse topological and biochemical features in an XGBoost classifier. The approach reaches around 90% classification accuracy and comes with practical tools, including barcode visualization and a GUI for real-time predictions. Overall, the study shows that topology-informed descriptors can reveal functionally relevant patterns that traditional geometric measures miss, with potential for broader protein-function prediction.
Abstract
We present a machine learning approach that leverages persistent homology to classify bacterial flagellar motors into two functional states: rotated and stalled. By embedding protein structural data into a topological framework, we extract multiscale features from filtered simplicial complexes constructed over atomic coordinates. These topological invariants, specifically persistence diagrams and barcodes, capture critical geometric and connectivity patterns that correlate with motor function. The extracted features are vectorized and integrated into a machine learning pipeline that includes dimensionality reduction and supervised classification. Applied to a curated dataset of experimentally characterized flagellar motors from diverse bacterial species, our model demonstrates high classification accuracy and robustness to structural variation. This approach highlights the power of topological data analysis in revealing functionally relevant patterns beyond the reach of traditional geometric descriptors, offering a novel computational tool for protein function prediction.
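To make the feature-extraction step concrete, the following is a minimal sketch of persistent homology in dimension 0 only, written in pure Python. It is not the authors' implementation: the function names `h0_barcode` and `vectorize`, the toy coordinates, and the choice of summary statistics are all assumptions made for illustration. A real pipeline would read atomic coordinates from PDB files, would typically also compute higher-dimensional features (loops and voids) with a dedicated library, and would pass the resulting vectors to the classifier.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Return the H0 (connected-component) bars of a Vietoris-Rips filtration.

    Every point is born at scale 0. A component dies at the edge length
    that first merges it into another component. One component never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges sorted by length: the order in which they enter the filtration.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j      # two components merge
            bars.append((0.0, length))   # one of them dies at this scale
    bars.append((0.0, math.inf))         # the surviving component
    return bars


def vectorize(bars):
    """Hypothetical vectorization: count, total, max, and mean finite lifetime."""
    lifetimes = [death - birth for birth, death in bars if math.isfinite(death)]
    if not lifetimes:
        return [0.0, 0.0, 0.0, 0.0]
    total = sum(lifetimes)
    return [float(len(lifetimes)), total, max(lifetimes), total / len(lifetimes)]


# Toy "atomic coordinates" (assumed data): two well-separated clusters.
toy_points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 0, 0), (11, 0, 0)]
bars = h0_barcode(toy_points)
features = vectorize(bars)
print(bars)
print(features)
```

In this toy example the one long finite bar records the gap between the two clusters, while the short bars record merges within each cluster. That is the kind of multiscale connectivity information a barcode encodes. A fixed-length vector like `features` could then be concatenated with biochemical descriptors before classification.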
