PETIMOT: A Novel Framework for Inferring Protein Motions from Sparse Data Using SE(3)-Equivariant Graph Neural Networks
Valentin Lombard, Sergei Grudinin, Elodie Laine
TL;DR
PETIMOT infers continuous protein motions from sparse experimental data by learning compact linear motion subspaces: the eigenspaces of the coverage-weighted covariance matrix C, with deformations represented by a matrix Y in R^{3N x K}. It introduces an SE(3)-equivariant graph neural network with a dual-track architecture that fuses sequence embeddings from protein language models with motion vectors, guided by the symmetry-aware geometric losses LS, SS, and IS. Trained on about 750k conformational collections from the Protein Data Bank and evaluated on 824 test proteins, PETIMOT outperforms diffusion- and flow-based baselines and traditional normal mode analysis in both accuracy (it captures more of the observed motion) and speed (about 16 seconds of inference for the full test set). The approach yields an interpretable, scalable representation of protein dynamics that generalizes well, with practical potential for engineering and drug design.
Abstract
Proteins move and deform to carry out their biological functions. Despite significant progress in protein structure prediction, approximating conformational ensembles under physiological conditions remains a fundamental open problem. This paper presents a novel perspective on the problem by directly targeting continuous, compact representations of protein motions inferred from sparse experimental observations. We develop a task-specific loss function that enforces data symmetries, including scaling and permutation operations. Our method, PETIMOT (Protein sEquence and sTructure-based Inference of MOTions), leverages transfer learning from pre-trained protein language models through an SE(3)-equivariant graph neural network. When trained and evaluated on the Protein Data Bank, PETIMOT outperforms state-of-the-art flow-matching approaches and traditional physics-based models in both running time and accuracy at capturing protein dynamics, particularly large, slow conformational changes.
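To make the notion of a compact linear representation of motions concrete, here is a minimal NumPy sketch of how K linear motions could be extracted from a small set of superimposed conformations by eigen-decomposing a weighted positional covariance matrix. It is an illustration under stated assumptions, not the PETIMOT pipeline: the function name `linear_motions`, the per-conformation `weights` (a hypothetical stand-in for coverage weighting), and the toy ensemble are all ours, and the conformations are assumed to be already aligned, with N taken as the number of residues and K as the number of motions kept.

```python
import numpy as np


def linear_motions(coords, k, weights=None):
    """Return the top-k linear motions of an aligned conformational ensemble.

    coords  : array of shape (M, N, 3), M superimposed conformations of N residues.
    k       : number of motions (eigenvectors) to keep.
    weights : optional length-M weights (hypothetical stand-in for coverage
              weighting); uniform weights are used when omitted.

    Returns (Y, variances), where Y has shape (3N, k) with orthonormal columns
    and variances holds the matching eigenvalues in descending order.
    """
    coords = np.asarray(coords, dtype=float)
    m, n, _ = coords.shape
    flat = coords.reshape(m, 3 * n)  # each row is x1, y1, z1, x2, ...

    if weights is None:
        w = np.full(m, 1.0 / m)
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()  # normalize so the weights sum to one

    mean = w @ flat                       # weighted mean conformation
    dev = flat - mean                     # deviations from the mean
    cov = (dev * w[:, None]).T @ dev      # weighted covariance, shape (3N, 3N)

    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    return eigvecs[:, order], eigvals[order]


# Toy ensemble: 4 residues displaced along one random unit direction.
rng = np.random.default_rng(0)
reference = rng.normal(size=(4, 3))
direction = rng.normal(size=(4, 3))
direction /= np.linalg.norm(direction)
amplitudes = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
ensemble = reference[None] + amplitudes[:, None, None] * direction[None]

Y, variances = linear_motions(ensemble, k=2)
```

In this toy case the ensemble varies along a single direction, so the first column of Y recovers that direction up to sign. The sign ambiguity is one example of why losses for predicted motions need to respect symmetries of the data such as scaling.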
