PETIMOT: A Novel Framework for Inferring Protein Motions from Sparse Data Using SE(3)-Equivariant Graph Neural Networks

Valentin Lombard, Sergei Grudinin, Elodie Laine

TL;DR

PETIMOT addresses the challenge of inferring continuous protein motions from sparse experimental data. It learns compact linear motion eigenspaces of the coverage-weighted covariance matrix $C$ and represents deformations as a matrix $Y \in \mathbb{R}^{3N \times K}$. It introduces an SE(3)-equivariant graph neural network with a dual-track architecture that fuses sequence embeddings from protein language models with motion vectors, guided by the symmetry-aware geometric losses LS, SS, and IS. Trained on about 750k conformational collections from the Protein Data Bank and evaluated on 824 test proteins, PETIMOT outperforms flow-matching baselines (AlphaFlow, ESMFlow) and traditional normal mode analysis in both accuracy (it captures more of the observed motion) and speed (about 16 seconds of inference for the full test set). The approach yields an interpretable, scalable representation of protein dynamics that generalizes well, with potential applications in protein engineering and drug design.
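
As a rough illustration of what a linear motion eigenspace is, the sketch below extracts the top $K$ principal components from a set of already superimposed conformations using NumPy. The function name `linear_motions`, the per-conformation `weights` argument, and the default `k=4` are assumptions made for this example. This summary does not spell out how coverage enters the weighting of $C$, so the weights here are only a placeholder for it.

```python
import numpy as np

def linear_motions(coords, weights=None, k=4):
    """Top-k principal motions of an ensemble (illustrative sketch).

    coords  : (M, N, 3) array of M superimposed conformations of N positions.
    weights : optional (M,) non-negative weights (hypothetical stand-in for
              coverage weighting); uniform if omitted.
    Returns (Y, eigvals) with Y of shape (3N, k), columns orthonormal.
    """
    coords = np.asarray(coords, dtype=float)
    m, n, _ = coords.shape
    w = np.ones(m) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()

    flat = coords.reshape(m, 3 * n)      # one row per conformation
    mean = w @ flat                      # weighted mean structure
    dev = flat - mean                    # deviations from the mean
    cov = (dev * w[:, None]).T @ dev     # weighted covariance, (3N, 3N)

    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]         # keep the k largest
    return eigvecs[:, order], eigvals[order]
```

Each column of the returned `Y` is one candidate motion direction, and the matching eigenvalue is the variance of the ensemble along it.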

Abstract

Proteins move and deform to ensure their biological functions. Despite significant progress in protein structure prediction, approximating conformational ensembles at physiological conditions remains a fundamental open problem. This paper presents a novel perspective on the problem by directly targeting continuous compact representations of protein motions inferred from sparse experimental observations. We develop a task-specific loss function enforcing data symmetries, including scaling and permutation operations. Our method PETIMOT (Protein sEquence and sTructure-based Inference of MOTions) leverages transfer learning from pre-trained protein language models through an SE(3)-equivariant graph neural network. When trained and evaluated on the Protein Data Bank, PETIMOT shows superior performance in time and accuracy, capturing protein dynamics, particularly large/slow conformational changes, compared to state-of-the-art flow-matching approaches and traditional physics-based models.
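
To make the scaling and permutation symmetries concrete, here is a minimal sketch of a loss that ignores both the amplitude and the order of the predicted motions. It is an assumption about the general form, not the paper's exact LS loss: each predicted vector is rescaled by its least-squares optimal coefficient, and predicted and ground-truth vectors are then paired by brute-force search over permutations. The names `pairwise_ls` and `matched_ls_loss` are hypothetical.

```python
import itertools
import numpy as np

def pairwise_ls(X, Y):
    """E[k, l] = min over c of ||Y[:, l] - c * X[:, k]||^2 / N (sketch).

    X, Y : (3N, K) predicted and ground-truth motions.
    The optimal c is (x . y) / (x . x), so the residual has a closed form.
    """
    n_res = X.shape[0] // 3
    dots = X.T @ Y                          # (K, K) inner products
    x_norm2 = np.sum(X * X, axis=0)         # squared norms of predictions
    y_norm2 = np.sum(Y * Y, axis=0)         # squared norms of targets
    err = (y_norm2[None, :] - dots**2 / x_norm2[:, None]) / n_res
    return np.maximum(err, 0.0)             # clip tiny negative round-off

def matched_ls_loss(X, Y):
    """Mean error under the best one-to-one pairing of columns."""
    E = pairwise_ls(X, Y)
    k = E.shape[0]
    # Brute force is fine for small K; K! pairings are enumerated.
    best = min(itertools.permutations(range(k)),
               key=lambda p: sum(E[i, p[i]] for i in range(k)))
    loss = sum(E[i, best[i]] for i in range(k)) / k
    return float(loss), best
```

Under the same assumptions, `pairwise_ls(X, Y).min()` would be the error of the single best-matching pair. For larger `K`, an assignment solver would be a more scalable choice than enumerating permutations.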

Paper Structure

This paper contains 49 sections, 3 theorems, 13 equations, 12 figures, 1 table, 1 algorithm.

Key Result

Theorem A.1

SS Loss is invariant under unitary transformations of $X$ and $Y$ subspaces.
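
The statement can be checked numerically under an assumed form of the loss. The sketch below takes the SS loss to be one minus the mean squared overlap between two orthonormalised subspaces, $1 - \lVert X^\top Y \rVert_F^2 / K$; this form is an assumption for illustration, and the paper's appendix gives the actual definition. With it, invariance follows because $\lVert U^\top X^\top Y V \rVert_F = \lVert X^\top Y \rVert_F$ for any orthogonal $U$ and $V$. The helper names are hypothetical.

```python
import numpy as np

def orthonormalize(A):
    """Orthonormal basis for the column space of A via reduced QR."""
    q, _ = np.linalg.qr(A)
    return q

def ss_loss(X, Y):
    """Assumed subspace loss: 1 - ||X^T Y||_F^2 / K, in [0, 1]."""
    X, Y = orthonormalize(X), orthonormalize(Y)
    k = X.shape[1]
    return 1.0 - np.linalg.norm(X.T @ Y, "fro") ** 2 / k

def random_orthogonal(k, rng):
    """Random k x k orthogonal matrix (real-valued unitary)."""
    q, _ = np.linalg.qr(rng.normal(size=(k, k)))
    return q

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))      # predicted subspace, 3N = 30, K = 4
Y = rng.normal(size=(30, 4))      # ground-truth subspace
U = random_orthogonal(4, rng)     # mixes the basis vectors of X
V = random_orthogonal(4, rng)     # mixes the basis vectors of Y

# The two values agree up to floating-point round-off.
print(ss_loss(X, Y), ss_loss(X @ U, Y @ V))
```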

Figures (12)

  • Figure 1: PETIMOT's architecture overview. The model processes both sequence embeddings ($s$) and motion vectors ($\vec{x}$) through 15 message-passing blocks. Each block updates both representations by aggregating information from neighboring residues. Neighbor features are computed in the reference frame of the central residue $i$, ensuring SE(3) equivariance. The geometric features encoded in the edges capture the relative spatial relationships between residue pairs. Three types of losses (LS, SS, and IS) are computed, with prior normalization of the predictions for the IS and SS losses, and an additional orthogonalisation of the predictions for the SS loss.
  • Figure 2: Cumulative error curves computed on the test proteins. a-b. Comparison between PETIMOT base model and three other methods. c-d. Comparison between different losses implemented in PETIMOT. The loss of the base model is LS + SS. a,c. Minimum LS error corresponding to the best matching pair of predicted and ground-truth motions. b,d. SS error computed between the entire predicted and ground-truth subspaces.
  • Figure 3: Individual predictions. a. The per-protein minimum LS errors, computed for the best-matching pairs between predicted and ground-truth vectors, are reported for PETIMOT (black), the NMA (red), AlphaFlow (blue) and ESMFlow (green). The values are in ascending order of the errors computed for PETIMOT, from best to worst. b-c. Trajectories generated by deforming a protein structure along PETIMOT's best predicted motion. Five trajectory snapshots are shown colored from yellow to orange. b. Bacillus subtilis xylanase A (PDB id: 3EXU, chain A). c. Murine Fab fragment (PDB id: 7SD2, chain A).
  • Figure B.1: Network depth ablation. We report cumulative curves for LS error (a-b), magnitude error (c-d), and SS error (e). For each protein, we computed the error either for the best-matching pair of predicted and ground-truth vectors (a,c) or for the best combination of four pairs of predicted and ground-truth vectors (b,d). We vary the number of layers in the network and the embedding dimension.
  • Figure B.2: Structure and sequence information ablation study. We report cumulative curves for LS error (a-b), magnitude error (c-d), and SS error (e). For each protein, we computed the LS and magnitude errors either for the best-matching pair of predicted and ground-truth vectors (a,c) or for the best combination of four pairs of predicted and ground-truth vectors (b,d).
  • ...and 7 more figures
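
The Figure 1 caption notes that neighbor features are computed in the reference frame of the central residue, which is what makes the model SE(3)-equivariant. A minimal sketch of that idea follows. The frame construction from backbone N, CA and C atoms by Gram-Schmidt is an assumption (the caption does not say how frames are built), and the function names are hypothetical.

```python
import numpy as np

def local_frames(n_xyz, ca_xyz, c_xyz):
    """One right-handed orthonormal frame per residue (assumed construction).

    Inputs are (N, 3) arrays of backbone atom positions.
    Returns (N, 3, 3); the rows of frames[i] are the axes of residue i.
    """
    e1 = c_xyz - ca_xyz
    e1 = e1 / np.linalg.norm(e1, axis=-1, keepdims=True)
    u = n_xyz - ca_xyz
    e2 = u - np.sum(u * e1, axis=-1, keepdims=True) * e1   # Gram-Schmidt step
    e2 = e2 / np.linalg.norm(e2, axis=-1, keepdims=True)
    e3 = np.cross(e1, e2)                                  # completes the frame
    return np.stack([e1, e2, e3], axis=1)

def neighbor_features(ca_xyz, frames, i, neighbors):
    """Neighbor CA positions expressed in the frame of residue i."""
    rel = ca_xyz[neighbors] - ca_xyz[i]    # translation cancels here
    return rel @ frames[i].T               # components along the local axes

def to_global(v_local, frames, i):
    """Map a vector predicted in residue i's frame back to global axes."""
    return v_local @ frames[i]
```

Rotating and translating the whole structure leaves the output of `neighbor_features` unchanged, while a vector passed through `to_global` rotates with the structure. Together, these two properties are what equivariance of the predicted motion vectors amounts to in this sketch.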

Theorems & Definitions (6)

  • Theorem A.1
  • proof
  • Corollary A.1.1
  • proof
  • Theorem A.2
  • proof