Representation choice shapes the interpretation of protein conformational dynamics

Axel Giottonini, Thomas Lemmin

Abstract

Molecular dynamics simulations provide detailed trajectories at the atomic level, but extracting interpretable and robust insights from these high-dimensional data remains challenging. In practice, analyses typically rely on a single representation. Here, we show that representation choice is not neutral: it fundamentally shapes the conformational organization, similarity relationships, and apparent transitions inferred from identical simulation data. To complement existing representations, we introduce Orientation features, a geometrically grounded, rotation-aware encoding of the protein backbone. We compare them against common representations across three dynamical regimes: fast-folding proteins, large-scale domain motions, and protein-protein association. Across these systems, we find that different representations emphasize complementary aspects of conformational space, and that no single representation provides a complete picture of the underlying dynamics. To facilitate systematic comparison, we developed ManiProt, a library for efficient computation and analysis of multiple protein representations. Our results motivate a comparative, representation-aware framework for the interpretation of molecular dynamics simulations.

Paper Structure

This paper contains 32 sections, 16 equations, 30 figures, 1 table.

Figures (30)

  • Figure 1: Orientation Features Workflow. (1) A Local Coordinate System (LCS) is constructed for each residue from the amide nitrogen (N), alpha carbon (C$\alpha$), and carbonyl carbon (C) positions, forming an orthonormal basis. (2) Each LCS is aligned to the per-residue intrinsic mean orientation and projected onto the tangent space at the identity element of SO(3), yielding features in the Lie algebra $\mathfrak{so}(3)$.
  • Figure 2: Kinetic Characterization of Fast-Folding Proteins. (A) VAMP-2 scores for each molecular representation (y-axis) evaluated at the comparison lag time. Systems are shown along the x-axis. (B, left) TICA landscapes of 1FME for Orientation $\odot$ and torsion angle representations. The x- and y-axes show TIC1 and TIC2, respectively, with implied timescales annotated. Points are colored by Orientation $\odot$ cluster assignment; darker shades indicate unassigned frames. (B, right) Representative centroid structures for each cluster, shown in cartoon representation.
  • Figure 3: nsp13 Gram Matrices Analysis. (A) Gram matrices for Orientation and Orientation $\odot$ representations. Axes represent time across merged trajectories; color intensity indicates pairwise inner product values. Black squares highlight closed-like conformations. Top row: full Gram matrices. Bottom row: first two rank-1 projection matrices. (B) Spearman correlation coefficients between Gram matrices and structural similarity measures. Blue bars: correlation with RMSD; red bars: correlation with lDDT. Inner bars show correlations for rank-1 components.
  • Figure 4: nsp13 Clusters. (A) Clustering agreement across representations. Upper triangle: Adjusted Rand Score; lower triangle: Adjusted Mutual Information. (B) PC1–PC2 projections for each representation. Points are colored by Orientation $\odot$ cluster assignment. (C) Centroid structures of the five most populated Orientation $\odot$ clusters in panel B, aligned on the Stalk and RecA1 domains. The monomeric nsp13 protein is shown in a ribbon representation, with domains color-coded as follows: ZB (orange), Stalk (pink), 1B (yellow), RecA1 (red), and RecA2 (purple).
  • Figure 5: Molecular Representations for Barnase–Barstar Association. Results for (A) Orientation $\odot$, (B) C$\alpha$ coordinates, and (C) torsion angles. Each panel has three subpanels. (.1) PC1 (y-axis) versus IRMSD (x-axis). Interacting frames are shown in a purple-to-orange gradient (by IRMSD); non-interacting frames in orange. Gray region: the outlier-fence interval $[Q_1{-}1.5\,\mathrm{IQR},\, Q_3{+}1.5\,\mathrm{IQR}]$, where IQR is the interquartile range; the Pearson Correlation Coefficient (PCC) is annotated. (.2) PC1 density distributions for interacting and non-interacting states, with quartile lines. (.3) Reference structure (Barstar: light gray; Barnase: dark gray). The top 10 PC1-contributing residues are highlighted in red; hydrogen bonds are shown as dashed lines (red: involving contributors; blue: other).
  • ...and 25 more figures
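
The two steps in the Figure 1 caption (per-residue frame construction, then alignment to a mean orientation and projection to $\mathfrak{so}(3)$) can be illustrated with a minimal NumPy/SciPy sketch. This is not the ManiProt implementation, whose API is not shown here. The function names (`local_frames`, `karcher_mean`, `orientation_features`), the axis convention of the frame (first axis along C$\alpha$→C, second in the N–C$\alpha$–C plane), and the fixed-point iteration used for the intrinsic mean are all assumptions made for illustration.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def local_frames(n, ca, c):
    """Orthonormal frame per residue from N, CA, C coordinates.

    Inputs have shape (..., 3). The axis convention is an assumption of
    this sketch. Collinear N, CA, C atoms are not handled.
    """
    e1 = c - ca
    e1 = e1 / np.linalg.norm(e1, axis=-1, keepdims=True)
    v = n - ca
    # Gram-Schmidt: remove the component of v along e1.
    e2 = v - np.sum(v * e1, axis=-1, keepdims=True) * e1
    e2 = e2 / np.linalg.norm(e2, axis=-1, keepdims=True)
    e3 = np.cross(e1, e2)  # right-handed, so det = +1
    return np.stack([e1, e2, e3], axis=-1)  # columns are basis vectors


def karcher_mean(rots, iters=50, tol=1e-10):
    """Intrinsic (geodesic) mean of a set of rotations by fixed-point iteration."""
    mean = rots.mean()  # chordal mean as the starting point
    for _ in range(iters):
        # Average the log maps of all rotations, taken relative to the mean.
        delta = (mean.inv() * rots).as_rotvec().mean(axis=0)
        mean = mean * Rotation.from_rotvec(delta)
        if np.linalg.norm(delta) < tol:
            break
    return mean


def orientation_features(n, ca, c):
    """Map a trajectory of shape (T, L, 3) per atom type to features (T, L, 3).

    Each feature is the rotation vector (an element of so(3)) of the frame
    expressed relative to that residue's mean orientation.
    """
    frames = local_frames(n, ca, c)  # (T, L, 3, 3)
    n_frames, n_res = frames.shape[:2]
    feats = np.empty((n_frames, n_res, 3))
    for i in range(n_res):
        rots = Rotation.from_matrix(frames[:, i])
        mean = karcher_mean(rots)
        feats[:, i] = (mean.inv() * rots).as_rotvec()
    return feats
```

In this sketch the frames are computed in whatever reference frame the input coordinates use, so any trajectory alignment would have to happen beforehand. The rotation-vector output is one common parametrization of the tangent space at the identity; other coordinate choices for $\mathfrak{so}(3)$ would work equally well.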