MDIntrinsicDimension: Dimensionality-Based Analysis of Collective Motions in Macromolecules from Molecular Dynamics Trajectories
Irene Cazzaniga, Toni Giorgino
TL;DR
The paper addresses the challenge of quantifying the effective dimensionality of biomolecular conformational space from MD trajectories. It introduces MDIntrinsicDimension, a Python package that uses rotation- and translation-invariant internal-coordinate projections and scikit-dimension estimators (default TwoNN) to estimate intrinsic dimension (ID). Three analysis modes (whole molecule, sliding windows along the sequence, and secondary-structure elements) produce global and time-resolved IDs, enabling detection of transitions and regional heterogeneity. Applied to DESRES folding trajectories of villin HP35 and NTL9, ID complements RMSD and other traditional descriptors, revealing localized flexibility and transient intermediates, and offering a data-driven lens for building collective variables and Markov models.
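To make the idea behind the default estimator concrete, the following is a minimal sketch of a TwoNN-style estimate written with NumPy only. It is not the scikit-dimension or MDIntrinsicDimension implementation: it uses the maximum-likelihood form of the estimator rather than a linear fit, the function name `twonn_mle` is hypothetical, and the input is a synthetic 2D plane embedded in 5D rather than an MD trajectory.

```python
import numpy as np

def twonn_mle(X):
    """Estimate intrinsic dimension from ratios of 2nd to 1st neighbor distances.

    Illustrative maximum-likelihood variant of TwoNN (hypothetical helper,
    not the scikit-dimension API).
    """
    X = np.asarray(X, dtype=float)
    # Full pairwise Euclidean distance matrix (fine for a few thousand points).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude self-distances
    # Two smallest distances per row: r1 (nearest) and r2 (second nearest).
    nearest = np.partition(dist, 1, axis=1)[:, :2]
    r1, r2 = nearest[:, 0], nearest[:, 1]
    keep = r1 > 0  # drop duplicated points, where the ratio is undefined
    mu = r2[keep] / r1[keep]
    # Under the TwoNN model, mu follows a Pareto law with exponent d,
    # so the maximum-likelihood estimate is N / sum(log mu).
    return keep.sum() / np.log(mu).sum()

# Synthetic check: a 2D plane embedded in a 5D ambient space.
rng = np.random.default_rng(0)
plane = rng.uniform(size=(1000, 2))
X = np.hstack([plane, np.zeros((1000, 3))])
d_hat = twonn_mle(X)
print(f"estimated ID: {d_hat:.2f}")
```

The estimate should land close to 2 even though the points live in five ambient dimensions, which is the property that makes ID useful for high-dimensional trajectory features. Because only the two nearest neighbors of each point enter the estimate, it probes the data at a small local scale.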
Abstract
Molecular dynamics (MD) simulations provide atomistic insights into the structure, dynamics, and function of biomolecules by generating time-resolved, high-dimensional trajectories. Analyzing such data benefits from estimating the minimal number of variables required to describe the explored conformational manifold, known as the intrinsic dimension (ID). We present MDIntrinsicDimension, an open-source Python package that estimates ID directly from MD trajectories by combining rotation- and translation-invariant molecular projections (e.g., backbone dihedrals and inter-residue distances) with state-of-the-art estimators. The package provides three complementary analysis modes: whole-molecule ID; sliding windows along the sequence; and per-secondary-structure elements. It computes both overall ID (a single summary value) and instantaneous, time-resolved ID that can reveal transitions and heterogeneity over time. We illustrate the approach on fast folding-unfolding trajectories from the DESRES dataset, demonstrating that ID complements conventional geometric descriptors by highlighting spatially localized flexibility and differences across structural segments.
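As an illustration of what a rotation- and translation-invariant projection means in practice, the sketch below converts Cartesian coordinates into pairwise distances and checks that the features do not change when every frame is rigidly rotated and shifted. The helper `pairwise_distance_features` and the random coordinates are assumptions made for this example; they stand in for the package's own projections and for a real trajectory.

```python
import numpy as np

def pairwise_distance_features(coords):
    """Map (n_frames, n_atoms, 3) coordinates to (n_frames, n_pairs) distances.

    Hypothetical helper for illustration; not part of MDIntrinsicDimension.
    """
    coords = np.asarray(coords, dtype=float)
    # Indices of all unique atom pairs (i < j).
    i, j = np.triu_indices(coords.shape[1], k=1)
    return np.linalg.norm(coords[:, i, :] - coords[:, j, :], axis=-1)

rng = np.random.default_rng(1)
traj = rng.normal(size=(5, 10, 3))  # synthetic stand-in: 5 frames, 10 atoms

# Build a random proper rotation from a QR decomposition.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1  # flip one axis so the determinant is +1
shift = rng.normal(size=3)

# Apply the same rigid-body motion to every frame.
moved = traj @ q.T + shift

features = pairwise_distance_features(traj)
features_moved = pairwise_distance_features(moved)
print(features.shape)
print(np.allclose(features, features_moved))
```

Features of this kind remove the overall tumbling and drift of the molecule, so an ID estimate computed on them reflects internal conformational change rather than rigid-body motion.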
