Temporal Subspace Clustering for Molecular Dynamics Data

Anna Beer, Martin Heinrigs, Claudia Plant, Ira Assent

TL;DR

This work addresses clustering in high-dimensional molecular dynamics data by introducing MOSCITO, a one-step temporal subspace clustering method that learns a dictionary and coding with temporal regularization to exploit sequential relationships. Features tailored to MD data are combined with a temporal Laplacian in a joint objective, solved via ADMM to yield an affinity graph whose spectral clustering produces clusters that align with MSM states. Across 60 trajectories and four proteins, MOSCITO demonstrates state-of-the-art or competitive MSM quality (via VAMP-2) and superior trajectory segmentation for small cluster counts, while offering favorable runtime compared to existing MD-specific subspace methods. The approach leverages MD-specific features, temporal coherence, and a holistic optimization to produce interpretable state representations without post-processing, with potential extensions to multi-view feature integration.
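The temporal regularization mentioned above can be illustrated with a small sketch. It follows the generic temporal subspace clustering idea of penalizing differences between the codes of nearby time steps through a graph Laplacian; this section does not give MOSCITO's exact weighting or objective, so the names `temporal_laplacian`, `temporal_penalty`, and `half_window` are hypothetical and the binary window weights are an assumption.

```python
import numpy as np

def temporal_laplacian(n_steps, half_window):
    """Laplacian L = D - W of a graph linking each time step to its
    sequential neighbors (assumed binary weights within `half_window`)."""
    idx = np.arange(n_steps)
    dist = np.abs(idx[:, None] - idx[None, :])
    weights = ((dist > 0) & (dist <= half_window)).astype(float)
    return np.diag(weights.sum(axis=1)) - weights

def temporal_penalty(codes, laplacian):
    """tr(Z L Z^T) for codes Z of shape (dict_size, n_steps).
    Equals 0.5 * sum_ij w_ij * ||z_i - z_j||^2 over code columns."""
    return float(np.trace(codes @ laplacian @ codes.T))

# Toy codes: 100 time steps, two blocks of 50 steps with constant codes.
laplacian = temporal_laplacian(n_steps=100, half_window=1)
smooth_codes = np.repeat(np.eye(2), 50, axis=1)

# The same codes in random temporal order lose their sequential coherence.
rng = np.random.default_rng(0)
shuffled_codes = smooth_codes[:, rng.permutation(100)]

smooth_penalty = temporal_penalty(smooth_codes, laplacian)
shuffled_penalty = temporal_penalty(shuffled_codes, laplacian)
print(smooth_penalty, shuffled_penalty)
```

The temporally coherent codes incur a much smaller penalty than the shuffled ones, which is the property a temporal Laplacian term rewards when it is added to a dictionary-learning objective.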

Abstract

We introduce MOSCITO (MOlecular Dynamics Subspace Clustering with Temporal Observance), a subspace clustering method for molecular dynamics data. MOSCITO groups the timesteps of a molecular dynamics trajectory into clusters in which the molecule has similar conformations. In contrast to state-of-the-art methods, MOSCITO takes advantage of the sequential relationships found in time series data. Unlike existing work, MOSCITO does not need a two-step procedure with tedious post-processing, but directly models essential properties of the data. Interpreting clusters as Markov states allows us to evaluate the clustering performance based on the resulting Markov state models. In experiments on 60 trajectories and 4 different proteins, we show that MOSCITO achieves state-of-the-art performance as a novel single-step method. Moreover, by modeling temporal aspects, MOSCITO obtains better segmentation of trajectories, especially for small numbers of clusters.
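To make the evaluation idea concrete, the following minimal sketch scores a clustering by treating cluster labels as Markov states and computing a VAMP-2 score from lagged transition counts. It is not the authors' evaluation pipeline: this section does not say which implementation they use, the function name `vamp2_from_labels` is hypothetical, and the sketch sums over all singular values rather than a chosen number of slow processes.

```python
import numpy as np

def vamp2_from_labels(labels, n_states, lag):
    """VAMP-2 score of a discrete trajectory at a given lag.

    Uses one-hot state indicators as features, so the covariances are
    C00 = diag(p0), C11 = diag(p1), and C01 = normalized transition counts.
    The score is the squared Frobenius norm of C00^{-1/2} C01 C11^{-1/2}.
    """
    labels = np.asarray(labels)
    src, dst = labels[:-lag], labels[lag:]

    # Count transitions src -> dst that are `lag` steps apart.
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (src, dst), 1.0)
    c01 = counts / counts.sum()

    p0 = c01.sum(axis=1)  # state frequencies at time t
    p1 = c01.sum(axis=0)  # state frequencies at time t + lag

    # Inverse square roots; unvisited states contribute nothing.
    inv0 = np.zeros(n_states)
    inv1 = np.zeros(n_states)
    inv0[p0 > 0] = p0[p0 > 0] ** -0.5
    inv1[p1 > 0] = p1[p1 > 0] ** -0.5

    koopman = inv0[:, None] * c01 * inv1[None, :]
    return float(np.sum(koopman ** 2))

# Toy trajectory: two long-lived states with a single transition.
metastable = [0] * 50 + [1] * 50
score = vamp2_from_labels(metastable, n_states=2, lag=1)
print(score)
```

With k visited states the score lies between 1 and k; values near k indicate that the states are strongly predictive of the state one lag later, which is why clusterings with long-lived, well-separated states score higher.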

Paper Structure (20 sections, 14 equations, 13 figures, 1 table, 1 algorithm)


Figures (13)

  • Figure 1: MOSCITO derives MD features from input trajectories. Temporal regularization provides the means for effective subspace clustering with temporal relations.
  • Figure 2: Comparison of the VAMP-2 scores for different dictionary sizes. For each combination of MSM lag and cluster count, the VAMP-2 scores for different dictionary sizes and numbers of sequential neighbors are compared. For better differentiation, the VAMP-2 score axis starts at 3.
  • Figure 3: VAMP-2 scores for varying numbers of sequential neighbors and dictionary sizes over pairings of MSM lag and cluster count. For visibility, the VAMP-2 score axis starts at 3. The best results are obtained with 3 to 5 sequential neighbors.
  • Figure 4: Varying numbers of sequential neighbors for the 2F4K protein clustered into 5 clusters. Time steps are shown along the x-axis; clusters are indicated by color.
  • Figure 5: Heatmap of VAMP-2 scores (higher/lighter is better; scale from 3 to 5 for visibility). The best results are obtained for $\lambda_1$ around 0.01 and $\lambda_2$ around 15.
  • ...and 8 more figures