Auto-WHATMD: Automated Wasserstein-based High-dimensional feature extraction Analysis of Trajectories from Molecular Dynamics

Sosuke Asano, Ikki Yasuda, Katsuhiro Endo, Yoshinori Hirano, Kenji Yasuoka

Abstract

A central objective of molecular dynamics simulations is to compare multiple protein systems that differ in, for example, bound ligands or mutations, and to understand the effects of those differences. Representing these systems by a small number of features enables quantitative comparison. However, because molecular dynamics trajectories are high-dimensional spatiotemporal data, feature selection typically relies on domain expertise and can introduce arbitrary assumptions. Here, we present an approach that uses the optimal transport distance to compare high-dimensional trajectory data and employs simulated annealing to identify the residues that best distinguish multiple systems. We term this algorithm auto-WHATMD (automated Wasserstein-based High-dimensional feature extraction Analysis for Trajectories of Molecular Dynamics). We applied auto-WHATMD to protein-ligand systems of bromodomain 4 bound to different ligands and identified the most discriminative residues in the loop region. Moreover, even a few selected residues were sufficient to capture the correlation with ligand-binding affinities, indicating that auto-WHATMD effectively prioritizes the most informative residues. Our approach can be used to efficiently determine key residues and design features for multiple analogous systems.
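The core idea in the abstract, searching for a fixed-size residue subset that maximizes optimal-transport distances between systems, can be illustrated with a minimal sketch. This is not the authors' implementation: the figure captions indicate that system differences are computed by a deep neural network, whereas the sketch swaps in a sliced Wasserstein approximation for brevity. The function names (`sliced_wasserstein`, `score`, `anneal_mask`), the array layout `(n_frames, n_residues, 3)`, the linear cooling schedule, and the objective (sum of pairwise distances) are all assumptions made for illustration.

```python
import numpy as np

def sliced_wasserstein(x, y, n_proj=64, rng=None):
    """Approximate W1 between two equal-size sample sets using random 1-D projections.

    Stand-in for the paper's distance estimator (an assumption of this sketch).
    """
    assert x.shape == y.shape, "sketch assumes equal frame counts and dimensions"
    rng = np.random.default_rng(rng)
    dirs = rng.normal(size=(n_proj, x.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # In 1-D, optimal transport between equal-size samples pairs sorted values.
    px = np.sort(x @ dirs.T, axis=0)
    py = np.sort(y @ dirs.T, axis=0)
    return float(np.mean(np.abs(px - py)))

def score(mask, systems, seed=0):
    """Sum of pairwise distances between systems, using only the masked residues."""
    idx = np.flatnonzero(mask)
    flat = [s[:, idx, :].reshape(len(s), -1) for s in systems]
    total = 0.0
    for i in range(len(flat)):
        for j in range(i + 1, len(flat)):
            # A fixed seed keeps the projections identical across evaluations.
            total += sliced_wasserstein(flat[i], flat[j], rng=seed)
    return total

def anneal_mask(systems, n_select, n_steps=200, t0=1.0, seed=0):
    """Simulated annealing over binary masks with exactly n_select residues on."""
    n_res = systems[0].shape[1]
    if not 0 < n_select < n_res:
        raise ValueError("n_select must be between 1 and n_residues - 1")
    rng = np.random.default_rng(seed)
    mask = np.zeros(n_res, dtype=bool)
    mask[rng.choice(n_res, n_select, replace=False)] = True
    cur = score(mask, systems)
    best_mask, best = mask.copy(), cur
    for step in range(n_steps):
        temp = t0 * (1.0 - step / n_steps) + 1e-9  # hypothetical linear cooling
        # Propose swapping one selected residue for one unselected residue.
        on = rng.choice(np.flatnonzero(mask))
        off = rng.choice(np.flatnonzero(~mask))
        cand = mask.copy()
        cand[on], cand[off] = False, True
        new = score(cand, systems)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if new > cur or rng.random() < np.exp((new - cur) / temp):
            mask, cur = cand, new
            if cur > best:
                best_mask, best = mask.copy(), cur
    return best_mask, best
```

Swapping one selected residue for one unselected residue keeps the number of selected residues constant, which mirrors the fixed-size selections described in the figure captions (for example, four residues chosen from a subset of 14).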

Paper Structure (15 sections, 8 equations, 10 figures, 1 table, 1 algorithm)

Figures (10)

  • Figure 1: Schematic overview of auto-WHATMD. (a) Multiple systems are prepared and important residues are automatically selected from the entire protein. Trajectories of the selected residues are represented as high-dimensional data distributions. Differences between systems are quantified using the Wasserstein distance, and the resulting distance matrix is embedded into a low-dimensional space to analyze relationships between collective variables and system properties. (b) Important residues selected from the protein are indicated by residue masking. (c) Optimization for automatic residue selection. System differences computed by a deep neural network are used to optimize the mask vector.
  • Figure 2: Chemical structures of ligands L1–L10.
  • Figure 3: Spatial distribution of residues around the BRD4 binding site. The upper and lower panels show two views rotated by 90°. Residues in the small subset (14 residues) are labeled, and those additionally included in the extended binding-site subset are marked with an asterisk (*).
  • Figure 4: Automatic selection from a small subset in BRD4 systems using auto-WHATMD. (a) Representative process for optimizing residue selection, in which four residues were selected from a subset of 14 residues. Selected residues are shown in red. (b) Residue selection results from ten independent optimization experiments. Selected and non-selected residues are colored in black and white, respectively. (c) Matrix of pairwise Wasserstein distances computed between all systems using the optimized binary mask. Indices correspond to the ligand-free protein (apo) and the 10 ligand-bound systems (L1–L10). (d) Principal component representation of the low-dimensional embedding derived from the Wasserstein distance matrix. Each point represents a system labeled by ligand number. The color indicates the computed ligand-binding free energy $\Delta G_{\mathrm{MD}}$ reported by Aldeghi et al. (2016).
  • Figure 5: Consistency of selected residues via mask optimization for BRD4 systems. (a) Results for the small subset of 14 residues, varying the number of selected residues from 3 to 14. Residues marked with a crossed line were not included in the subset. Selected and non-selected residues are colored in black and white, respectively. (b) Results for the extended subset of 19 residues around the binding site, varying the number of selected residues from 3 to 19.
  • ...and 5 more figures
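
The captions for Figures 1 and 4 describe embedding the pairwise Wasserstein distance matrix into a low-dimensional space and viewing it along principal components. The embedding method is not specified in this excerpt, so the following is only a sketch that assumes classical multidimensional scaling (MDS), whose output axes are already ordered by explained variance. The function name `classical_mds` is hypothetical.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Embed a symmetric distance matrix with classical MDS (assumed method).

    Returns an (n_systems, n_components) array of coordinates whose axes are
    ordered by decreasing eigenvalue, i.e. principal coordinates.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)
    order = np.argsort(vals)[::-1][:n_components]
    # Negative eigenvalues can appear for non-Euclidean distances; clip them.
    top_vals = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(top_vals)
```

Given an 11 × 11 matrix for the apo and L1–L10 systems, `classical_mds(dist, 2)` would return one 2-D point per system, which could then be colored by a property such as $\Delta G_{\mathrm{MD}}$ to inspect trends like those shown in Figure 4(d).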