Accelerated Hydration Site Localization and Thermodynamic Profiling

Florian B. Hinz, Matthew R. Masters, Julia N. Kieu, Amr H. Mahmoud, Markus A. Lill

Abstract

Water plays a fundamental role in the structure and function of proteins and other biomolecules. The thermodynamic profile of the water molecules surrounding a protein is critical for ligand binding and recognition. Therefore, identifying the location and thermodynamic behavior of relevant water molecules is important for generating lead compounds and optimizing their affinity and selectivity for a given target. Computational methods have been developed to identify these hydration sites, but they are largely limited to either simplified models that fail to capture multi-body interactions or dynamics-based methods that rely on extensive sampling. Here we present a method for fast and accurate localization and thermodynamic profiling of hydration sites for protein structures. The method is based on a geometric deep neural network trained on a large, novel dataset of explicit-water molecular dynamics simulations. We confirm the accuracy and robustness of our model on experimental data and demonstrate its utility in several case studies.

Paper Structure

This paper contains 25 sections, 12 equations, 11 figures, 14 tables.

Figures (11)

  • Figure 1: Schematics displaying the coordinate prediction of hydration sites. A: Initially, we position a water prediction (light blue spheres) at every atom whose solvent exposure exceeds a cutoff (SASA greater than $0.1$). An equivariant attention layer is applied to the distance-based graph topology to perturb the water predictions. B, C: The distance-based graph topology is updated to also include the water predictions as graph nodes. An equivariant attention layer perturbs the water predictions. D: After five layers have been applied, predictions of low certainty are filtered out and a clustering algorithm is applied to obtain the final water predictions (E).
  • Figure 2: Schematics displaying the prediction of entropy and enthalpy. A: The input consists of coordinates and feature vectors for protein atoms and hydration sites. The edges are constructed based on distances. B: A graph attention network is applied and outputs a modified feature vector for each graph node. C: A feed-forward network is applied to the feature vectors of the water nodes. The output consists of the enthalpy and entropy values.
  • Figure 3: Hexbin plot showing the correlation between the enthalpy predictions and the enthalpy values obtained from WATsite for the test set.
  • Figure 4: Hexbin plot showing the correlation between the entropy predictions and the entropy values obtained from WATsite for the test set.
  • Figure 5: Buried water network important for the activity of disulphide catalyst DsbA. We evaluated the wildtype (PDB: 5QO9) as well as two mutants known to disrupt the conserved water network, E24A (PDB: 8EQR) and E37A (PDB: 8EQQ). Crystal waters are shown from all superimposed chains in green. Hydration site predictions are shown in cyan. Relevant distances are shown and labelled with their length in Angstrom. Predictions beyond the selected crystal hydration sites are not shown.
  • ...and 6 more figures
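
The seeding step (A) and the post-processing step (D to E) described in the Figure 1 caption can be illustrated with a minimal sketch. Only the SASA cutoff of $0.1$ comes from the caption. The function names, the certainty threshold `min_certainty`, the merge radius `merge_radius`, and the greedy distance-based clustering are assumptions made for illustration; the caption does not name the clustering algorithm, and the equivariant attention layers between the two steps are omitted here.

```python
import math

def seed_waters(atoms, sasa_cutoff=0.1):
    """Place one initial water prediction at every sufficiently exposed atom.

    atoms: list of ((x, y, z), sasa) tuples. The 0.1 cutoff follows the
    Figure 1 caption; the data layout is a hypothetical choice.
    """
    return [coord for coord, sasa in atoms if sasa > sasa_cutoff]

def filter_and_cluster(predictions, min_certainty=0.5, merge_radius=1.4):
    """Drop low-certainty predictions, then merge nearby ones.

    predictions: list of ((x, y, z), certainty) tuples.
    min_certainty and merge_radius are illustrative values (assumptions).
    Greedy clustering stands in for the unspecified clustering algorithm.
    """
    kept = [p for p in predictions if p[1] >= min_certainty]
    # Visit the most certain predictions first so they seed the clusters.
    kept.sort(key=lambda p: p[1], reverse=True)

    clusters = []  # each cluster is a list of (coord, certainty)
    for coord, cert in kept:
        for cluster in clusters:
            seed_coord = cluster[0][0]
            if math.dist(coord, seed_coord) <= merge_radius:
                cluster.append((coord, cert))
                break
        else:
            clusters.append([(coord, cert)])

    # Report each cluster as its certainty-weighted centroid.
    sites = []
    for cluster in clusters:
        total = sum(cert for _, cert in cluster)
        centroid = tuple(
            sum(coord[i] * cert for coord, cert in cluster) / total
            for i in range(3)
        )
        sites.append(centroid)
    return sites
```

Calling `filter_and_cluster` on two nearby high-certainty predictions and one distant low-certainty prediction would return a single merged site, since the low-certainty prediction is filtered out before clustering.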