DeepConf: Machine Learning Conformer Reconstruction of Biomolecules from Scanning Tunneling Microscopy Images

Tim J. Seifert, Dhaneesh Kumar, Markus Etzkorn, Stephan Rauschenbach, Klaus Kern, Kelvin Anggara, Uta Schlickum

TL;DR

This work proposes a framework for the rapid generation of three-dimensional structures of glycans, peptides, and glycopeptides and their corresponding STM-like image simulations, based on state-of-the-art, machine-learning-accelerated Density Functional Theory (DFT).

Abstract

Understanding the properties and functions of biomolecules in detail has recently attracted growing interest, driven by the possibility of real-space imaging of single, intact macromolecules using Scanning Tunneling Microscopy (STM) combined with electrospray ion beam deposition and soft landing. This combination provides key insights into biomolecular behavior, but it also demands rapid and reliable data analysis. A major obstacle to applying machine learning to STM images is often the scarcity of training data, caused by the long acquisition times of both experimental imaging and high-accuracy simulations. Here, we propose a framework for the rapid generation of three-dimensional structures of glycans, peptides, and glycopeptides and their corresponding STM-like image simulations, based on state-of-the-art, machine-learning-accelerated Density Functional Theory (DFT). We generate datasets for the polypeptide bradykinin and for a representative glycan molecule, and we train a conformer estimation model that predicts a molecule's three-dimensional structure from an STM image. On synthetic data, our approach achieves high accuracy, with median atomic deviations below 2 Å for peptides and below 4 Å for glycans. On experimental data, it predominantly yields a precise, reliable, and visually convincing determination of the local positions of molecular subunits. This application to experimental data represents an important milestone towards a fully automated structural search pipeline for complex, biologically relevant systems imaged with STM.
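The abstract reports accuracy as a median atomic deviation. The following is a minimal sketch of how such a metric could be computed, assuming it is the median over atoms of the Euclidean distance between predicted and reference positions, with atoms already matched one-to-one and expressed in the same coordinate frame (no superposition is performed). The function name `median_atomic_deviation` is hypothetical and not taken from the paper.

```python
import math
import statistics
from typing import Sequence, Tuple

# Hypothetical type alias: an (x, y, z) position in Å.
Point3D = Tuple[float, float, float]


def median_atomic_deviation(predicted: Sequence[Point3D],
                            reference: Sequence[Point3D]) -> float:
    """Median Euclidean distance (Å) between corresponding atoms.

    Assumption: both sequences list the same atoms in the same order and
    share one coordinate frame; no alignment is done here.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same number of atoms")
    if not predicted:
        raise ValueError("at least one atom is required")
    # Per-atom deviation: straight-line distance between the two positions.
    deviations = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # The median is robust to a few strongly displaced atoms.
    return statistics.median(deviations)


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    predicted = [(0.0, 0.0, 1.0), (1.5, 2.0, 0.0), (3.0, 0.0, 0.0)]
    # Per-atom deviations are 1.0, 2.0 and 0.0 Å, so the median is 1.0 Å.
    print(median_atomic_deviation(predicted, reference))  # 1.0
```

A median rather than a mean keeps a handful of badly placed atoms, for example in a flexible chain end, from dominating the reported value.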

Paper Structure (7 sections, 1 equation, 14 figures)

Figures (14)

  • Figure 1: Schematic workflow of the image generation process. First (1), a random molecular geometry is created by parsing the molecular sequence and attaching one amino acid or monosaccharide at a time, followed by structural and surface relaxation steps. Second (2), the electronic density is estimated using emulated DFT (del_rio_deep_2023). Third (3), an STM image is generated by convolving a random tip geometry with the electronic density. This image can be used to train the model to predict a molecular encoding, from which the 3D molecular structure is reconstructed (4).
  • Figure 2: a) Bradykinin molecule used as an exemplary peptide. The chain is color coded according to the individual amino acids; identical amino acids share the same color. The N- and C-termini are marked with an orange and a blue overlay, respectively. b) Amine-terminated glucose hexamer used as an exemplary glycan. The chain is color coded for each monosaccharide and the amine linker. The linker is marked with a blue overlay.
  • Figure 3: Results of the conformer prediction model on synthetic and real peptide images. The overlay shows the flat two-dimensional projection of the predicted atomic positions; top and side views are also shown. The amino acids are color coded and the arginine residues highlighted as in Figure 2. All images are 5 nm × 5 nm.
  • Figure 4: Results of the conformer prediction model on synthetic and real glycan images. The overlay shows the flat two-dimensional projection of the predicted atomic positions; top and side views are also shown. The individual monosaccharides and the linker are color coded and highlighted as in Figure 2. All images are 5 nm × 5 nm.
  • Figure 5: Example images of the peptide Angiotensin II with sequence ASP-ARG-VAL-TYR-ILE-HIS-PRO-PHE. The panels show different synthetic STM-like images of the same structure, all obtained from the identical predicted electronic density, as well as a three-dimensional top-down view of the generated conformation (hanwell_avogadro_2012).
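Figure 1 describes the imaging step as a convolution of a random tip geometry with the electronic density, and Figure 5 shows that a single density yields several different images. The sketch below illustrates that idea under simplifying assumptions: the density is a 2D slice at constant height, the tip is an isotropic Gaussian whose width is drawn at random, and the convolution is periodic (FFT based). The Gaussian tip model and the names `gaussian_tip_kernel` and `simulate_stm_image` are hypothetical; the paper's tip geometries and imaging model may differ.

```python
import numpy as np


def gaussian_tip_kernel(shape, sigma_px):
    """Hypothetical tip model: normalized 2D Gaussian centred at pixel (0, 0).

    Offsets wrap around (0, 1, ..., -2, -1) so the kernel can be used
    directly in a circular FFT convolution without shifting.
    """
    ny, nx = shape
    y = np.fft.fftfreq(ny) * ny  # signed integer pixel offsets along y
    x = np.fft.fftfreq(nx) * nx  # signed integer pixel offsets along x
    yy, xx = np.meshgrid(y, x, indexing="ij")
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma_px**2))
    return kernel / kernel.sum()  # unit sum preserves total intensity


def simulate_stm_image(density_slice, rng, sigma_range=(1.0, 4.0)):
    """Blur a 2D density slice with a Gaussian tip of random width.

    Returns the simulated image and the sampled tip width (pixels).
    The convolution is periodic, so features near one edge wrap to the other.
    """
    sigma = rng.uniform(*sigma_range)
    kernel = gaussian_tip_kernel(density_slice.shape, sigma)
    spectrum = np.fft.rfft2(density_slice) * np.fft.rfft2(kernel)
    image = np.fft.irfft2(spectrum, s=density_slice.shape)
    return image, sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    density = np.zeros((64, 64))
    density[20:24, 30:34] = 1.0  # toy stand-in for a molecular density
    image_a, sigma_a = simulate_stm_image(density, rng)
    image_b, sigma_b = simulate_stm_image(density, rng)  # same density, new tip
    print(sigma_a, sigma_b)
```

Drawing the tip width anew for each call is what lets one density produce several distinct training images. Because the FFT convolution is periodic, a density that extends close to the slice edges should be zero-padded before blurring.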