Geometric Deep Learning for Structure-Based Drug Design: A Survey

Zaixi Zhang, Jiaxian Yan, Yining Huang, Qi Liu, Enhong Chen, Mengdi Wang, Marinka Zitnik

TL;DR

This paper systematically reviews the state of the art in geometric deep learning for SBDD, and provides an in-depth review of key tasks, including binding site prediction, binding pose generation, de novo molecule generation, linker design, protein pocket generation, and binding affinity prediction.

Abstract

Structure-based drug design (SBDD) leverages the three-dimensional geometry of proteins to identify potential drug candidates. Traditional approaches, rooted in physicochemical modeling and domain expertise, are often resource-intensive. Recent advancements in geometric deep learning, which effectively integrates and processes 3D geometric data, alongside breakthroughs in accurate protein structure prediction from tools like AlphaFold, have significantly propelled the field forward. This paper systematically reviews the state of the art in geometric deep learning for SBDD. We begin by outlining foundational tasks in SBDD, discussing prevalent 3D protein representations, and highlighting representative predictive and generative models. Next, we provide an in-depth review of key tasks, including binding site prediction, binding pose generation, de novo molecule generation, linker design, protein pocket generation, and binding affinity prediction. For each task, we present formal problem definitions, key methods, datasets, evaluation metrics, and performance benchmarks. Lastly, we explore current challenges and future opportunities in SBDD. Challenges include oversimplified problem formulations, limited out-of-distribution generalization, biosecurity concerns related to the misuse of structural data, insufficient evaluation metrics and large-scale benchmarks, and the need for experimental validation and enhanced model interpretability. Opportunities lie in leveraging multimodal datasets, integrating domain knowledge, developing comprehensive benchmarks, establishing criteria aligned with clinical outcomes, and designing foundation models to expand the scope of design tasks. We also curate a resource list at https://github.com/zaixizhang/Awesome-SBDD, reflecting ongoing contributions and new datasets in SBDD.

Paper Structure (65 sections, 15 equations, 10 figures, 10 tables)

Figures (10)

  • Figure 1: Structure-based drug design tasks discussed in this survey: (a) binding site prediction identifies areas of the protein structure that can act as binding sites for ligands (Section \ref{sec:binding_site}); (b) binding pose generation, or protein-ligand docking, focuses on predicting the binding conformations of the protein-ligand complex (Section \ref{sec:binding_pose}); (c) de novo ligand generation designs binding ligands from scratch using the structural information of the target protein (Section \ref{sec:ligand_generation}); (d) linker design joins disconnected molecular fragments into a single ligand molecule conditioned on the target protein (Section \ref{sec:linker_design}); (e) protein pocket generation redesigns the protein pocket (including sequence and structure) given a binding ligand (Section \ref{sec:pocket_generation}); (f) binding affinity prediction predicts the affinity between a protein and a ligand given their binding structure (Section \ref{sec:binding_affinity}).
  • Figure 2: 3D representations of proteins used for geometric deep learning: (a) 3D grid, (b) 3D surface, and (c) 3D graph, illustrated for PDB ID 2avd.
  • Figure 3: Representing molecules as (a) 2D graphs and (b) 3D graphs.
  • Figure 4: Overview of MaSIF (Gainza et al., 2020) and dMaSIF (Sverrisson et al., 2021) for binding site prediction. The two methods follow similar steps, and each step's average running time per protein is marked. MaSIF precomputes steps a-c, whereas dMaSIF computes them on the fly and is 600 times faster than MaSIF.
  • Figure 5: Overview of DiffDock (Corso et al., 2022) for binding pose prediction. The model takes as input the separate ligand and protein structures. Randomly sampled initial poses are denoised via a reverse diffusion process over translational, rotational, and torsional degrees of freedom. A trained confidence model ranks the sampled poses to produce a final prediction and confidence score.
  • ...and 5 more figures
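
The 3D grid and 3D graph representations named in the Figure 2 caption can be illustrated with a minimal sketch. It assumes atoms are given as plain (x, y, z) tuples in angstroms. The helper names `radius_graph` and `voxelize`, the 2.0 cutoff, and the toy coordinates are all hypothetical choices made for illustration; none of them come from the survey or from PDB entry 2avd.

```python
from itertools import combinations
from math import dist, floor


def radius_graph(coords, cutoff):
    """Return undirected edges (i, j, distance) for atom pairs within `cutoff`."""
    edges = []
    for i, j in combinations(range(len(coords)), 2):
        d = dist(coords[i], coords[j])
        if d <= cutoff:
            edges.append((i, j, d))
    return edges


def voxelize(coords, origin, voxel_size, shape):
    """Count atoms per voxel of a regular grid; atoms outside the grid are skipped."""
    grid = {}
    for point in coords:
        idx = tuple(floor((c - o) / voxel_size) for c, o in zip(point, origin))
        if all(0 <= k < n for k, n in zip(idx, shape)):
            grid[idx] = grid.get(idx, 0) + 1
    return grid


# Toy coordinates (made up for illustration only).
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (6.0, 6.0, 6.0)]

# 3D graph: nodes are atoms, edges connect atoms closer than the cutoff.
edges = radius_graph(atoms, cutoff=2.0)

# 3D grid: sparse occupancy counts on a 4 x 4 x 4 lattice of 2-angstrom voxels.
grid = voxelize(atoms, origin=(0.0, 0.0, 0.0), voxel_size=2.0, shape=(4, 4, 4))
```

The graph keeps exact pairwise distances, which are unchanged by rotation and translation. The grid depends on the chosen origin and orientation, which is one reason grid-based models often rely on rotational data augmentation.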