Concept-Driven Deep Learning for Enhanced Protein-Specific Molecular Generation

Taojie Kuang, Qianli Ma, Athanasios V. Vasilakos, Yu Wang, Qiang Cheng, Zhixiang Ren

TL;DR

The paper tackles the limited real-world applicability of atom-based and fragment-based molecular generation by introducing a two-stage, fragment-based framework that jointly considers protein subpocket interactions and geometric complementarity. A concept-based arm-sampling model selects ligand arms using interaction forces and subpocket geometry, followed by an $E(3)$-equivariant diffusion model that generates scaffolds to connect these arms within the binding pocket. The approach improves drug-likeness (QED) by 4% and synthetic feasibility by 6% while achieving strong binding affinity, and it enhances interpretability through explicit interaction data. This framework advances structure-based drug design by delivering synthetically feasible, high-affinity ligands with interpretable design rationales, and it points to future directions in handling small subpockets and multi-objective optimization.
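The arm-sampling stage rates how well a ligand arm suits a subpocket from interaction forces and geometric fit. The sketch below illustrates the general concept-bottleneck idea behind this kind of interpretability, assuming the concepts (hydrogen bonds, hydrophobic contacts, $\pi$-$\pi$ stacking, salt bridges, spatial fit, as listed in Figure 1) have already been computed: the score is a linear function of named concepts, so each concept's contribution can be read off directly. The weights, the `ArmPocketConcepts` class, and the helper names are hypothetical; in the paper the concepts and their combination are learned by a neural network, which this sketch does not reproduce.

```python
from dataclasses import dataclass

# Hypothetical concept weights for illustration only; the paper's learned weights are not given.
CONCEPT_WEIGHTS = {
    "hydrogen_bonds": 1.0,
    "hydrophobic_contacts": 0.5,
    "pi_pi_stacking": 0.8,
    "salt_bridges": 1.2,
    "spatial_fit": 2.0,
}


@dataclass
class ArmPocketConcepts:
    """Human-readable concepts describing one (subpocket, arm) pair (hypothetical schema)."""
    hydrogen_bonds: int
    hydrophobic_contacts: int
    pi_pi_stacking: int
    salt_bridges: int
    spatial_fit: float  # assumed in [0, 1]: how well the arm fills the subpocket


def score_pair(c: ArmPocketConcepts, weights: dict = CONCEPT_WEIGHTS):
    """Return (total score, per-concept contributions); the contributions explain the score."""
    contributions = {name: w * float(getattr(c, name)) for name, w in weights.items()}
    return sum(contributions.values()), contributions


def rank_arms(candidates):
    """Sort (arm_id, concepts) pairs by concept score, best first."""
    scored = [(arm_id, score_pair(c)[0]) for arm_id, c in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Because the score is a plain weighted sum, inspecting the `contributions` dictionary shows, for example, whether an arm ranked first because of hydrogen bonding or because of spatial fit.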

Abstract

In recent years, deep learning techniques have made significant strides in molecular generation for specific targets, driving advancements in drug discovery. However, existing molecular generation methods present significant limitations: those operating at the atomic level often lack synthetic feasibility, drug-likeness, and interpretability, while fragment-based approaches frequently overlook comprehensive factors that influence protein-molecule interactions. To address these challenges, we propose a novel fragment-based molecular generation framework tailored for specific proteins. Our method begins by constructing a protein subpocket and molecular arm concept-based neural network, which systematically integrates interaction force information and geometric complementarity to sample molecular arms for specific protein subpockets. Subsequently, we introduce a diffusion model to generate molecular backbones that connect these arms, ensuring structural integrity and chemical diversity. Our approach significantly improves synthetic feasibility and binding affinity, with a 4% increase in drug-likeness and a 6% improvement in synthetic feasibility. Furthermore, by integrating explicit interaction data through a concept-based model, our framework enhances interpretability, offering valuable insights into the molecular design process.
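The second stage generates a scaffold that connects the already-selected arms with a diffusion model. As a rough illustration of only the forward (noising) half, the sketch below applies the standard DDPM closed form $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, with $\bar\alpha_t = \prod_{s\le t}(1-\beta_s)$, to scaffold atom coordinates while leaving arm atoms untouched as conditioning. The linear $\beta$ schedule, the choice to hold arm coordinates fixed, and all function names are assumptions; the paper's exact schedule and its $E(3)$-equivariant denoising network are not shown. Isotropic Gaussian noise is rotation-invariant in distribution, which is one of the properties equivariant diffusion models rely on.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (DDPM-style, assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bar(betas, t):
    """Cumulative product prod_{s<=t} (1 - beta_s); t = 0 means clean data (returns 1.0)."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod


def noise_scaffold(scaffold_xyz, arm_xyz, betas, t, rng=random):
    """Sample q(x_t | x_0) for scaffold atoms only; arm atoms are kept fixed as context.

    x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps,  eps ~ N(0, I) per coordinate.
    """
    abar = alpha_bar(betas, t)
    a, s = math.sqrt(abar), math.sqrt(1.0 - abar)
    noised = [
        tuple(a * coord + s * rng.gauss(0.0, 1.0) for coord in atom)
        for atom in scaffold_xyz
    ]
    return noised, list(arm_xyz)  # arm coordinates are returned unchanged
```

A trained reverse process would start from pure noise on the scaffold atoms and denoise them step by step, always seeing the fixed arm atoms (and, in the paper, their atom and bond types) as context.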

Paper Structure

This paper contains 14 sections, 11 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Factors influencing the affinity between a protein subpocket and a ligand arm. a. The binding affinity between the protein pocket and the ligand can be divided into two main parts: the binding between each ligand arm and its protein subpocket, and the overall binding between the molecule and the protein. b. The affinity between a ligand arm and a protein subpocket is influenced by two main factors: spatial adaptation and non-covalent interactions. Spatial adaptation refers to how well the ligand arm fits into the protein subpocket, while non-covalent interactions include hydrogen bonds, hydrophobic interactions, $\pi$-$\pi$ stacking, and salt bridges. Together, these factors determine the overall binding strength.
  • Figure 2: Overview of our method. a. Two-stage process for ligand molecular generation. First, ligand arms are sampled based on their compatibility with the target protein subpocket. Then, the scaffold is generated by linking the chosen arms, forming a complete molecular structure that fits the protein pocket. b. Ligand arm sampling. In the first stage, a concept-based neural network identifies molecular arms that fit the subpockets of a target protein based on their interaction forces and geometric complementarity. Spatial adaptation and force interaction information guide the selection of the most suitable arms for each protein subpocket. c. Scaffold generation. The second stage uses a diffusion model to generate molecular backbones that connect the selected arms, ensuring the ligands are structurally intact, diverse, and compatible with the protein binding pocket. The forward and reverse diffusion processes, combined with protein and ligand encoders, refine the scaffold structure, incorporating the atom and bond types of the arms as prior knowledge.
  • Figure 3: Comparison of molecular binding affinity across different methods. This figure compares the binding affinities of ligands from three sources: the reference molecules, molecules generated by DecompDiff [guan2024decompdiff], and molecules generated by our proposed approach. Our method consistently outperforms DecompDiff [guan2024decompdiff] in binding affinity, drug-likeness (QED), and synthetic accessibility (SA). However, when a protein subpocket is too small to accommodate enough ligand arm atoms, binding affinity tends to decrease, suggesting that our method could be further refined for such cases.
  • Figure 4: Comparison of molecular binding affinity across different arms selected by the concept-based model. This figure illustrates the impact of ligand arm selection on molecular binding affinity. The top example shows optimal ligand arm choices, resulting in better (more negative) Vina scores and higher QED and SA values, indicating strong binding affinity and synthetic feasibility. The bottom example shows less optimal arm selection, leading to weaker binding affinity and poorer drug-likeness, highlighting the importance of selecting high-quality ligand arms in our concept-based model.
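Figure 4 compares arm choices on three metrics at once, and the TL;DR names multi-objective optimization as a future direction. One standard way to compare candidates on several objectives without collapsing them into a single weighted score is Pareto dominance. The sketch below is a generic illustration, not the paper's method: it treats the Vina score as lower-is-better and QED and SA as higher-is-better, assuming the normalized SA convention common in structure-based design benchmarks. The `MolMetrics` type and function names are hypothetical.

```python
from typing import NamedTuple


class MolMetrics(NamedTuple):
    vina: float  # docking score in kcal/mol; more negative = stronger predicted binding
    qed: float   # drug-likeness in [0, 1]; higher is better
    sa: float    # normalized synthetic accessibility in [0, 1]; higher is better (assumed)


def dominates(a: MolMetrics, b: MolMetrics) -> bool:
    """True if a is at least as good as b on every metric and strictly better on at least one."""
    at_least_as_good = a.vina <= b.vina and a.qed >= b.qed and a.sa >= b.sa
    strictly_better = a.vina < b.vina or a.qed > b.qed or a.sa > b.sa
    return at_least_as_good and strictly_better


def pareto_front(candidates):
    """Keep the (name, metrics) pairs that no other candidate dominates."""
    return [
        (name, m)
        for name, m in candidates
        if not any(dominates(other, m) for _, other in candidates)
    ]
```

A candidate with a better Vina score but a worse QED is not dominated by a more balanced one, so both stay on the front; choosing between them is exactly the trade-off a multi-objective optimizer would have to resolve.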