Drug Discovery with Dynamic Goal-aware Fragments

Seul Lee, Seanie Lee, Kenji Kawaguchi, Sung Ju Hwang

TL;DR

Drug discovery in vast chemical space benefits from targeted fragment-based approaches. GEAM combines Fragment-wise Graph Information Bottleneck (FGIB) for goal-aware fragment extraction, Soft Actor-Critic (SAC) for fragment assembly, and a Graph Genetic Algorithm (GA) for fragment modification, with a dynamic vocabulary that evolves as generation proceeds. The method targets the multi-objective property score $Y(G)=\widehat{DS}(G)\cdot QED(G)\cdot\widehat{SA}(G)$, the product of the normalized docking score, QED, and the normalized synthetic accessibility, and FGIB selects informative fragments by minimizing the information-bottleneck objective $-I(Z,Y;\theta)+\beta I(Z,G;\theta)$. Empirically, GEAM outperforms state-of-the-art baselines on docking/novelty metrics and PMO benchmarks, while the dynamic vocabulary update boosts novelty and diversity without compromising performance. Overall, GEAM provides a scalable, interpretable, and effective framework for generating high-quality, diverse drug candidates.
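To make the product form of $Y(G)$ concrete, here is a minimal sketch of how such a score could be computed from raw values. The normalization constants are assumptions that this summary does not state: the docking score is clipped to $[-20, 0]$ kcal/mol and divided by $-20$, and the SA score is mapped from $[1, 10]$ to $[0, 1]$ via $(10 - SA)/9$. The function names are hypothetical and are not taken from the GEAM repository.

```python
def normalize_docking(ds):
    """Map a docking score in kcal/mol (more negative is better) to [0, 1].

    Assumption: scores are clipped to [-20, 0] and divided by -20.
    """
    return min(1.0, max(0.0, -ds / 20.0))


def normalize_sa(sa):
    """Map a raw SA score in [1, 10] (lower is easier to synthesize) to [0, 1].

    Assumption: the linear map (10 - SA) / 9.
    """
    return min(1.0, max(0.0, (10.0 - sa) / 9.0))


def property_score(ds, qed, sa):
    """Y(G) as the product of normalized docking score, QED, and normalized SA."""
    return normalize_docking(ds) * qed * normalize_sa(sa)


if __name__ == "__main__":
    # Example inputs are made up for illustration, not results from the paper.
    print(property_score(ds=-10.0, qed=0.8, sa=1.0))
```

Because the score is a product, a molecule that fails on any one property (for example, a docking score of 0) receives a score of 0 no matter how good the other two are.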

Abstract

Fragment-based drug discovery is an effective strategy for discovering drug candidates in the vast chemical space, and has been widely employed in molecular generative models. However, many existing fragment extraction methods in such models do not take the target chemical properties into account or rely on heuristic rules. Additionally, the existing fragment-based generative models cannot update the fragment vocabulary with goal-aware fragments newly discovered during the generation. To this end, we propose a molecular generative framework for drug discovery, named Goal-aware fragment Extraction, Assembly, and Modification (GEAM). GEAM consists of three modules, each responsible for goal-aware fragment extraction, fragment assembly, and fragment modification. The fragment extraction module identifies important fragments contributing to the desired target properties with the information bottleneck principle, thereby constructing an effective goal-aware fragment vocabulary. Moreover, GEAM can explore beyond the initial vocabulary with the fragment modification module, and the exploration is further enhanced through the dynamic goal-aware vocabulary update. We experimentally demonstrate that GEAM effectively discovers drug candidates through the generative cycle of the three modules in various drug discovery tasks. Our code is available at https://github.com/SeulLee05/GEAM.
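The cycle described in the abstract (extract, assemble, modify, then update the vocabulary) can be sketched as a toy control loop. Everything in this sketch is a stand-in and an assumption: molecules are plain strings, `toy_score` is a hypothetical property, substring ranking replaces FGIB, random sampling replaces SAC, and a single-character mutation replaces the graph GA. It shows only how the three modules feed each other, not how GEAM implements them.

```python
import random

ALPHABET = "CNO"  # toy symbols; a "molecule" here is just a string


def toy_score(molecule):
    """Hypothetical property in [0, 1]: fraction of 'N' characters."""
    return molecule.count("N") / len(molecule)


def extract_fragments(molecules, frag_len=2):
    """Stand-in for extraction: score each substring by the mean
    score of the molecules that contain it."""
    totals, counts = {}, {}
    for mol in molecules:
        frags = {mol[i:i + frag_len] for i in range(len(mol) - frag_len + 1)}
        for frag in frags:
            totals[frag] = totals.get(frag, 0.0) + toy_score(mol)
            counts[frag] = counts.get(frag, 0) + 1
    return {f: totals[f] / counts[f] for f in totals}


def keep_top(frag_scores, k):
    """Keep the k highest-scoring fragments (ties broken alphabetically)."""
    ranked = sorted(frag_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:k])


def assemble(vocab, rng, n_frags=3):
    """Stand-in for assembly: concatenate randomly chosen fragments."""
    return "".join(rng.choice(sorted(vocab)) for _ in range(n_frags))


def modify(molecule, rng):
    """Stand-in for modification: mutate one position at random."""
    i = rng.randrange(len(molecule))
    return molecule[:i] + rng.choice(ALPHABET) + molecule[i + 1:]


def run_cycle(seed_molecules, rounds=5, per_round=20, vocab_size=4, seed=0):
    rng = random.Random(seed)
    vocab = keep_top(extract_fragments(seed_molecules), vocab_size)
    best = max(seed_molecules, key=toy_score)
    for _ in range(rounds):
        assembled = [assemble(vocab, rng) for _ in range(per_round)]
        modified = [modify(m, rng) for m in assembled]
        pool = assembled + modified
        best = max(pool + [best], key=toy_score)
        # Dynamic update: re-extract fragments from the best molecules of
        # this round and merge them into the vocabulary.
        top = sorted(pool, key=toy_score, reverse=True)[:per_round]
        vocab = keep_top({**vocab, **extract_fragments(top)}, vocab_size)
    return vocab, best


if __name__ == "__main__":
    vocab, best = run_cycle(["CCNO", "CNNC", "OCCO"])
    print(sorted(vocab), best, toy_score(best))
```

The modification step is what lets fragments absent from the initial vocabulary appear in the pool, and the re-extraction step is what lets them enter the vocabulary for later rounds.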

Paper Structure

This paper contains 38 sections, 1 theorem, 28 equations, 7 figures, 8 tables, 1 algorithm.

Key Result

Proposition 3.1

The probability of $\pi$ failing to generate at least one optimal $G \in\varphi({\mathcal{S}})$ is at most $p$, where $p= (1+N\log(\frac{|{\mathcal{S}}|^{T}}{|{\mathcal{S}}|^{T}-\varphi({\mathcal{S}})}))^{-1}$ if $|{\mathcal{S}}|^{T}\neq\varphi({\mathcal{S}})$ and $p=0$ if $|{\mathcal{S}}|^{T}=\varphi({\mathcal{S}})$.
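To get a feel for how the bound behaves, the following sketch evaluates $p$ numerically. It reads $\varphi(\mathcal{S})$ inside the formula as a count, as the arithmetic in the statement implies, and takes $N$ and $T$ as plain inputs because this summary does not define them. The function and parameter names are hypothetical.

```python
import math


def failure_bound(vocab_size, T, phi, N):
    """Evaluate p = (1 + N * log(|S|^T / (|S|^T - phi)))^(-1), with p = 0
    when |S|^T == phi.

    vocab_size is |S|; phi is treated as a count (an assumption).
    """
    total = vocab_size ** T
    if not 0 <= phi <= total:
        raise ValueError("phi must lie between 0 and |S|^T")
    if phi == total:
        return 0.0
    # log(a / b) computed as log(a) - log(b) so very large integers still work.
    log_ratio = math.log(total) - math.log(total - phi)
    return 1.0 / (1.0 + N * log_ratio)


if __name__ == "__main__":
    # Input values are illustrative only.
    for size in (2, 10, 100):
        print(size, failure_bound(vocab_size=size, T=2, phi=2, N=10))
```

With `phi`, `T`, and `N` held fixed, the bound moves toward 1 as `vocab_size` grows, since `phi` then covers a shrinking share of the $|\mathcal{S}|^{T}$ combinations.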

Figures (7)

  • Figure 1: (a) The architecture of FGIB. Using the GIB theory, FGIB aims to identify the subgraphs in the given molecular graphs that contribute most to the target chemical property. The trained FGIB is then used to extract fragments from a molecular dataset in a goal-aware manner. (b) Performance comparison of GEAM and other FBDD methods on the jak2 ligand generation task.
  • Figure 2: The overall framework of GEAM. GEAM consists of three modules, FGIB, SAC, and GA for fragment extraction, fragment assembly, and fragment modification, respectively.
  • Figure 3: (a-c) Ablation studies on FGIB, SAC and GA on the ligand generation task with the target protein jak2 and (d) the PLIP image showing interactions between an example molecule and jak2.
  • Figure 4: The generation progress of GEAM and GEAM-static on the ligand generation task against jak2.
  • Figure 5: Examples of novel hits generated by GEAM. The docking score (kcal/mol), QED, SA, and maximum similarity to the training molecules are shown below each molecule.
  • ...and 2 more figures

Theorems & Definitions (3)

  • Proposition 3.1
  • proof
  • proof