HybridLinker: Topology-Guided Posterior Sampling for Enhanced Diversity and Validity in 3D Molecular Linker Generation

Minyeong Hwang, Ziseok Lee, Kwang-Soo Kim, Kyungsu Kim, Eunho Yang

TL;DR

HybridLinker tackles the persistent diversity-validity trade-off in 3D linker generation by integrating pretrained point cloud-free and point cloud-aware models in a zero-shot, two-stage pipeline. The core innovation, LinkerDPS, performs diffusion posterior sampling across topology and point cloud spaces using an energy-based cross-domain likelihood and an inpainting-based conditional score estimator, enabling topology-guided refinement of surrogates. Experiments on ZINC-derived fragment pairs show that HybridLinker achieves higher diversity and validity than strong baselines and also improves drug-likeness optimization, suggesting utility as a foundational model for fragment-based drug design. By bridging topology and geometry without additional training, LinkerDPS broadens the applicability of diffusion-based molecular design to challenging cross-domain tasks and large-molecule generation.

Abstract

Linker generation is critical in drug discovery applications such as lead optimization and PROTAC design, where molecular fragments are assembled into diverse drug candidates via a molecular linker. Existing methods fall into point cloud-free and point cloud-aware categories, depending on whether they use the fragments' 3D poses alongside their topologies when sampling the linker's topology. Point cloud-free models prioritize sample diversity but suffer from lower validity because they overlook the fragments' spatial constraints, while point cloud-aware models ensure higher validity but restrict diversity by enforcing strict spatial constraints. To overcome this trade-off without additional training, we propose HybridLinker, a framework that enhances point cloud-aware inference by providing diverse bonding topologies from a pretrained point cloud-free model as guidance. At its core, we propose LinkerDPS, the first diffusion posterior sampling (DPS) method operating across point cloud-free and point cloud-aware spaces, bridging molecular topology with 3D point clouds via an energy-inspired function. By transferring the diverse sampling distribution of point cloud-free models into the point cloud-aware distribution, HybridLinker significantly surpasses baselines, improving both validity and diversity in foundational molecular design and applied drug optimization tasks, and establishing a new DPS framework in the molecular domain, beyond imaging.
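As a rough illustration of the diffusion posterior sampling idea mentioned in the abstract, the toy sketch below computes a DPS-style guidance term in the simplest setting we could think of. It is not LinkerDPS: it assumes a standard-normal prior on $x_0$, a linear measurement operator `A`, and Gaussian measurement noise, whereas the paper describes an energy-inspired cross-domain likelihood. All function and parameter names (`posterior_mean`, `dps_guidance`, `alpha_bar`, `sigma`) are hypothetical.

```python
import math

def posterior_mean(x_t, alpha_bar):
    # Toy assumption: x_0 ~ N(0, I) and x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise,
    # so the posterior mean E[x_0 | x_t] is sqrt(alpha_bar) * x_t.
    return [math.sqrt(alpha_bar) * v for v in x_t]

def apply_operator(A, x):
    # Matrix-vector product for a linear measurement operator given as a list of rows.
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def log_likelihood(x_t, y, A, alpha_bar, sigma):
    # DPS-style approximation: evaluate the Gaussian measurement likelihood
    # at the posterior mean instead of integrating over x_0 (up to an additive constant).
    pred = apply_operator(A, posterior_mean(x_t, alpha_bar))
    return -0.5 * sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / sigma ** 2

def dps_guidance(x_t, y, A, alpha_bar, sigma):
    # Gradient of log_likelihood with respect to x_t:
    # sqrt(alpha_bar) * A^T (y - A x0_hat) / sigma^2.
    pred = apply_operator(A, posterior_mean(x_t, alpha_bar))
    residual = [yi - pi for yi, pi in zip(y, pred)]
    scale = math.sqrt(alpha_bar) / sigma ** 2
    return [
        scale * sum(A[i][j] * residual[i] for i in range(len(A)))
        for j in range(len(x_t))
    ]
```

In a full sampler, a term like this would be added to the unconditional score at each reverse diffusion step; here it only shows how the measurement enters through the posterior mean.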

Paper Structure

This paper contains 55 sections, 4 theorems, 52 equations, 6 figures, 7 tables, 4 algorithms.

Key Result

Theorem F.3

For the measurement model defined in Definition F.2 with $\epsilon\sim\mathcal{N}(0,\sigma^2I)$, the theorem gives an approximation evaluated at the posterior mean $\hat{x}=\mathbb{E}_{x_0\mid x_t}[x_0]$. The approximation error can be quantified with the Jensen gap, whose upper bound is stated in terms of $\|\nabla_x\mathcal{A}(x)\| := \max_{x}\|\nabla_x\mathcal{A}(x)\|$ and $m_1:=\int \|x_0 - \hat{x}\|p(x_0|x_t)\,dx_0$.
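For orientation, the corresponding result in the original DPS formulation for imaging inverse problems has the form sketched below. This assumes a measurement $y=\mathcal{A}(x_0)+\epsilon$ and writes $d$ for the dimension of $y$ and $\mathcal{J}$ for the Jensen gap. The symbols $y$, $d$, and $\mathcal{J}$ are assumptions not present in the statement above, and the exact display in this paper may differ.

```latex
p(y \mid x_t) \simeq p(y \mid \hat{x}),
\qquad
\mathcal{J} \le \frac{d}{\sqrt{2\pi\sigma^2}}\, e^{-1/(2\sigma^2)}\,
\|\nabla_x \mathcal{A}(x)\|\, m_1 .
```

Read this way, the bound tightens when the measurement operator varies slowly (small $\|\nabla_x\mathcal{A}(x)\|$) or when the posterior $p(x_0\mid x_t)$ concentrates around its mean (small $m_1$).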

Figures (6)

  • Figure 1: (a) Qualitative comparison of our method and existing pipelines on diversity of valid molecule samples measured on five metrics. (b) Trade-off between diversity and validity in baseline models and our hybrid approach overcoming these limitations.
  • Figure 2: Comparison of generation pipelines for PC-Free and PC-Aware models. HybridLinker is designed to leverage the strengths of both approaches, inheriting high diversity from the PC-Free model and high validity from the PC-Aware model.
  • Figure 3: Problem setup for linker generation with its context in drug discovery, illustrating the fragments $G_1$ and $G_2$ embedded in a candidate molecule $G'$. $N_\text{ref}$ is the size of the reference molecule.
  • Figure 4: Conceptual comparison of sampling distributions in Diffusion-based PC-Aware, PC-Free, and HybridLinker models. (a) Diffusion-based PC-Aware models focus on validity but suffer from low diversity due to spatial constraints. (b) PC-Free models explore a broad molecular space but often generate invalid molecules. (c) HybridLinker leverages LinkerDPS to balance diversity and validity, enhancing exploration while maintaining correctness.
  • Figure 5: Extended definition of molecular validity, which uses strain energy to account for point cloud features in addition to the topological valence rule.
  • ...and 1 more figure

Theorems & Definitions (9)

  • Definition B.1: 3D Molecular Graph
  • Definition F.1: Jensen Gap (Gao et al., 2017)
  • Definition F.2
  • Theorem F.3
  • Theorem F.4
  • Proof of Theorem F.4
  • Lemma F.5
  • Proof of Lemma F.5
  • Proposition F.6: Jensen gap upper bound (Gao et al., 2017)