TransDiffSBDD: Causality-Aware Multi-Modal Structure-Based Drug Design

Xiuyuan Hu, Guoqing Liu, Can Chen, Yang Zhao, Hao Zhang, Xue Liu

TL;DR

TransDiffSBDD addresses two challenges of multi-modal structure-based drug design by integrating an autoregressive transformer for discrete graph information with a diffusion model for continuous 3D coordinates. It introduces hybrid-modal sequences that preserve causality between protein pockets and ligand structures, and an integrated GPT-like backbone with a diffusion head that generates SMILES and coordinates in a single autoregressive workflow. Training combines a joint token- and coordinate-focused loss with reinforcement learning fine-tuning and data augmentation, achieving state-of-the-art results on CrossDocked2020 with high multi-property optimization (MPO) performance. The approach shows practical potential for drug design through causality-aware multi-modal generation, though it faces limitations related to data scarcity for 3D equilibrium distributions, time-resolved binding dynamics, and interpretability. Overall, TransDiffSBDD offers a principled framework for multi-modal SBDD with notable gains in docking performance and ligand diversity.

Abstract

Structure-based drug design (SBDD) is a critical task in drug discovery, requiring the generation of molecular information across two distinct modalities: discrete molecular graphs and continuous 3D coordinates. However, existing SBDD methods often overlook two key challenges: (1) the multi-modal nature of this task and (2) the causal relationship between these modalities, limiting their plausibility and performance. To address both challenges, we propose TransDiffSBDD, an integrated framework combining autoregressive transformers and diffusion models for SBDD. Specifically, the autoregressive transformer models discrete molecular information, while the diffusion model samples continuous distributions, effectively resolving the first challenge. To address the second challenge, we design a hybrid-modal sequence for protein-ligand complexes that explicitly respects the causality between modalities. Experiments on the CrossDocked2020 benchmark demonstrate that TransDiffSBDD outperforms existing baselines.

Paper Structure

This paper contains 23 sections, 7 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Overview of the TransDiffSBDD framework. (a) This figure illustrates the hybrid-modal sequence for a protein-ligand complex, where discrete tokens are marked in yellow, continuous 3D coordinates are marked in blue, and the connections indicate the correspondence between atoms and their coordinates. The sequence consists of alternating atomic and coordinate information for the protein structure, followed by the ligand's discrete graph information represented as SMILES and its 3D coordinate information. (b) This diagram depicts the integrated model of autoregressive transformer and diffusion, where components dedicated to discrete tokens are marked in yellow, components specialized for continuous 3D coordinates are marked in blue, and shared modeling components are marked in green. Specifically, when the output is 3D coordinates, the output vector from the transformer layers serves as conditional information for the diffusion MLP.
  • Figure 2: Case study on protein target 1R1H: structures and binding poses of ligands.
  • Figure 3: Case study on protein target 4PXZ: structures and binding poses of ligands.
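
The ordering described in the Figure 1(a) caption can be illustrated with a minimal Python sketch. It only mirrors the layout stated there: pocket atoms alternate with their coordinates, followed by the ligand's SMILES tokens and then the ligand's coordinates. The names `Token`, `Position`, `build_hybrid_sequence`, and `modality_pattern` are hypothetical and introduced here for illustration; the paper's actual tokenization, special tokens, and coordinate encoding are not given in this section and are not reproduced.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union

Coord = Tuple[float, float, float]


@dataclass(frozen=True)
class Token:
    """A discrete element: a pocket atom type or a SMILES symbol (assumed vocabulary)."""
    symbol: str


@dataclass(frozen=True)
class Position:
    """A continuous element: the 3D coordinates of one atom."""
    xyz: Coord


Element = Union[Token, Position]


def build_hybrid_sequence(
    pocket_atoms: Sequence[Tuple[str, Coord]],
    ligand_smiles_tokens: Sequence[str],
    ligand_coords: Sequence[Coord],
) -> List[Element]:
    """Lay out a protein-ligand complex in the order given by the caption."""
    seq: List[Element] = []
    # Protein pocket: each atom token is immediately followed by its coordinates.
    for symbol, xyz in pocket_atoms:
        seq.append(Token(symbol))
        seq.append(Position(xyz))
    # Ligand: the whole discrete graph (SMILES) comes first ...
    seq.extend(Token(t) for t in ligand_smiles_tokens)
    # ... and the 3D coordinates follow, so they are conditioned on the graph.
    seq.extend(Position(c) for c in ligand_coords)
    return seq


def modality_pattern(seq: Sequence[Element]) -> str:
    """Return 'D' for each discrete element and 'C' for each continuous one."""
    return "".join("D" if isinstance(e, Token) else "C" for e in seq)


if __name__ == "__main__":
    # Toy inputs: two pocket atoms and ethanol ("CCO") with made-up coordinates.
    pocket = [("N", (0.0, 0.0, 0.0)), ("C", (1.5, 0.0, 0.0))]
    smiles = ["C", "C", "O"]
    coords = [(2.0, 1.0, 0.0), (3.0, 1.0, 0.0), (4.0, 1.0, 0.5)]
    print(modality_pattern(build_hybrid_sequence(pocket, smiles, coords)))
    # prints "DCDCDDDCCC"
```

In an autoregressive model over such a sequence, every ligand coordinate is predicted after the full SMILES string, which is one way to read the caption's claim that the sequence respects the causality between the two modalities. Per Figure 1(b), positions marked `C` would be handled by the diffusion component and positions marked `D` by the discrete-token component.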