TransDiffSBDD: Causality-Aware Multi-Modal Structure-Based Drug Design
Xiuyuan Hu, Guoqing Liu, Can Chen, Yang Zhao, Hao Zhang, Xue Liu
TL;DR
TransDiffSBDD addresses two challenges in structure-based drug design (SBDD): the task is multi-modal, and its modalities are causally related. It integrates an autoregressive transformer for discrete molecular graph information with a diffusion model for continuous 3D coordinates. It introduces hybrid-modal sequences that preserve causality between protein pockets and ligand structures, and a GPT-like backbone with a diffusion head that generates SMILES and coordinates in a single autoregressive workflow. Training combines a joint loss over tokens and coordinates with reinforcement learning fine-tuning and data augmentation, achieving state-of-the-art results on CrossDocked2020 with high multi-property optimization (MPO) performance. The approach shows practical potential for drug design through causality-aware multi-modal generation, though it faces limitations related to data scarcity for 3D equilibrium distributions, time-resolved binding dynamics, and interpretability. Overall, TransDiffSBDD offers a principled framework for multi-modal SBDD with notable gains in docking performance and ligand diversity.
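As a rough illustration of what a joint loss over tokens and coordinates might look like, the sketch below adds a cross-entropy term for discrete tokens to a noise-prediction mean squared error for continuous coordinates. The exact loss terms are not given in this summary, so both term choices, the `coord_weight` parameter, and all function names are assumptions made for illustration, not the authors' formulation.

```python
import math

def token_cross_entropy(logits, target):
    """Cross-entropy of one discrete token, computed from raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def noise_prediction_mse(pred_noise, true_noise):
    """Mean squared error between predicted and sampled noise values."""
    return sum((p - t) ** 2 for p, t in zip(pred_noise, true_noise)) / len(true_noise)

def joint_loss(token_logits, token_targets, pred_noise, true_noise, coord_weight=1.0):
    """Assumed form: mean token cross-entropy + coord_weight * coordinate MSE."""
    ce = sum(
        token_cross_entropy(logits, target)
        for logits, target in zip(token_logits, token_targets)
    ) / len(token_targets)
    mse = noise_prediction_mse(pred_noise, true_noise)
    return ce + coord_weight * mse

# Usage: one token with uniform logits over two symbols, perfectly predicted noise.
loss = joint_loss([[0.0, 0.0]], [0], [0.1, 0.2, 0.3], [0.1, 0.2, 0.3])
```

In this toy call the coordinate term vanishes, so the loss reduces to the cross-entropy of a uniform two-way prediction, ln 2.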
Abstract
Structure-based drug design (SBDD) is a critical task in drug discovery, requiring the generation of molecular information across two distinct modalities: discrete molecular graphs and continuous 3D coordinates. However, existing SBDD methods often overlook two key challenges: (1) the multi-modal nature of this task and (2) the causal relationship between these modalities, limiting their plausibility and performance. To address both challenges, we propose TransDiffSBDD, an integrated framework combining autoregressive transformers and diffusion models for SBDD. Specifically, the autoregressive transformer models discrete molecular information, while the diffusion model samples continuous distributions, effectively resolving the first challenge. To address the second challenge, we design a hybrid-modal sequence for protein-ligand complexes that explicitly respects the causality between modalities. Experiments on the CrossDocked2020 benchmark demonstrate that TransDiffSBDD outperforms existing baselines.
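To make the idea of a hybrid-modal sequence more concrete, here is a minimal sketch that places discrete and continuous elements in one ordered sequence. The ordering used here (pocket coordinates as context, then ligand SMILES tokens, then ligand 3D coordinates) is an assumption chosen so that each modality is conditioned on what it plausibly depends on; the abstract does not spell out the layout. The class names, the `<bos>`/`<eos>` markers, and the `head_for` helper are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Coord = Tuple[float, float, float]

@dataclass(frozen=True)
class DiscreteToken:
    symbol: str  # e.g. a SMILES token or a special marker

@dataclass(frozen=True)
class ContinuousToken:
    xyz: Coord
    source: str  # "pocket" or "ligand"

Element = Union[DiscreteToken, ContinuousToken]

def build_hybrid_sequence(
    pocket_coords: List[Coord],
    ligand_smiles_tokens: List[str],
    ligand_coords: List[Coord],
) -> List[Element]:
    """Assumed ordering: pocket context -> ligand graph -> ligand 3D coordinates."""
    seq: List[Element] = [ContinuousToken(c, "pocket") for c in pocket_coords]
    seq.append(DiscreteToken("<bos>"))
    seq.extend(DiscreteToken(t) for t in ligand_smiles_tokens)
    seq.append(DiscreteToken("<eos>"))
    seq.extend(ContinuousToken(c, "ligand") for c in ligand_coords)
    return seq

def head_for(element: Element) -> str:
    """Which (hypothetical) component handles this position in the sequence."""
    if isinstance(element, DiscreteToken):
        return "token_head"      # categorical prediction over the vocabulary
    if element.source == "pocket":
        return "context"         # given as input, not generated
    return "diffusion_head"      # continuous coordinates sampled by diffusion

# Usage: a one-atom pocket and a two-atom ligand.
seq = build_hybrid_sequence(
    [(0.0, 0.0, 0.0)], ["C", "O"], [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
)
heads = [head_for(e) for e in seq]
```

Under this assumed layout, an autoregressive model reading the sequence left to right sees the pocket before any ligand token, and the full ligand graph before any ligand coordinate.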
