OMTRA: A Multi-Task Generative Model for Structure-Based Drug Design
Ian Dunn, Liv Toft, Tyler Katz, Juhi Gupta, Riya Shah, Ramith Hettiarachchi, David R. Koes
TL;DR
OMTRA presents a flexible multi-task generative framework for structure-based drug design that unifies pocket-conditioned design, conformer generation, docking, and pharmacophore conditioning under a single multi-modal flow-matching model. It extends FlowMol3 to heterogeneous graphs with SE(3)-equivariant operations and introduces the Pharmit 500M conformer dataset to enable large-scale pretraining across modalities. Empirically, OMTRA achieves state-of-the-art performance on pocket-conditioned design and docking tasks, while showing that large-scale pretraining and multi-task training yield modest and task-dependent gains. The work also demonstrates the value of pharmacophore conditioning as an effective inductive bias, and provides an open-source release of code, models, and the Pharmit dataset to support further research in multi-task molecular generation.
Abstract
Structure-based drug design (SBDD) focuses on designing small-molecule ligands that bind to specific protein pockets. Computational methods are integral to modern SBDD workflows and often rely on virtual screening via docking or pharmacophore search. Modern generative modeling approaches have focused on improving novel ligand discovery by enabling de novo design. In this work, we recognize that these tasks share a common structure and can therefore be represented as different instantiations of a consistent generative modeling framework. We propose a unified approach in OMTRA, a multi-modal flow matching model that flexibly performs many tasks relevant to SBDD, including some with no analogue in conventional workflows. Additionally, we curate a dataset of 500M 3D molecular conformers, complementing protein-ligand data and expanding the chemical diversity available for training. OMTRA obtains state-of-the-art performance on pocket-conditioned de novo design and docking; however, the effects of large-scale pretraining and multi-task training are modest. All code, trained models, and the dataset needed to reproduce this work are available at https://github.com/gnina/OMTRA.
