OMTRA: A Multi-Task Generative Model for Structure-Based Drug Design

Ian Dunn, Liv Toft, Tyler Katz, Juhi Gupta, Riya Shah, Ramith Hettiarachchi, David R. Koes

TL;DR

OMTRA presents a flexible multi-task generative framework for structure-based drug design that unifies pocket-conditioned design, conformer generation, docking, and pharmacophore conditioning under a single multi-modal flow-matching model. It extends FlowMol3 to heterogeneous graphs with SE(3)-equivariant operations and introduces the Pharmit 500M conformer dataset to enable large-scale pretraining across modalities. Empirically, OMTRA achieves state-of-the-art performance on pocket-conditioned design and docking tasks, while showing that large-scale pretraining and multi-task training yield modest and task-dependent gains. The work also demonstrates the value of pharmacophore conditioning as an effective inductive bias, and provides an open-source release of code, models, and the Pharmit dataset to support further research in multi-task molecular generation.

Abstract

Structure-based drug design (SBDD) focuses on designing small-molecule ligands that bind to specific protein pockets. Computational methods are integral to modern SBDD workflows and often rely on virtual screening via docking or pharmacophore search. Modern generative modeling approaches have focused on improving novel ligand discovery by enabling de novo design. In this work, we recognize that these tasks share a common structure and can therefore be represented as different instantiations of a consistent generative modeling framework. We propose a unified approach in OMTRA, a multi-modal flow matching model that flexibly performs many tasks relevant to SBDD, including some with no analogue in conventional workflows. Additionally, we curate a dataset of 500M 3D molecular conformers, complementing protein-ligand data and expanding the chemical diversity available for training. OMTRA obtains state-of-the-art performance on pocket-conditioned de novo design and docking; however, the effects of large-scale pretraining and multi-task training are modest. All code, trained models, and the dataset for reproducing this work are available at https://github.com/gnina/OMTRA

Paper Structure

This paper contains 69 sections, 15 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: OMTRA: A Flexible Multi-Task Generative Model for Structure-Based Drug Design. OMTRA is capable of performing de novo design, conformer generation and docking. It can be guided by conditioning on structural information, such as a protein pocket and pharmacophores.
  • Figure A1: Trajectories of Ligand Generation Using OMTRA. Each row demonstrates the stepwise evolution of ligand structures across OMTRA’s sampling trajectory. The top row illustrates unconditional de novo design, where OMTRA generates ligand structures without guiding constraints. The bottom row illustrates pharmacophore-conditioned de novo design, where OMTRA generates ligand structures guided by pre-defined pharmacophores.
  • Figure A2: Trajectories of Ligand Generation Inside Pocket Using OMTRA. Each row demonstrates the stepwise evolution of ligand structures across OMTRA's sampling trajectory. The top row illustrates pocket-conditioned de novo design, where OMTRA generates ligand structures guided by the receptor pocket. The bottom row illustrates pocket and pharmacophore-conditioned de novo design, where OMTRA generates ligand structures guided by both the receptor pocket and pre-defined pharmacophores.
  • Figure A3: PoseBusters Checks. Comparison of multi-task versus single-task OMTRA models on PoseBusters checks for de novo design (top) and docking (bottom). All models were pretrained on unconditional ligand generation tasks. Metrics report the fraction of passing ligands, calculated from 100 samples per pocket across 100 proteins in the Plinder test split.
  • Figure A4: Top De Novo Ligands Generated by OMTRA Multi-Task Model. Ligand generation was conditioned on the binding pocket of thiamin phosphate synthase (PDB: 1G4S). Protein-ligand contacts are shown as dashed lines. The ground truth ligand, thiamin phosphate (CCD: TPS), is shown in the left panel. De novo generated samples (Right) for this target have an interaction recovery rate of 85.7-100%.
  • ...and 3 more figures