PROflow: An iterative refinement model for PROTAC-induced structure prediction

Bo Qiang, Wenxian Shi, Yuxuan Song, Menghua Wu

TL;DR

PROTACs enable selective protein degradation through ternary E3 ligase–POI complexes, but the scarcity of ternary structures has hindered structure-based design. The authors introduce PROflow, an iterative refinement model built on flow matching that learns a conditional vector field over SE(3) transformations to sample PROTAC-induced E3–POI poses while accounting for full linker flexibility. Crucially, they generate a large pseudo-ternary dataset from binary protein–protein complexes and PROTAC linker graphs to train end-to-end, enabling linker-aware sampling despite limited real ternary data. PROflow is implemented with an SE(3)-equivariant network and projects its updates onto a linker-compatible space $\mathcal{M}_\ell$. Empirically, PROflow achieves state-of-the-art docking metrics and up to 60x faster inference, and its computed structural properties correlate with degradation activities on PROTAC-DB and VHL-SMARC2, supporting its utility for large-scale PROTAC design and screening.
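The TL;DR describes sampling E3–POI poses by following a learned vector field over SE(3), and Figure 2 breaks each iteration into predicting a rotation/translation update, applying it, and projecting to the closest linker-compatible pose. The minimal sketch below illustrates only the first two steps as explicit Euler integration on SE(3). It assumes the field returns an axis-angle angular velocity and a translational velocity for the E3 ligase; `refine`, `so3_exp`, and the `VectorField` signature are hypothetical names, the projection step is omitted, and PROflow's actual integrator and parameterization may differ.

```python
import math
from typing import Callable, List, Sequence, Tuple

Mat3 = List[List[float]]
Vec3 = List[float]
# Hypothetical signature for a learned vector field: (rotation, translation, time) -> (omega, velocity).
VectorField = Callable[[Mat3, Vec3, float], Tuple[Sequence[float], Sequence[float]]]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def so3_exp(w: Sequence[float]) -> Mat3:
    """Rodrigues' formula: map an axis-angle vector w to the rotation matrix exp([w]_x)."""
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    theta = math.sqrt(sum(c * c for c in w))
    if theta < 1e-12:
        return eye
    x, y, z = (c / theta for c in w)
    k = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]  # skew matrix of the unit rotation axis
    k2 = matmul(k, k)
    sin_t, one_minus_cos = math.sin(theta), 1.0 - math.cos(theta)
    return [[eye[i][j] + sin_t * k[i][j] + one_minus_cos * k2[i][j] for j in range(3)]
            for i in range(3)]


def refine(rot: Mat3, trans: Vec3, field: VectorField, n_steps: int) -> Tuple[Mat3, Vec3]:
    """Integrate the field from t=0 to t=1 with explicit Euler steps on SE(3)."""
    dt = 1.0 / n_steps
    for step in range(n_steps):
        omega, vel = field(rot, trans, step * dt)
        # Rotation: left-multiply by exp(dt * omega), i.e. omega is a world-frame angular velocity.
        rot = matmul(so3_exp([dt * c for c in omega]), rot)
        # Translation: move the ligase centroid by dt * velocity.
        trans = [ti + dt * vi for ti, vi in zip(trans, vel)]
        # PROflow also projects onto the linker-compatible space here (Figure 2, step 3); omitted.
    return rot, trans


def toy_field(rot: Mat3, trans: Vec3, t: float) -> Tuple[Sequence[float], Sequence[float]]:
    """Constant toy field: a 90-degree turn about z plus a unit shift along x over t in [0, 1]."""
    return (0.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rot, trans = refine(identity, [0.0, 0.0, 0.0], toy_field, n_steps=10)
```

Left-multiplying by the exponential map keeps every iterate a valid rotation matrix, which is why SE(3) updates are usually applied this way rather than by adding to matrix entries.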

Abstract

Proteolysis targeting chimeras (PROTACs) are small molecules that trigger the breakdown of traditionally "undruggable" proteins by binding simultaneously to their targets and to degradation-associated proteins. A key challenge in their rational design is understanding the structural basis of their activity. Because few crystal structures are available (18 in the PDB), existing PROTAC docking methods have been forced to simplify the problem into a distance-constrained protein-protein docking task. To address the data issue, we develop a novel pseudo-data generation scheme that requires only binary protein-protein complexes. This new dataset enables PROflow, an iterative refinement model for PROTAC-induced structure prediction that models the full PROTAC flexibility during constrained protein-protein docking. PROflow outperforms state-of-the-art methods on both docking metrics and runtime. Its inference speed enables large-scale screening of PROTAC designs, and computed properties of its predicted structures achieve statistically significant correlations with published degradation activities.
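The abstract reports statistically significant correlations between computed properties of predicted structures and published degradation activities, without naming the statistic. As a hedged illustration, a rank correlation such as Spearman's ρ is a common choice for comparing a structural property (e.g., buried interface area) against an activity measure such as DC$_{50}$. The sketch below assumes Spearman's ρ with average ranks for ties; the function names and toy values are hypothetical and are not data from the paper.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values for illustration only (not data from the paper).
predicted_property = [820.0, 1010.0, 1190.0, 1530.0]
measured_activity = [250.0, 120.0, 40.0, 10.0]
rho = spearman(predicted_property, measured_activity)
```

Because the toy activity decreases strictly as the toy property increases, ρ comes out at (numerically) -1. A significance claim would additionally need a p-value; `scipy.stats.spearmanr` returns both the coefficient and a p-value.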

Paper Structure (50 sections, 17 equations, 7 figures, 5 tables, 1 algorithm)

Figures (7)

  • Figure 1: PROTACs are small molecules composed of two "warheads" and a connecting linker. The warheads bind to the E3 ligase and the protein of interest (POI), while the flexible linker brings the two proteins into proximity. Top: ternary complex of the POI (left), E3 ligase (right), and PROTAC (small molecule). Bottom: PROTAC binding site, with the respective warheads and anchor bonds. PDB 5T35.
  • Figure 2: Overview of PROflow, illustrated for PDB 7PI4. Iteratively refine E3 ligase poses by 1) predicting a rotation/translation update from the learned vector field, 2) applying these vectors to approximate the next pose, and 3) projecting to the closest linker-compatible pose.
  • Figure 3: Left: DC$_{50}$ vs. solvent-accessible surface area buried at the interface (dSASA) of PROflow predictions, over PROTAC-DB. Right: D$_{\max}$ vs. Rosetta energy of PROflow predictions, over PROTACs that vary only in linker.
  • Figure 4: Pseudo-data generation procedure. 1) Generate a PROTAC linker conformation library. 2) Identify high-curvature putative pockets and sample one per protein in each binary protein-protein complex. 3) Match each protein pair to the linker conformation with the lowest anchor-bond RMSD.
  • Figure 5: Visualized results of structures (PDB 6HAX) predicted by different methods. The ground-truth E3 ligase pose is shown as a semitransparent surface.
  • ...and 2 more figures
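Step 3 of the pseudo-data procedure in Figure 4 matches each protein pair to the linker conformation whose anchor bonds fit best by RMSD. The sketch below assumes, hypothetically, that each anchor configuration is encoded as the four atoms of the two anchor bonds and that RMSD is computed after optimal rigid superposition with the Kabsch algorithm; `kabsch_rmsd` and `best_linker` are illustrative names, and the paper's exact matching criterion may differ.

```python
from typing import List, Tuple

import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between (N, 3) point sets after optimally superposing p onto q (Kabsch)."""
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p_c.T @ q_c)  # SVD of the 3x3 covariance matrix
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


def best_linker(anchor_atoms: np.ndarray, library: List[np.ndarray]) -> Tuple[int, float]:
    """Index and RMSD of the library conformer whose anchor atoms best fit anchor_atoms."""
    scores = [kabsch_rmsd(conformer, anchor_atoms) for conformer in library]
    best = int(np.argmin(scores))
    return best, scores[best]


# Toy example: 4 anchor atoms (two anchor bonds, two atoms each) in the complex frame.
rng = np.random.default_rng(0)
target = rng.normal(size=(4, 3))
theta = 0.7
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
# Conformer 1 is the target rotated and shifted, so it should match with RMSD close to 0.
library = [rng.normal(size=(4, 3)), target @ rz.T + 5.0, rng.normal(size=(4, 3))]
index, rmsd = best_linker(target, library)
```

Superposing before scoring makes the match independent of where each conformer happens to sit in its own library frame, so only the internal geometry of the anchor bonds is compared.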

Theorems & Definitions (2)

  • Definition A.1: Linker-compatible space
  • Definition A.2: Projection to $\mathcal{M}_\ell$
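Definitions A.1 and A.2 specify the linker-compatible space $\mathcal{M}_\ell$ and the projection onto it, but their content is not reproduced in this summary. Purely as an illustration, the sketch below assumes that $\mathcal{M}_\ell$ is approximated by a finite sample of E3 ligase poses (one per hypothetical linker conformer) and that "closest" is measured by a weighted sum of the SO(3) geodesic angle and the translation distance. The metric, the `rot_weight` value (10 Å per radian, arbitrary), and the names `pose_distance` and `project` are illustrative assumptions, not the paper's definitions.

```python
import math
from typing import List, Sequence, Tuple

Pose = Tuple[List[List[float]], List[float]]  # (3x3 rotation matrix, translation vector)


def rotation_angle(ra: List[List[float]], rb: List[List[float]]) -> float:
    """Geodesic distance on SO(3): the angle of the relative rotation ra^T rb."""
    trace = sum(ra[k][i] * rb[k][i] for i in range(3) for k in range(3))  # trace(ra^T rb)
    return math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))


def pose_distance(a: Pose, b: Pose, rot_weight: float = 10.0) -> float:
    """Hypothetical SE(3) metric: weighted rotation angle (rad) plus translation distance."""
    return rot_weight * rotation_angle(a[0], b[0]) + math.dist(a[1], b[1])


def project(pose: Pose, compatible: Sequence[Pose]) -> Pose:
    """Return the sampled linker-compatible pose nearest to `pose`."""
    return min(compatible, key=lambda candidate: pose_distance(pose, candidate))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rz90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
# Toy sample of M_ell: one pose per (hypothetical) linker conformer.
compatible = [(identity, [1.0, 0.0, 0.0]), (rz90, [0.0, 0.0, 0.0])]
snapped = project((identity, [0.9, 0.0, 0.0]), compatible)
```

A nearest-neighbor lookup over a finite sample is only the simplest stand-in; a continuous parameterization of $\mathcal{M}_\ell$ would instead require solving a small optimization problem at each refinement step.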