PROflow: An iterative refinement model for PROTAC-induced structure prediction
Bo Qiang, Wenxian Shi, Yuxuan Song, Menghua Wu
TL;DR
PROTACs enable selective protein degradation by inducing ternary complexes between an E3 ligase and a protein of interest (POI), but the scarcity of ternary structures has hindered structure-based design. The authors introduce PROflow, an iterative refinement model built on flow matching that learns a conditional vector field over SE(3) transformations to sample PROTAC-induced E3–POI poses while accounting for full linker flexibility. To train end-to-end despite limited real ternary data, they generate a large pseudo-ternary dataset from binary protein–protein complexes and PROTAC linker graphs, which enables linker-aware sampling. PROflow is implemented with an SE(3)-equivariant network and a linker-compatible space $\text{M}_{\boldsymbol{\ell}}$. Empirically, PROflow achieves state-of-the-art docking metrics and up to 60x faster inference, and its computed structural properties correlate with degradation activities on PROTAC-DB and VHL-SMARCA2, supporting its utility for large-scale PROTAC design and screening.
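To make the idea of iterative refinement with flow matching concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy setting in which only the translation of a rigid pose is refined and the vector field is the analytic conditional field of a straight-line path; in PROflow the field is learned by an SE(3)-equivariant network and also covers rotations. The names `conditional_field` and `refine` are hypothetical.

```python
from typing import List, Sequence

Vec = List[float]


def conditional_field(x: Sequence[float], target: Sequence[float], t: float) -> Vec:
    """Conditional vector field of the path x_t = (1 - t) * x0 + t * x1.

    Along that path, u_t(x_t | x1) = (x1 - x_t) / (1 - t). This analytic
    field is a stand-in (assumption) for a trained network.
    """
    return [(b - a) / (1.0 - t) for a, b in zip(x, target)]


def refine(x0: Sequence[float], target: Sequence[float], steps: int = 20) -> List[Vec]:
    """Euler-integrate the field from t = 0 to t = 1; return every iterate."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    x = list(x0)
    trajectory = [x]
    for k in range(steps):
        t = k / steps  # always strictly below 1, so the field is defined
        v = conditional_field(x, target, t)
        x = [xi + vi / steps for xi, vi in zip(x, v)]
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    # Hypothetical start and target translations, in arbitrary units.
    path = refine([0.0, 0.0, 0.0], [3.0, -4.0, 12.0], steps=10)
    print(path[-1])
```

Because the toy path is a straight line, each Euler step moves the pose by the same increment and the final iterate lands on the target; a learned field over SE(3) would generally need a proper integrator on the rotation manifold as well.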
Abstract
Proteolysis targeting chimeras (PROTACs) are small molecules that trigger the breakdown of traditionally "undruggable" proteins by binding simultaneously to their targets and degradation-associated proteins. A key challenge in their rational design is understanding the structural basis of their activity. Because few crystal structures are available (18 in the PDB), existing PROTAC docking methods have been forced to simplify the problem into a distance-constrained protein-protein docking task. To address the data issue, we develop a novel pseudo-data generation scheme that requires only binary protein-protein complexes. This new dataset enables PROflow, an iterative refinement model for PROTAC-induced structure prediction that models the full PROTAC flexibility during constrained protein-protein docking. PROflow outperforms the state of the art across docking metrics and runtime. Its inference speed enables the large-scale screening of PROTAC designs, and computed properties of predicted structures achieve statistically significant correlations with published degradation activities.
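The distance-constrained simplification mentioned above can be illustrated with a small filter over candidate poses. This is a sketch under stated assumptions, not the procedure of any specific docking tool: each pose is reduced to two hypothetical anchor coordinates (the linker attachment points on the E3 ligase and on the target), and a pose is kept only if the anchors are no farther apart than an assumed maximum linker length. The function names and the example numbers are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
Pose = Tuple[Point, Point]  # (E3 anchor, target anchor), hypothetical representation


def satisfies_linker_constraint(
    e3_anchor: Point,
    poi_anchor: Point,
    max_linker_length: float,
    min_distance: float = 0.0,
) -> bool:
    """Return True if the anchor-to-anchor distance fits the allowed range."""
    distance = math.dist(e3_anchor, poi_anchor)
    return min_distance <= distance <= max_linker_length


def filter_poses(poses: List[Pose], max_linker_length: float) -> List[Pose]:
    """Keep only the poses whose anchors can be bridged by the linker."""
    return [
        pose
        for pose in poses
        if satisfies_linker_constraint(pose[0], pose[1], max_linker_length)
    ]


if __name__ == "__main__":
    # Made-up anchor coordinates in angstroms, for illustration only.
    candidates = [
        ((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)),    # anchors 5 apart
        ((0.0, 0.0, 0.0), (10.0, 10.0, 10.0)),  # anchors about 17.3 apart
    ]
    print(len(filter_poses(candidates, max_linker_length=12.0)))
```

A single distance threshold ignores the conformations the linker can actually adopt, which is the limitation that modeling full PROTAC flexibility is meant to address.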
