SE(3)-Equivariant Ternary Complex Prediction Towards Target Protein Degradation

Fanglei Xue, Meihan Zhang, Shuqi Li, Xinyu Gao, James A. Wohlschlegel, Wenbing Huang, Yi Yang, Weixian Deng

TL;DR

Targeted protein degradation (TPD) uses PROTACs and molecular glue degraders (MGDs) to recruit an E3 ligase to a target protein, forming a ternary complex that is essential for ubiquitin-mediated degradation. The authors introduce DeepTernary, an SE(3)-equivariant graph neural network with ternary inter-graph attention and a query-based Pocket Points Decoder. Trained on TernaryDB (22,303 complexes), it predicts ternary complex structures end-to-end from disassembled monomers. On unseen PROTAC benchmarks, DeepTernary achieves an average DockQ of $0.65$; on MG(D) benchmarks, it achieves a DockQ of $0.21$. Inference takes around $7$ s for PROTAC and $1$ s for MG(D) complexes on CPU, and the buried surface area (BSA) computed from predicted structures correlates with degradation potency. The method generalizes beyond its training data, outperforms baselines such as EquiDock and AF3 on unseen PROTAC and MG(D) structures, and offers a fast, structure-guided route to accelerate TPD design for previously undruggable targets.

Abstract

Targeted protein degradation (TPD) induced by small molecules has emerged as a rapidly evolving modality in drug discovery, targeting proteins traditionally considered "undruggable". Proteolysis-targeting chimeras (PROTACs) and molecular glue degraders (MGDs) are the primary small molecules that induce TPD. Both types of molecules form a ternary complex linking an E3 ligase with a target protein, a crucial step for drug discovery. While significant advances have been made in binary structure prediction for proteins and small molecules, ternary structure prediction remains challenging due to obscure interaction mechanisms and insufficient training data. Traditional methods relying on manually assigned rules perform poorly and are computationally demanding due to extensive random sampling. In this work, we introduce DeepTernary, a novel deep learning-based approach that directly predicts ternary structures in an end-to-end manner using an encoder-decoder architecture. DeepTernary leverages an SE(3)-equivariant graph neural network (GNN) with both intra-graph and ternary inter-graph attention mechanisms to capture intricate ternary interactions from our collected high-quality training dataset, TernaryDB. The proposed query-based Pocket Points Decoder extracts the 3D structure of the final binding ternary complex from learned ternary embeddings, demonstrating state-of-the-art accuracy and speed in existing PROTAC benchmarks without prior knowledge from known PROTACs. It also achieves notable accuracy on the more challenging MGD benchmark under the blind docking protocol. Remarkably, our experiments reveal that the buried surface area calculated from predicted structures correlates with experimentally obtained degradation potency-related metrics. Consequently, DeepTernary shows potential in effectively assisting and accelerating the development of TPDs for previously undruggable targets.

Paper Structure

This paper contains 10 sections, 20 equations, 11 figures, 7 tables.

Figures (11)

  • Figure 1: DeepTernary is a deep learning model for predicting the structure of the ternary complex induced by PROTACs and MG(D)s. a, The mechanism of action (MOA) of PROTACs and MGDs. The protein of interest (POI) and the E3 ligase are brought into proximity by PROTACs or MGDs to form a ternary complex, after which the ubiquitin-proteasome system (UPS) transfers ubiquitin to the POI and degrades it. b, To mitigate the scarcity of known PROTAC and MG(D) structures, a large-scale ternary complex dataset (named TernaryDB) was collected by searching and cleaning complexes from the Protein Data Bank (PDB) archive. The collected samples were then grouped into clusters by similarity. Any complex similar to known PROTAC- or MG(D)-induced complexes was excluded from the training set. DeepTernary was trained on this filtered database to predict the original complex structure from disassembled monomers. c, DeepTernary is an SE(3)-equivariant graph neural network equipped with attention blocks to facilitate efficient information exchange. It represents two proteins and a small molecule as three graphs, encoding node coordinates and diverse amino acid or atom characteristics as node features, and edge types and distances as edge features. The three graphs are fed into an encoder consisting of a series of SE(3)-equivariant blocks, which enable both intra- and inter-graph learning to capture interactions effectively. The encoder predicts the conformation of the small molecule and outputs the refined node features and coordinates of the two proteins. Subsequently, a decoder comprising several attention-based blocks uses these refined features and coordinates to generate two pairs of pocket points and a predicted aligned error (PAE). The pocket points are then used to align the small molecule and protein 2 to protein 1. * For PROTACs, the pocket points are taken from unbound structures and do not need to be predicted. 
** For MG(D)s, the ligand and protein 2 are simultaneously aligned to protein 1.
  • Figure 1: Ligand pose accuracy on the test set of (a) PROTAC and (b) MG(D).
  • Figure 2: TernaryDB construction and visualization. a, The process of collecting and cleaning the ternary complex dataset. Initially, a search for ternary structures in the PDB yielded 46,797 PDB IDs, each containing at least two proteins and one small molecule. High-quality PDB IDs were retained based on criteria such as X-ray crystallography data, resolution, and R-free value. From this subset, 42,441 complexes were extracted, each comprising exactly two proteins and one small molecule. These complexes were further filtered by peptide chain length and number of contacts. Ultimately, 22,303 complexes met these criteria and were used to train the model. b, Histogram of the number of ligand atoms (excluding hydrogens) in the dataset. c, Histogram of cluster sizes in the dataset, clustered by protein sequence similarity. d, The distribution of protein source organisms in the dataset. e, Proteome-wide view of the collected dataset. ESM-1b (Rives et al., 2021) sequence embeddings for the two proteins in each complex are calculated and concatenated, then projected to two dimensions (2D) with Uniform Manifold Approximation and Projection (UMAP). Complexes similar to PROTAC- and MG(D)-involved ternary structures are denoted as red and green square points, respectively. f, Chemical space covered by the dataset. Morgan fingerprints are converted to 1024-length vectors and visualized through a 2D UMAP. Points are colored by molecular weight (hydrogens excluded). PROTAC- and MG(D)-like molecules are highlighted as red and green square points, respectively.
  • Figure 2: Correlation between PAE and DockQ scores for predicted ternary complexes. a, Intra-complex correlation for PROTACs. Scatter plots illustrate the relationship between PAE and DockQ scores across 40 initial conformations for each PROTAC test complex. Each point represents a single conformation. Most complexes display a negative correlation, indicating that lower PAE values generally correspond to higher DockQ scores. This suggests that PAE can serve as a useful indicator of prediction accuracy within a given complex. b, Scatter plot for the best-predicted conformation (i.e., the one with the highest DockQ) for each PROTAC test complex. The plot shows a clear trend: complexes with PAE scores below 4 tend to have higher DockQ scores (> 0.5), further supporting the use of PAE as a confidence metric. c, Across-complex correlation for top-1 PROTAC predictions (i.e., the predictions with the lowest PAE). Despite some false positives, the overall trend remains negatively correlated. d, Correlation for MG(D) predictions. As with PROTACs, a clear negative correlation is observed, with lower PAE values associated with higher DockQ scores, suggesting that PAE is also an effective confidence metric for MG(D) predictions.
  • Figure 3: Effectiveness of DeepTernary designs on PROTAC and MG(D) test benchmarks. All results are based on test sets comprising 22 PROTAC complexes or 94 MG(D) complexes. Statistical significance was determined using an independent t-test: * p $\leq$ 0.05, ** p $\leq$ 0.01, and *** p $\leq$ 0.001; the same notation is used in subsequent figures. a, Comparison of decoder types: the proposed Pocket Points Decoder outperforms IEGMN in predicting medium- to high-quality binding poses (DockQ > 0.49). b, Impact of multi-head attention on coordinate prediction: increasing the number of heads results in a slight decrease in DockQ scores. c, Effect of latent embedding dimension on model performance: larger dimensions yield improved learning, especially for MG(D) complexes. d, Influence of noise level on model robustness: raising the noise level from 1 to 2 improves performance on both PROTAC and MG(D) benchmarks. e, Effect of the number of sampled random conformations: more sampled conformations lead to higher DockQ scores and acceptance rates (DockQ > 0.23) for PROTACs, while MG(D) results remain largely unaffected.
  • ...and 6 more figures
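
The captions above describe picking the top-1 prediction as the conformation with the lowest PAE, counting a pose as accepted when DockQ > 0.23 and as medium- to high-quality when DockQ > 0.49, and observing a negative PAE/DockQ correlation. The following is a minimal Python sketch of that bookkeeping, not the authors' evaluation code. The `Prediction` class, the function names, and the numbers in the demo are hypothetical and purely illustrative, and the 0.80 cut-off for "high" comes from the usual DockQ convention rather than from this page.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Prediction:
    pae: float    # predicted aligned error from the model (lower = more confident)
    dockq: float  # DockQ against the reference structure (higher = better)


def dockq_class(score: float) -> str:
    """Bucket a DockQ score using the thresholds quoted in the captions.

    The 0.80 boundary is an assumption taken from the common DockQ convention.
    """
    if score > 0.80:
        return "high"
    if score > 0.49:
        return "medium"
    if score > 0.23:
        return "acceptable"
    return "incorrect"


def top1_by_pae(predictions):
    """Return the sampled conformation with the lowest PAE (the top-1 pick)."""
    return min(predictions, key=lambda p: p.pae)


def acceptance_rate(top1_predictions, threshold=0.23):
    """Fraction of top-1 predictions whose DockQ exceeds the threshold."""
    accepted = sum(1 for p in top1_predictions if p.dockq > threshold)
    return accepted / len(top1_predictions)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy values for one complex; these are NOT results from the paper.
samples = [
    Prediction(pae=6.0, dockq=0.20),
    Prediction(pae=3.5, dockq=0.62),
    Prediction(pae=4.8, dockq=0.35),
    Prediction(pae=2.9, dockq=0.71),
]
best = top1_by_pae(samples)
r = pearson([p.pae for p in samples], [p.dockq for p in samples])
print(best, dockq_class(best.dockq), round(r, 3))
```

In practice, `top1_by_pae` would be applied once per test complex over its sampled conformations, and `acceptance_rate` would then be computed over the resulting list of top-1 picks.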