An Equivariant Pretrained Transformer for Unified 3D Molecular Representation Learning

Rui Jiao; Xiangzhe Kong; Li Zhang; Ziyang Yu; Fangyuan Ren; Wenjuan Tan; Wenbing Huang; Yang Liu

TL;DR

EPT introduces a unified $E(3)$-equivariant transformer that learns from multi-domain 3D molecular data by defining blocks (e.g., heavy-atom groups or amino acids) and applying block-level denoising during pretraining. The model leverages a geometric transformer with equivariant self-attention and a fusion mechanism for scalar and vector features, achieving strong cross-domain transfer across LBA, MSP, and MPP tasks and excelling in COVID-19 drug screening by ranking known antivirals at the top. The large 5.89M-entry pretraining dataset spanning small molecules, proteins, and complexes enables generalizable learning of hierarchical geometry, with ablation studies underscoring the importance of block-level representations and denoising. Empirically, EPT attains state-of-the-art or competitive performance on key benchmarks and demonstrates practical utility by identifying promising drug candidates for SARS-CoV-2 3CL protease, illustrating its potential to accelerate molecular discovery across domains.

Abstract

Pretraining on large collections of unlabeled 3D molecules has proven superior in various scientific applications. However, prior efforts typically pretrain models within a single domain, either proteins or small molecules, and miss the opportunity to leverage cross-domain knowledge. To close this gap, we introduce the Equivariant Pretrained Transformer (EPT), an all-atom foundation model that can be pretrained on 3D molecules from multiple domains. Built upon an E(3)-equivariant transformer, EPT not only processes atom-level information but also incorporates block-level features (e.g., residues in proteins). Additionally, we employ a block-level denoising task, rather than the conventional atom-level denoising, as the pretraining objective. To pretrain EPT, we construct a large-scale dataset of 5.89M entries, comprising small molecules, proteins, protein-protein complexes, and protein-molecule complexes. Experimental evaluations on downstream tasks, including ligand binding affinity prediction, protein property prediction, and molecular property prediction, show that EPT significantly outperforms previous state-of-the-art methods on the first task and achieves competitive or superior performance on the remaining two. Furthermore, we demonstrate the potential of EPT in identifying small-molecule drug candidates targeting 3CL protease, a critical target in the replication of SARS-CoV-2. Among 1,978 FDA-approved drugs, EPT ranks 7 of the 8 known anti-COVID-19 drugs in the top 200, indicating high recall. With Molecular Dynamics (MD) simulations, EPT further discovers 7 novel compounds whose binding affinities are higher than that of the top-ranked known anti-COVID-19 drug, showcasing its capabilities in drug discovery.

Paper Structure (34 sections, 30 equations, 5 figures, 1 table, 1 algorithm)

Introduction
Results
Overview of EPT
Multi-Domain Pretraining Dataset
Evaluation of model performance on downstream tasks
Ligand Binding Affinity Prediction (LBA)
Mutation Stability Prediction (MSP)
Molecule Property Prediction (MPP)
Analyses of the core components in EPT
EPT-accelerated discovery of anti-COVID-19 candidates
Discussion
Block-level denoised pretraining
Supplementary Methods
Dataset Collection
Detailed Dataset Distribution
...and 19 more sections

Figures (5)

Figure 1: Overview of EPT. EPT is a foundation model for multi-domain 3D molecules. a, EPT is capable of addressing diverse downstream tasks after being pretrained on a large-scale hybrid dataset containing small molecules, proteins and complexes. b, EPT integrates molecules from different domains by defining "blocks" as the fundamental units for each domain. For small molecules, blocks are defined as heavy atoms and their associated hydrogens, while for proteins, blocks correspond to amino acids. During pretraining, blocks are perturbed by random translations and rotations around the center-of-mass (CoM), and EPT is trained to recover the original structure. c, In EPT, the atom representations including scalars ${\bm{H}}$ and vectors ${\bm{V}}$ are first initialized via a GNN-based embedding layer, and then updated by equivariant self-attention and feed-forward layers. d, We demonstrate the efficacy of the pretrained EPT model in virtual screening for anti-COVID-19 drugs, outperforming computational and learning-based baselines.
Figure 2: The performance of EPT on downstream tasks. a, EPT is evaluated on three downstream tasks: Ligand Binding Affinity Prediction (LBA), Mutation Stability Prediction (MSP) and Molecule Property Prediction (MPP). b, The Root Mean Square Error (RMSE), Pearson and Spearman correlation coefficients on both the id30 and id60 splits for the LBA task. c, The AUROC on Atom3D for the MSP task by the models without pretraining. d, The AUROC on Atom3D for the MSP task by the models with pretraining.
Figure 3: The validation for EPT's core modules and pretraining strategy. a, Mean Absolute Errors (MAE) of the HOMO and LUMO predictions on QM9, under different EPT variants. b, Pearson and Spearman correlation coefficients on id30 and id60 splits for the LBA task, under different pretraining strategies. c, MAE of the HOMO and LUMO predictions on QM9, under different pretraining strategies.
Figure 4: The application of EPT in screening anti-COVID-19 drugs. a, Benchmark experiments on the PDBBind dataset: predicting the binding affinity (evaluated by the Pearson and Spearman correlation coefficients), and identifying the positive candidate bound to the target pocket (evaluated by the ranking metrics AvgRank and Top1Acc). b, Visualization of EPT embeddings, rankings of 1,978 FDA-approved drugs based on EPT, and the presentation of focus molecules. c, d, A comprehensive analysis of the two hits screened based on EPT is conducted through molecular docking and MD simulations. Leritrelvir is a positive reference.
Figure 5: Data description and preprocessing. a, The pretraining dataset combines small molecule conformations, protein structures as well as protein-protein and protein-molecule complexes. Each entry is represented as a list of blocks, with each block characterized by four features: the atom types, the block type, the ordered position indexes, and the atom coordinates. b, We illustrate the overall pipeline for evaluating the virtual screening task. Fine-tuning of the pretrained EPT model is performed using a curated dataset of protein-small molecule complexes collected from PDBBind and re-docked to generate positive and negative samples. The fine-tuned model is then applied to rank FDA-approved drugs based on their predicted binding probability from the docked complex with the 3CL protease.
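
The block representation in Figure 5a and the block-level perturbation in Figure 1b can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the `Block` class, the function names, the Gaussian translation noise, the bounded rotation angle, and the default noise scales are all assumptions made for illustration, and the unweighted centroid is used as a stand-in for the mass-weighted CoM. The sketch covers only the perturbation step, since the form of the training target is not given in the captions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Block:
    """Hypothetical container for the four per-block features named in Figure 5a."""
    atom_types: list      # element symbol of each atom in the block
    block_type: str       # e.g. a residue name, or the heavy-atom element
    position_ids: list    # ordered position index of each atom
    coords: np.ndarray    # (n_atoms, 3) Cartesian coordinates


def random_rotation(rng, max_angle):
    """Rotation about a random axis by an angle in [-max_angle, max_angle].

    Built with Rodrigues' formula; the angle distribution is an assumption.
    """
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    angle = rng.uniform(-max_angle, max_angle)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def perturb_block(block, rng, sigma_t=0.1, max_angle=0.1):
    """Rigidly perturb one block: rotate about its centroid, then translate.

    sigma_t and max_angle are illustrative defaults, not values from the paper.
    """
    center = block.coords.mean(axis=0)        # unweighted centroid (CoM stand-in)
    R = random_rotation(rng, max_angle)
    t = rng.normal(scale=sigma_t, size=3)     # assumed Gaussian translation noise
    noisy = (block.coords - center) @ R.T + center + t
    return Block(block.atom_types, block.block_type, block.position_ids, noisy)


# Usage: a water-like block (one heavy atom with its two hydrogens).
rng = np.random.default_rng(0)
water = Block(["O", "H", "H"], "O", [0, 1, 2],
              np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]))
noisy_water = perturb_block(water, rng)
```

Because each block moves as a rigid body, distances between atoms inside a block are unchanged while distances between blocks are disturbed. This is the property that distinguishes block-level noise from the conventional atom-level noise, where every atom is displaced independently.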
