3D-Mol: A Novel Contrastive Learning Framework for Molecular Property Prediction with 3D Information
Taojie Kuang, Yiming Ren, Zhixiang Ren
TL;DR
3D-Mol introduces a hierarchical 3D molecular encoder that separates atom-bond, bond-angle, and dihedral-angle information into three graphs, $G_{a-b}$, $G_{b-a}$, and $G_{d-a}$. The encoder is paired with a weighted contrastive pretraining scheme over 20M unlabeled conformations: conformations sharing the same SMILES are treated as weighted positives and all others as weighted negatives, with the weights derived from 3D descriptor and fingerprint similarities. Geometry-centric pretraining tasks supplement the contrastive objective. After pretraining on large unlabeled data and finetuning on MoleculeNet benchmarks, the model achieves state-of-the-art or near-state-of-the-art performance on multiple datasets, with its strongest results on BACE and several regression tasks, and it outperforms non-pretrained baselines and many pretrained ones. Ablation studies confirm the contributions of the dihedral-angle graph, the weighting in the contrastive objective, and the overall pretraining strategy. The main limitation is that the approach depends on computationally intensive generation of 3D conformations, so improving efficiency is the key direction for future work if the method is to be practical in drug discovery pipelines.
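To make the three-graph decomposition concrete, here is a minimal sketch in plain Python. It assumes, as an illustration only, that each level is the line graph of the level before it: bonds become the nodes of $G_{b-a}$, and bond angles become the nodes of $G_{d-a}$. The function names `line_graph` and `decompose` are hypothetical, and the summary above does not confirm that the authors build the graphs exactly this way.

```python
from itertools import combinations


def line_graph(edges):
    """Return index pairs (i, j) of entries in `edges` that share an endpoint."""
    pairs = []
    for i, j in combinations(range(len(edges)), 2):
        if set(edges[i]) & set(edges[j]):
            pairs.append((i, j))
    return pairs


def decompose(bonds):
    """Split a molecule, given as a list of bonds (atom index pairs), into three levels.

    Assumption for illustration: each level is the line graph of the previous one.
    """
    g_ab = list(bonds)        # nodes: atoms, edges: bonds
    g_ba = line_graph(g_ab)   # nodes: bonds, edges: two bonds sharing an atom (a bond angle)
    g_da = line_graph(g_ba)   # nodes: bond angles, edges: two angles sharing a bond (a dihedral)
    return g_ab, g_ba, g_da


# Carbon skeleton of butane: atoms 0-1-2-3 in a chain.
g_ab, g_ba, g_da = decompose([(0, 1), (1, 2), (2, 3)])
print(len(g_ab), "bonds,", len(g_ba), "bond angles,", len(g_da), "dihedral")
```

For the four-atom chain this yields three bonds, two bond angles, and one dihedral, which matches the single C-C-C-C torsion of butane. A real encoder would also attach geometric features (bond lengths, angle values) to these edges; the sketch shows only the connectivity.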
Abstract
Molecular property prediction is crucial for early drug candidate screening and optimization. Deep learning-based methods have advanced the field considerably, but they often fall short of fully leveraging 3D spatial information. Specifically, current molecular encoding techniques tend to extract spatial information inadequately, leading to ambiguous representations in which a single representation may correspond to multiple distinct molecules. Moreover, existing molecular modeling methods focus predominantly on the most stable 3D conformation, neglecting other viable conformations that occur in reality. To address these issues, we propose 3D-Mol, a novel approach designed for more accurate spatial structure representation. It deconstructs molecules into three hierarchical graphs to better extract geometric information. Additionally, 3D-Mol leverages contrastive learning for pretraining on 20 million unlabeled samples, treating conformations with identical topological structures as weighted positive pairs and conformations with different structures as weighted negative pairs, with weights based on the similarity of their 3D conformation descriptors and fingerprints. We compare 3D-Mol with various state-of-the-art baselines on 7 benchmarks and demonstrate its strong performance.
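The weighted contrastive idea can be illustrated with a small InfoNCE-style loss. This is a sketch under stated assumptions, not the objective used in the paper: the function `weighted_contrastive_loss`, its parameters, the cosine similarity, and the temperature are all hypothetical choices. The weights are passed in as plain numbers; in 3D-Mol they would come from 3D conformation descriptor and fingerprint similarities, whose exact form is not given here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_contrastive_loss(anchor, positive, negatives,
                              pos_weight, neg_weights, temperature=0.1):
    """InfoNCE-style loss with per-pair weights (illustrative assumption).

    anchor, positive: embeddings of two conformations of the same molecule.
    negatives: embeddings of conformations of other molecules.
    pos_weight, neg_weights: similarity-derived weights, supplied by the caller.
    """
    pos_term = pos_weight * math.exp(cosine(anchor, positive) / temperature)
    neg_term = sum(
        w * math.exp(cosine(anchor, n) / temperature)
        for n, w in zip(negatives, neg_weights)
    )
    return -math.log(pos_term / (pos_term + neg_term))


anchor = [1.0, 0.0]
positive = [0.9, 0.1]      # another conformation of the same molecule
negatives = [[0.0, 1.0]]   # a conformation of a different molecule
strong = weighted_contrastive_loss(anchor, positive, negatives, 1.0, [1.0])
weak = weighted_contrastive_loss(anchor, positive, negatives, 1.0, [0.2])
print(strong > weak)       # a down-weighted negative contributes less to the loss
```

In this form, lowering a negative's weight reduces how strongly the loss pushes the anchor away from it, so negatives that the caller judges less informative have less influence on the learned representation.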
