Uni-Mol2: Exploring Molecular Pretraining Model at Scale

Xiaohong Ji; Zhen Wang; Zhifeng Gao; Hang Zheng; Linfeng Zhang; Guolin Ke; Weinan E

TL;DR

This work introduces Uni-Mol2, a two-track transformer for molecular pretraining that integrates atomic, graph, and geometric information, and uses it to investigate scaling laws in molecular models. By curating a large dataset of 3D conformations (~884M compounds) and scaling the model to 1.1B parameters, the authors show a power-law relationship between validation loss and model size, dataset size, and compute. On QM9, the 1.1B-parameter model improves on existing methods by an average of ~27%; COMPAS-1D also benefits (up to ~14%). Larger models generally perform better, although gains on some geometry-sensitive properties may saturate. The results highlight the potential of large-scale molecular pretraining, offer a predictive scaling framework, and point to future directions in generative tasks and architecture design for molecular AI.
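The headline figures are averaged improvements over several prediction targets. Below is a minimal sketch of one common way to compute such an average, assuming "improvement" means the relative reduction in mean absolute error (MAE) against a baseline, averaged uniformly over tasks. The paper's exact definition may differ, and the helper names, task names, and MAE values are hypothetical, made up for illustration rather than taken from the paper's results.

```python
# Hypothetical helpers: mean relative error reduction across tasks.
# Assumption: per-task improvement = (baseline_mae - model_mae) / baseline_mae,
# averaged uniformly over tasks; the paper may define "average improvement" differently.


def relative_improvement(baseline_mae: float, model_mae: float) -> float:
    """Fractional error reduction of a model vs. a baseline on one task (lower MAE is better)."""
    if baseline_mae <= 0:
        raise ValueError("baseline MAE must be positive")
    return (baseline_mae - model_mae) / baseline_mae


def average_improvement(baseline: dict[str, float], model: dict[str, float]) -> float:
    """Uniform mean of per-task relative improvements over the tasks both dicts share."""
    tasks = sorted(baseline.keys() & model.keys())
    if not tasks:
        raise ValueError("no overlapping tasks")
    return sum(relative_improvement(baseline[t], model[t]) for t in tasks) / len(tasks)


# Toy, made-up MAE values purely for illustration (NOT results from the paper).
toy_baseline = {"task_a": 1.00, "task_b": 0.50}
toy_model = {"task_a": 0.70, "task_b": 0.40}
print(f"{average_improvement(toy_baseline, toy_model):.2%}")
```

A uniform mean treats every target equally; a weighted mean would instead let some targets count more, which is one reason reported averages are hard to compare across papers.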

Abstract

In recent years, pretraining models have made significant advances in natural language processing (NLP), computer vision (CV), and the life sciences. The advances in NLP and CV are driven predominantly by growth in model parameters and data size, a phenomenon now known as scaling laws. However, scaling laws in molecular pretraining models remain unexplored. In this work, we present Uni-Mol2, an innovative molecular pretraining model that uses a two-track transformer to integrate features at the atomic level, the graph level, and the geometric-structure level. Along with this, we systematically investigate scaling laws in molecular pretraining models, characterizing the power-law relationships between validation loss and model size, dataset size, and computational resources. Building on this analysis, we scale Uni-Mol2 to 1.1 billion parameters by pretraining on 800 million conformations, making it the largest molecular pretraining model to date. Extensive experiments show consistent improvement on downstream tasks as model size grows. Uni-Mol2 with 1.1B parameters also outperforms existing methods, achieving an average improvement of 27% on QM9 and 14% on COMPAS-1D.
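The two-track idea can be illustrated with a minimal NumPy sketch, assuming a Uni-Mol-style layout in which an atom track (one vector per atom) and a pair track (one vector per atom pair) are updated together: pair features bias the atom-atom attention logits, and the updated atom features feed back into the pair track through an outer product. The function names, dimensions, single attention head, Gaussian distance features, and random weights are all hypothetical choices for illustration, not Uni-Mol2's actual backbone block. Graph-level features such as shortest-path distances could enter the pair track in a similar way but are omitted here.

```python
import numpy as np

# Hypothetical sketch of one two-track (atom + pair) update; not Uni-Mol2's exact block.


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gaussian_pair_features(coords: np.ndarray, centers: np.ndarray, width: float = 0.5) -> np.ndarray:
    """Geometric input: expand pairwise distances (n, n) into Gaussian features (n, n, p)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return np.exp(-((dist[..., None] - centers) ** 2) / (2 * width**2))


def two_track_block(atom: np.ndarray, pair: np.ndarray, params: dict):
    """Update atom (n, d) and pair (n, n, p) representations once."""
    d = atom.shape[1]
    q, k, v = atom @ params["wq"], atom @ params["wk"], atom @ params["wv"]
    # Pair track -> one scalar bias per atom pair, added to the attention logits (single head).
    bias = (pair @ params["w_bias"])[..., 0]              # (n, n)
    attn = softmax(q @ k.T / np.sqrt(d) + bias, axis=-1)  # each row sums to 1
    atom_out = atom + attn @ v                            # residual atom update
    # Atom track -> pair track via an elementwise outer product of projected features.
    a = atom_out @ params["w_outer"]                      # (n, p)
    pair_out = pair + a[:, None, :] * a[None, :, :]       # added term is symmetric in (i, j)
    return atom_out, pair_out, attn


rng = np.random.default_rng(0)
n, d, p = 5, 8, 4
coords = rng.normal(size=(n, 3))  # toy 3D conformation
atom = rng.normal(size=(n, d))    # toy atom embeddings
pair = gaussian_pair_features(coords, centers=np.linspace(0.0, 3.0, p))
params = {
    "wq": 0.1 * rng.normal(size=(d, d)),
    "wk": 0.1 * rng.normal(size=(d, d)),
    "wv": 0.1 * rng.normal(size=(d, d)),
    "w_bias": 0.1 * rng.normal(size=(p, 1)),
    "w_outer": 0.1 * rng.normal(size=(d, p)),
}
atom2, pair2, attn = two_track_block(atom, pair, params)
print(atom2.shape, pair2.shape)  # (5, 8) (5, 5, 4)
```

Keeping a separate pair track lets geometric information (here, interatomic distances) influence every attention step without being squeezed into per-atom vectors.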

Paper Structure

This paper contains 25 sections, 16 equations, 4 figures, and 9 tables.

Introduction
Related Work
Molecular representation learning
Foundation models
Pretraining
Data
Architecture
Hyperparameter and Training Details
Scaling Laws
Downstream Experiment
QM9 Dataset
Baselines
Results
COMPAS-1D Dataset
The Performance on Limited QM9 Dataset
...and 10 more sections

Figures (4)

Figure 1: Top: Comparison of scaffold frequency between the Uni-Mol and Uni-Mol2 datasets. Bottom: Scaffold distribution in the Uni-Mol2 dataset.
Figure 2: Left: The overall pretraining architecture. Middle: Atom and pair representations. Right: Details of the backbone block.
Figure 3: Validation loss curves for Uni-Mol2 models from 42M to 1.1B parameters, each trained on 0.8B samples. At convergence, the 84M-parameter model reaches a loss of 0.105 and the 1.1B-parameter model a loss of 0.087.
Figure 4: Actual versus predicted loss across training updates for the 570M (left) and 1.1B (right) models.
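As a rough check on the power-law claim, the two converged losses quoted in the Figure 3 caption can be turned into a scaling exponent. This sketch assumes a pure power law L(N) = a * N^(-b) in model size N alone, with no irreducible-loss term and only two data points; it is a back-of-the-envelope illustration with hypothetical helper names, not the authors' fitted scaling law (whose predictions Figure 4 compares against actual loss).

```python
import math

# Back-of-the-envelope power-law fit L(N) = a * N**(-b) through two points.
# Assumption: loss depends on model size N only, with no irreducible-loss term.


def fit_two_point_power_law(n1: float, loss1: float, n2: float, loss2: float) -> tuple[float, float]:
    """Return (a, b) such that a * n**(-b) passes through both (n, loss) points."""
    b = math.log(loss1 / loss2) / math.log(n2 / n1)
    a = loss1 * n1**b
    return a, b


def predict_loss(n: float, a: float, b: float) -> float:
    """Evaluate the fitted power law at model size n (number of parameters)."""
    return a * n ** (-b)


# Converged validation losses quoted in the Figure 3 caption.
a, b = fit_two_point_power_law(84e6, 0.105, 1.1e9, 0.087)
print(f"exponent b = {b:.3f}")  # roughly 0.073
print(f"fit at 1.1B = {predict_loss(1.1e9, a, b):.3f}")  # 0.087 by construction
```

Because the fit uses only two points and ignores dataset size and compute, extrapolating it far beyond 1.1B parameters would be unreliable.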
