MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra

Liang Wang; Shaozhen Liu; Yu Rong; Deli Zhao; Qiang Liu; Shu Wu; Liang Wang

TL;DR

This work addresses a limitation of existing 3D molecular pre-training, which models only classical-mechanics energy states, by incorporating quantum energy spectra. It introduces MolSpectra, which combines SpecFormer (a multi-spectrum encoder) with three objectives: a denoising objective on 3D structures, a masked patch reconstruction objective on spectra, and an InfoNCE-based contrastive objective that aligns 3D and spectral representations. Pre-training proceeds in two stages: the first applies denoising to unlabeled geometries, and the second uses the QM9Spectra data to fuse spectral information. The resulting representations achieve state-of-the-art or competitive results on the QM9 and MD17 benchmarks. The results indicate that spectral knowledge improves downstream property prediction and molecular dynamics modeling, and the paper outlines future directions toward additional spectral modalities and backbone architectures.

Abstract

Establishing the relationship between 3D structures and the energy states of molecular systems has proven to be a promising approach for learning 3D molecular representations. However, existing methods are limited to modeling the molecular energy states from classical mechanics. This limitation results in a significant oversight of quantum mechanical effects, such as quantized (discrete) energy level structures, which offer a more accurate estimation of molecular energy and can be experimentally measured through energy spectra. In this paper, we propose to utilize the energy spectra to enhance the pre-training of 3D molecular representations (MolSpectra), thereby infusing the knowledge of quantum mechanics into the molecular representations. Specifically, we propose SpecFormer, a multi-spectrum encoder for encoding molecular spectra via masked patch reconstruction. By further aligning outputs from the 3D encoder and spectrum encoder using a contrastive objective, we enhance the 3D encoder's understanding of molecules. Evaluations on public benchmarks reveal that our pre-trained representations surpass existing methods in predicting molecular properties and modeling dynamics.
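
To make the contrastive alignment step more concrete, the following is a minimal sketch of an InfoNCE-style loss over paired 3D and spectrum embeddings. It is not the authors' implementation: the function name `info_nce`, the symmetric (two-direction) form, the cosine similarity, and the temperature value are all assumptions chosen for illustration, and the embeddings are plain Python lists rather than encoder outputs.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(z_3d, z_spec, temperature=0.1):
    """Symmetric InfoNCE loss for a batch of paired embeddings.

    z_3d[i] and z_spec[i] are assumed to come from the same molecule
    (positive pair); every other pairing in the batch is a negative.
    The temperature is an illustrative default, not a value from the paper.
    """
    a = [l2_normalize(v) for v in z_3d]
    b = [l2_normalize(v) for v in z_spec]
    n = len(a)

    # sim[i][j]: temperature-scaled cosine similarity between
    # the i-th 3D embedding and the j-th spectrum embedding.
    sim = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(logits):
        # Mean cross-entropy where the correct class of row i is column i,
        # computed with the log-sum-exp trick for numerical stability.
        total = 0.0
        for i, row in enumerate(logits):
            m = max(row)
            log_z = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_z - row[i]
        return total / len(logits)

    # Average the 3D-to-spectrum and spectrum-to-3D directions.
    sim_t = [list(col) for col in zip(*sim)]
    return 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(sim_t))


if __name__ == "__main__":
    z_3d = [[1.0, 0.0], [0.0, 1.0]]
    z_spec_aligned = [[2.0, 0.0], [0.0, 3.0]]
    z_spec_swapped = [[0.0, 3.0], [2.0, 0.0]]
    print(info_nce(z_3d, z_spec_aligned))
    print(info_nce(z_3d, z_spec_swapped))
```

In this toy batch, the loss is near zero when each 3D embedding points in the same direction as its paired spectrum embedding, and it grows when the pairs are swapped. That is the behavior a contrastive objective relies on to pull matched 3D and spectral representations together.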
