MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra

Liang Wang, Shaozhen Liu, Yu Rong, Deli Zhao, Qiang Liu, Shu Wu, Liang Wang

TL;DR

This work addresses the limitation of classical-energy-only pre-training for 3D molecular representations by incorporating quantum energy spectra. It introduces MolSpectra, which combines SpecFormer (a multi-spectrum encoder) with three objectives: a denoising-based 3D pre-training objective, a masked patch reconstruction objective for spectra, and an InfoNCE-based contrastive alignment between 3D and spectral representations. A two-stage pre-training pipeline first applies denoising to unlabeled geometries, then uses QM9Spectra data to fuse spectral information, yielding state-of-the-art or competitive results on the QM9 and MD17 benchmarks. The results indicate that spectral knowledge improves downstream property prediction and molecular dynamics modeling, and the paper outlines future directions toward broader spectral modalities and backbone architectures.

Abstract

Establishing the relationship between 3D structures and the energy states of molecular systems has proven to be a promising approach for learning 3D molecular representations. However, existing methods are limited to modeling the molecular energy states from classical mechanics. This limitation results in a significant oversight of quantum mechanical effects, such as quantized (discrete) energy level structures, which offer a more accurate estimation of molecular energy and can be experimentally measured through energy spectra. In this paper, we propose to utilize the energy spectra to enhance the pre-training of 3D molecular representations (MolSpectra), thereby infusing the knowledge of quantum mechanics into the molecular representations. Specifically, we propose SpecFormer, a multi-spectrum encoder for encoding molecular spectra via masked patch reconstruction. By further aligning outputs from the 3D encoder and spectrum encoder using a contrastive objective, we enhance the 3D encoder's understanding of molecules. Evaluations on public benchmarks reveal that our pre-trained representations surpass existing methods in predicting molecular properties and modeling dynamics.
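
The contrastive alignment between the 3D encoder and the spectrum encoder can be illustrated with a small InfoNCE sketch. This is a minimal, one-directional version in plain Python, not the paper's implementation: the function names, the temperature of 0.1, and the toy embeddings are assumptions made for illustration, and the paper may use a different (for example, symmetric) form.

```python
import math

def l2_normalize(v):
    # Assumes v is not the zero vector.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce(z_3d, z_spec, temperature=0.1):
    """One-directional InfoNCE over a batch of paired embeddings.

    z_3d[i] and z_spec[i] are treated as the positive pair; every other
    spectrum embedding in the batch acts as a negative for z_3d[i].
    The temperature value is an illustrative assumption.
    """
    a = [l2_normalize(v) for v in z_3d]
    b = [l2_normalize(v) for v in z_spec]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarities to every spectrum embedding, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)

# Toy batch: two molecules with 2-dimensional embeddings (hypothetical values).
emb_3d = [[1.0, 0.0], [0.0, 1.0]]
emb_spec_aligned = [[1.0, 0.0], [0.0, 1.0]]
emb_spec_shuffled = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = info_nce(emb_3d, emb_spec_aligned)
loss_shuffled = info_nce(emb_3d, emb_spec_shuffled)
```

In this toy batch the loss is close to zero when each 3D embedding points in the same direction as its own spectrum embedding, and it is much larger when the pairs are shuffled, which is the behavior the alignment objective relies on.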

Paper Structure

This paper contains 27 sections, 1 theorem, 10 equations, 5 figures, 9 tables.

Key Result

Theorem A.1

Assume the conformation distribution is a mixture of Gaussian distributions centered at the equilibria, where ${\bm{x}}_0,\ {\bm{x}}\in \mathbb{R}^{3N}$ are the equilibrium and noisy conformations, respectively, and $N$ is the number of atoms in the molecule. The distribution relates to the molecular energy through the Boltzmann distribution $p({\bm{x}}) \propto \exp(-E({\bm{x}}))$. Then given a sampled molecule $\mathcal{M}$, the denoisi

Figures (5)

  • Figure 1: The conceptual view of MolSpectra, which leverages both molecular conformation and spectra for pre-training. Prior works only model classical mechanics by denoising on conformations.
  • Figure 2: Overview of the MolSpectra pre-training framework. Our pre-training framework comprises three sub-objectives: the denoising objective and the MPR objective, which respectively guide the representation learning of the 3D and spectral modalities, and the contrastive objective, which aligns the representations of both modalities.
  • Figure 3: Illustration of intra-spectrum (left) and inter-spectrum (right) dependencies.
  • Figure A1: Randomly sampled examples of molecular energy spectra.
  • Figure A2: (a-c) Attention maps from three attention heads in SpecFormer. Different heads model distinct dependencies. (d) t-SNE visualization of the spectra representations produced by SpecFormer.
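
The masked patch reconstruction (MPR) objective named in Figure 2 can be sketched as follows. This is a rough illustration under stated assumptions rather than SpecFormer itself: the patch length, stride, mask ratio, zero-filling of masked patches, and all function names are hypothetical, and the encoder that would produce the predictions is omitted.

```python
import random

def split_into_patches(spectrum, patch_len, stride):
    # Sliding window over a 1D spectrum; stride < patch_len gives overlapping patches.
    last_start = len(spectrum) - patch_len
    return [spectrum[s:s + patch_len] for s in range(0, last_start + 1, stride)]

def mask_patches(patches, mask_ratio, rng):
    # Zero out a random subset of patches and remember which ones were masked.
    n_mask = max(1, round(mask_ratio * len(patches)))
    masked_idx = sorted(rng.sample(range(len(patches)), n_mask))
    masked_set = set(masked_idx)
    corrupted = [
        [0.0] * len(p) if i in masked_set else list(p)
        for i, p in enumerate(patches)
    ]
    return corrupted, masked_idx

def mpr_loss(predicted, target, masked_idx):
    # Mean squared error computed only on the masked patches.
    sq_err = [
        (p - t) ** 2
        for i in masked_idx
        for p, t in zip(predicted[i], target[i])
    ]
    return sum(sq_err) / len(sq_err)

# Toy spectrum with 32 intensity values (hypothetical data).
spectrum = [float(i % 7) for i in range(32)]
patches = split_into_patches(spectrum, patch_len=8, stride=4)
corrupted, masked_idx = mask_patches(patches, mask_ratio=0.3, rng=random.Random(0))

# A perfect reconstruction has zero loss; returning the corrupted input does not.
loss_perfect = mpr_loss(patches, patches, masked_idx)
loss_unreconstructed = mpr_loss(corrupted, patches, masked_idx)
```

In a real pipeline an encoder would receive `corrupted` and produce the predicted patches; restricting the loss to the masked positions is what forces the model to infer missing spectral regions from their context.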

Theorems & Definitions (2)

  • Theorem A.1: Equivalence between the denoising objective and the learning of molecular force fields (Coord)
  • proof
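
The intuition behind Theorem A.1 can be checked numerically in the simplest setting. The sketch below considers only a single Gaussian component (not the full mixture, and not the truncated theorem statement itself): with $E({\bm{x}}) = \lVert {\bm{x}} - {\bm{x}}_0 \rVert^2 / (2\sigma^2)$ and $p({\bm{x}}) \propto \exp(-E({\bm{x}}))$, the force $-\nabla E({\bm{x}})$ at a noisy conformation ${\bm{x}} = {\bm{x}}_0 + \sigma\bm{\epsilon}$ equals $-\bm{\epsilon}/\sigma$, so predicting the noise amounts to predicting the force up to a scale. The noise scale, atom count, and helper names are illustrative assumptions.

```python
import random

def energy(x, x0, sigma):
    # Energy of one Gaussian component: p(x) is proportional to exp(-energy).
    return sum((a - b) ** 2 for a, b in zip(x, x0)) / (2.0 * sigma ** 2)

def numerical_force(x, x0, sigma, h=1e-5):
    # Force = -dE/dx, estimated with central finite differences.
    force = []
    for k in range(len(x)):
        up = list(x)
        up[k] += h
        down = list(x)
        down[k] -= h
        force.append(-(energy(up, x0, sigma) - energy(down, x0, sigma)) / (2.0 * h))
    return force

rng = random.Random(0)
sigma = 0.04                                        # assumed noise scale
x0 = [rng.uniform(-1.0, 1.0) for _ in range(9)]     # 3 atoms, flattened coordinates
eps = [rng.gauss(0.0, 1.0) for _ in range(9)]       # standard Gaussian noise
x = [a + sigma * e for a, e in zip(x0, eps)]        # noisy conformation

force = numerical_force(x, x0, sigma)
scaled_noise = [-e / sigma for e in eps]
max_gap = max(abs(f - s) for f, s in zip(force, scaled_noise))
```

Here `max_gap` stays at the level of floating-point error, which is consistent with the claim that a denoising target and a force-field target coincide up to a constant factor in this single-component case.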