Suiren-1.0 Technical Report: A Family of Molecular Foundation Models

Junyi An, Xinyu Lu, Yun-Fei Shi, Li-Cheng Xu, Nannan Zhang, Chao Qu, Yuan Qi, Fenglei Cao

Abstract

We introduce Suiren-1.0, a family of molecular foundation models for the accurate modeling of diverse organic systems. Suiren-1.0 comprises three specialized variants (Suiren-Base, Suiren-Dimer, and Suiren-ConfAvg) integrated within an algorithmic framework that bridges the gap between 3D conformational geometry and 2D statistical ensemble spaces. We first pre-train Suiren-Base (1.8B parameters) on a 70M-sample Density Functional Theory dataset using spatial self-supervision and SE(3)-equivariant architectures, achieving robust performance in quantum property prediction. Suiren-Dimer extends this capability through continued pre-training on 13.5M intermolecular interaction samples. To enable efficient downstream application, we propose Conformation Compression Distillation (CCD), a diffusion-based framework that distills complex 3D structural representations into 2D conformation-averaged representations. This yields the lightweight Suiren-ConfAvg, which generates high-fidelity representations from SMILES or molecular graphs. Our extensive evaluations demonstrate that Suiren-1.0 establishes state-of-the-art results across a range of tasks. All models and benchmarks are open-sourced.

Paper Structure (37 sections, 7 equations, 5 figures, 13 tables)

Figures (5)

  • Figure 1: Benchmark performance of Suiren-1.0 and its counterparts. We report normalized MAE scores ($\uparrow$).
  • Figure 2: Comparison of the Suiren-1.0 models and molecular foundation models across various tasks in 8 domains. All tasks are regression tasks, with MAE ($\downarrow$) as the evaluation metric. Because metric ranges differ significantly across tasks, the y-axis is scaled.
  • Figure 3: Microscopic and macroscopic representations of molecular ensembles. (a) Molecular Representation: A single molecular identity corresponds to a diverse ensemble of 3D conformations in the microscopic space. (b) Conformational Distribution: The relative probability of these conformations is governed by the Boltzmann distribution as a function of potential energy. (c) Ensemble Property: Macroscopic observables emerge as ensemble-averaged properties derived from the collective contributions of all constituent conformations.
  • Figure 4: The architecture of the Suiren-Base model. (a) Overall framework. (b) A dense MoE block. (c) Modified EST expert: during training, the spherical Fourier transform basis set and orientation embedding are subjected to a random rotation.
  • Figure 5: Overview of training stages. (a) 3D Pre-training: Self-supervised learning on 3D molecular conformations. (b) Conformation Distillation: Distilling 3D geometric knowledge into a conformation-averaged representation. (c) Downstream Fine-tuning: Adapting the model for supervised molecular property prediction.
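
The relationship described in Figure 3(b) and (c) can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes conformer energies are given in kcal/mol, uses the standard Boltzmann weighting p_i ∝ exp(-E_i / (k_B T)), and computes an ensemble property as the weighted mean of per-conformer values. The function names, the default temperature, and the choice of units are assumptions made for this example only.

```python
import math

# Boltzmann constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
K_B_KCAL_PER_MOL_K = 0.0019872041


def boltzmann_weights(energies, temperature=298.15):
    """Return normalized Boltzmann probabilities for a list of conformer energies.

    energies: potential energies in kcal/mol (hypothetical input format).
    temperature: temperature in kelvin.
    """
    if not energies:
        raise ValueError("energies must be non-empty")
    beta = 1.0 / (K_B_KCAL_PER_MOL_K * temperature)
    # Shift by the minimum energy for numerical stability; the shift cancels
    # after normalization, so the probabilities are unchanged.
    e_min = min(energies)
    factors = [math.exp(-beta * (e - e_min)) for e in energies]
    partition = sum(factors)
    return [f / partition for f in factors]


def ensemble_average(values, weights):
    """Weighted mean of per-conformer property values (Figure 3(c) in spirit)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights))


if __name__ == "__main__":
    # Three hypothetical conformers: relative energies and a per-conformer property.
    energies = [0.0, 0.5, 2.0]
    properties = [1.2, 1.5, 2.4]
    weights = boltzmann_weights(energies)
    print("weights:", weights)
    print("ensemble average:", ensemble_average(properties, weights))
```

Lower-energy conformers receive larger weights, so the ensemble average is dominated by the most stable geometries. A conformation-averaged representation, as targeted by Suiren-ConfAvg, is meant to capture this kind of aggregate rather than any single 3D structure.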