DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models
Liang Wang, Yu Rong, Tingyang Xu, Zhenyi Zhong, Zhiyuan Liu, Pengju Wang, Deli Zhao, Qiang Liu, Shu Wu, Liang Wang, Yang Zhang
TL;DR
DiffSpectra reframes molecular structure elucidation from spectra as a conditional generation task, using diffusion models that jointly produce 2D topology and 3D geometry. It introduces the Diffusion Molecule Transformer (DMT) for SE(3)-equivariant denoising and SpecFormer for multi-modal spectral conditioning, enabling de novo structure elucidation from UV–Vis, IR, and Raman spectra. The framework reaches 40.76% top-1 and 99.49% top-10 accuracy along with high 3D fidelity, with notable gains from pre-trained spectral encoders and multi-modal conditioning; when multiple candidates are sampled, the correct structure is almost always among the top ten. Gradient-based trajectory analysis reveals a staged generation process, and the approach generalizes across molecules of varying size, suggesting practical utility for open-ended discovery and downstream validation. The authors also outline future extensions to additional spectroscopies and larger systems.
Abstract
Molecular structure elucidation from spectra is a fundamental challenge in molecular science. Conventional approaches rely heavily on expert interpretation and lack scalability, while retrieval-based machine learning approaches remain constrained by limited reference libraries. Generative models offer a promising alternative, yet most adopt autoregressive architectures that overlook 3D geometry and struggle to integrate diverse spectral modalities. In this work, we present DiffSpectra, a generative framework that formulates molecular structure elucidation as a conditional generation process, directly inferring 2D and 3D molecular structures from multi-modal spectra using diffusion models. Its denoising network is parameterized by the Diffusion Molecule Transformer, an SE(3)-equivariant architecture for geometric modeling, conditioned by SpecFormer, a Transformer-based spectral encoder capturing multi-modal spectral dependencies. Extensive experiments demonstrate that DiffSpectra accurately elucidates molecular structures, achieving 40.76% top-1 and 99.49% top-10 accuracy. Its performance benefits substantially from 3D geometric modeling, SpecFormer pre-training, and multi-modal conditioning. To our knowledge, DiffSpectra is the first framework that unifies multi-modal spectral reasoning and joint 2D/3D generative modeling for de novo molecular structure elucidation.
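To make the top-1 and top-10 figures concrete, the following is a minimal sketch of how a top-k accuracy of this kind could be computed. It is not the authors' evaluation code: the function name `top_k_accuracy`, the toy SMILES strings, and the assumption that candidates and targets are already canonicalized strings compared by exact match are all illustrative choices.

```python
from typing import Sequence


def top_k_accuracy(
    candidates: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Fraction of targets that appear among the first k ranked candidates.

    Assumption: every structure is given as an already-canonicalized
    string, so exact string equality stands in for structure identity.
    """
    if len(candidates) != len(targets):
        raise ValueError("candidates and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(
        1 for ranked, target in zip(candidates, targets) if target in ranked[:k]
    )
    return hits / len(targets)


# Hypothetical toy data: two ranked candidates for each of three molecules.
ranked_candidates = [
    ["CCO", "COC"],    # correct structure ranked first
    ["CC=O", "C=CO"],  # correct structure ranked second
    ["CCN", "CNC"],    # correct structure not generated
]
true_structures = ["CCO", "C=CO", "CCC"]

top1 = top_k_accuracy(ranked_candidates, true_structures, k=1)
top2 = top_k_accuracy(ranked_candidates, true_structures, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

In this toy case the second molecule counts as a hit only once two candidates are considered, which is the same effect that separates the reported top-1 and top-10 accuracies.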
