Vib2Mol: from vibrational spectra to molecular structures - a unified deep learning framework

Xinyu Lu, Hao Ma, Hui Li, Jia Li, Yi Rong, Yuqiang Li, Tong Zhu, Guokun Liu, Bin Ren

TL;DR

Vib2Mol is a unified deep learning framework that bridges spectrum-to-structure retrieval and generation for vibrational spectroscopy, so that one model can handle different tasks depending on how much prior knowledge is available. It is trained in multiple phases: alignment via contrastive learning, then generation via masked language modeling (MLM) and language modeling (LM). Inference uses a generate-then-rerank pipeline, augmented by coarse-to-fine retrieval and a cross-modal masking strategy that lets the model accept Raman, IR, or both inputs. It achieves state-of-the-art results on theoretical IR/Raman benchmarks and outperforms baselines on experimental data, and it demonstrates capabilities in predicting reaction products and sequencing peptides, including identification of post-translational modification (PTM) sites. The framework shows promise for autonomous discovery workflows and in situ analysis of chemical and biological processes, with potential extensions to stereochemistry and graph-based representations.

Abstract

Chemical and biological research is heading toward a paradigm shift enabled by autonomous, closed-loop experimentation with real-time, self-directed decision-making. Spectrum-to-structure correlation, the elucidation of molecular structures from spectral information, is the core step in understanding experimental results and closing the loop. However, current approaches usually split the task into either database-dependent retrieval or database-independent generation and neglect the inherent complementarity between the two. In this study, we propose Vib2Mol, a unified deep learning framework that bridges retrieval and generation to flexibly handle diverse spectrum-to-structure tasks according to the available prior knowledge. Empowered by our coarse-to-fine retrieval and generate-then-rerank strategies, Vib2Mol not only achieves state-of-the-art performance in analyzing theoretical infrared and Raman spectra, but also outperforms previous models on experimental data. Moreover, our model demonstrates promising capabilities in predicting reaction products and sequencing peptides, making vibrational spectroscopy a potential guide for autonomous scientific discovery workflows.

Paper Structure

This paper contains 25 sections, 6 equations, 12 figures, 14 tables, 1 algorithm.

Figures (12)

  • Figure 1: The framework of Vib2Mol for pretraining. (A) The alignment phase: spectra and molecular structures are represented as patch tokens and SMILES tokens, respectively. After being processed by their respective encoders, spectral and molecular information are aligned by contrastive learning (CL). Subsequently, hard negative samples are selected and used to guide the model in learning the subtle distinctions between highly similar spectra or molecules. (B) The generation phase: for conditional generation, 45% of the tokens of each molecule are randomly masked, and the masked molecule is encoded by the same molecular encoder used for spectrum-structure alignment. The molecular decoder fuses spectral information with molecular features and predicts the masked tokens. For de novo generation, the molecule is sequentially masked and fed directly into the same molecular decoder used for conditional generation, without prior encoding. The decoder then predicts the next token on the basis of the preceding tokens, the spectral features and the chemical formula (if given).
  • Figure 2: The workflow of Vib2Mol for addressing different spectrum-to-structure tasks: (A) spectrum-spectrum retrieval, where only the spectral encoder is used to calculate the similarity between spectral pairs; (B) spectrum-structure retrieval, where spectra and molecules are encoded by their respective encoders to determine spectrum-structure similarity; (C) conditional generation and (D) de novo generation, both following the same workflows as in pretraining; (E) the re-ranking module for refining retrieval and generation results. It first filters candidates by chemical formula (if available), then uses a pre-trained molecular encoder to score them against the query spectrum. The highest-scoring candidates are selected as output.
  • Figure 3: Performance evaluation of advanced deep learning models. (A) and (B) compare the performance of various models on spectrum-to-structure retrieval and de novo molecular generation, respectively. These evaluations were conducted on both theoretical (VB, QM9S) and experimental (NIST, SDBS) benchmarks. The impact of multi-modal spectral input on the performance of Vib2Mol is further detailed in (C) for retrieval and (D) for generation.
  • Figure 4: Ablation studies of Vib2Mol on the VB-Mols-Raman dataset, with visualization and statistical analysis of Vib2Mol representation learning. The performance contributions of different modules and hyperparameters are systematically assessed for (A) retrieval and (B) de novo generation. (C) Alignment of spectral and structural embeddings, illustrating the distribution of cosine similarities between the spectrum and structure embeddings of the same molecule before and after training. (D) Performance on similar molecules categorized by functional group, for both retrieval and de novo generation tasks.
  • Figure 5: Workflow and performance of Vib2Mol in product elucidation and mixed-spectrum analysis. (A) Three scenarios for predicting products. (B) Benchmarking on PAH substitution reactions. (C) Retrieval and de novo generation results on unmixed and mixed spectra of general chemical reactions.
  • ...and 7 more figures
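
The re-ranking step described in Figure 2(E) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine` and `rerank`, the dictionary keys, and the two-dimensional toy embeddings are all hypothetical, and cosine similarity is assumed as the scoring function because the caption only states that candidates are scored against the query spectrum. In a real pipeline the embeddings would come from the pretrained spectral and molecular encoders.

```python
import math
from typing import Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(query_emb: Sequence[float],
           candidates: list,
           formula: Optional[str] = None,
           top_k: int = 3) -> list:
    """Filter candidates by formula (if given), score them, return the top_k SMILES.

    Each candidate is a dict with the hypothetical keys 'smiles', 'formula', 'emb'.
    """
    # Step 1: keep only candidates matching the chemical formula, when one is known.
    pool = [c for c in candidates if formula is None or c["formula"] == formula]
    # Step 2: score each remaining candidate against the query spectrum embedding.
    scored = [(cosine(query_emb, c["emb"]), c["smiles"]) for c in pool]
    # Step 3: sort by score, highest first, and keep the best top_k.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [smiles for _, smiles in scored[:top_k]]


# Toy data: made-up embeddings standing in for molecular-encoder outputs.
candidates = [
    {"smiles": "CCO", "formula": "C2H6O", "emb": [0.9, 0.1]},
    {"smiles": "COC", "formula": "C2H6O", "emb": [0.2, 0.8]},
    {"smiles": "CC=O", "formula": "C2H4O", "emb": [1.0, 0.0]},
]
query_emb = [1.0, 0.0]  # stand-in for the spectral-encoder output

with_formula = rerank(query_emb, candidates, formula="C2H6O", top_k=1)
without_formula = rerank(query_emb, candidates, top_k=2)
print(with_formula, without_formula)
```

With the formula constraint the pool shrinks to the two C2H6O isomers before scoring. Without it, every candidate competes on similarity alone. This mirrors the caption's point that the formula filter is optional prior knowledge.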