Enhancing Peak Assignment in 13C NMR Spectroscopy: A Novel Approach Using Multimodal Alignment

Hao Xu, Zhengyang Zhou, Pengyu Hong

TL;DR

This work tackles the challenge of interpreting and mapping $^{13}$C NMR spectra to molecular structures, particularly under zero-shot conditions for retrieval, isomer recognition, and peak assignment. It introduces K-M3AID, a dual-level multimodal alignment framework that links molecular graphs with NMR spectra through graph-level and node-level alignments, guided by Knowledge Span-based instance discrimination. A Knowledge-Guided Instance-Wise Discrimination (KSGL) mechanism leverages continuous domain knowledge to steer cross-modal distance learning, and a communication channel enables cross-talk between encoders, with the overall objective $L = CL_{graph} + CL_{node}$. Empirical results on large-scale pretraining and zero-shot tasks show that K-M3AID achieves strong graph- and atom-level alignment, superior molecular retrieval, complete isomer recognition, and high-precision peak assignment, highlighting its potential to improve spectral interpretation and candidate ranking in practical NMR workflows. The work also demonstrates meta-learning aspects, showing that node-level skills boost graph-level alignment, and points to future work incorporating 3D graph representations to handle highly complex molecules.

Abstract

Nuclear magnetic resonance (NMR) spectroscopy plays an essential role in deciphering molecular structure and dynamic behaviors. While AI-enhanced NMR prediction models hold promise, challenges still persist in tasks such as molecular retrieval, isomer recognition, and peak assignment. In response, this paper introduces a novel solution, Multi-Level Multimodal Alignment with Knowledge-Guided Instance-Wise Discrimination (K-M3AID), which establishes correspondences between two heterogeneous modalities: molecular graphs and NMR spectra. K-M3AID employs a dual-coordinated contrastive learning architecture with three key modules: a graph-level alignment module, a node-level alignment module, and a communication channel. Notably, K-M3AID introduces knowledge-guided instance-wise discrimination into contrastive learning within the node-level alignment module. In addition, K-M3AID demonstrates that skills acquired during node-level alignment have a positive impact on graph-level alignment, acknowledging meta-learning as an inherent property. Empirical validation underscores K-M3AID's effectiveness in multiple zero-shot tasks.
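
The TL;DR states the overall objective as $L = CL_{graph} + CL_{node}$, but this page does not give the form of each term. The following is a minimal sketch, assuming each term is a standard symmetric InfoNCE contrastive loss over paired embeddings. The embedding names `G`, `S`, `N`, `P` follow the Figure 1 caption; the function names and the `temperature` value are hypothetical, not taken from the paper.

```python
import numpy as np

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def l2_normalize(x):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def info_nce(a, b, temperature=0.1):
    # Assumed form of each CL term: row i of `a` is the positive for row i of `b`,
    # and every other row in the batch acts as a negative.
    logits = l2_normalize(a) @ l2_normalize(b).T / temperature
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()  # a -> b direction
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()  # b -> a direction
    return float(0.5 * (loss_ab + loss_ba))

def total_loss(G, S, N, P, temperature=0.1):
    # L = CL_graph + CL_node: graph/spectrum pairs plus node/peak pairs.
    return info_nce(G, S, temperature) + info_nce(N, P, temperature)
```

In this sketch the graph-level term pairs one molecule with one spectrum, while the node-level term pairs one atom with one peak; how the communication channel couples the two encoders is not modelled here.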

Paper Structure (29 sections, 1 theorem, 17 equations, 8 figures, 7 tables)

Key Result

Theorem 1

Suppose $\mathcal{M}$ is the set of instances. $\mathcal{A} \subset \mathbb{R}^{d_1}$ is the set of tunable instances' embeddings in modality A, $\mathcal{B} \subset \mathbb{R}^{d_1}$ is the set of tunable instances' embeddings in modality B, and $\mathcal{K} \subset \mathbb{R}^{d_2}$ is the correspo

Figures (8)

  • Figure 1: a. Demands for interpreting NMR spectra in real-world scenarios: molecular retrieval, candidate ranking, and peak assignment; b. Zero-shot applications of the K-M3AID model: molecular retrieval, isomer recognition, and peak assignment; c. The framework of the K-M3AID model: the molecular alignment module optimizes the correspondence between modalities at the molecular level, the atomic alignment module focuses on fine-tuning atomic positioning on the spectrum, and the communication channel dynamically adjusts the flow of gradients between the node encoder and the graph encoder during training. $S$ denotes the spectrum embedding, $G$ the graph embedding, $P$ the peak embedding, and $N$ the node embedding.
  • Figure 2: Knowledge-Guided Instance-Wise Discrimination Mechanism. $K_{i}$ and $K_{j}$ represent the corresponding knowledge span labels for $i^{th}$ and $j^{th}$ items.
  • Figure 3: The statistics of zero-shot peak assignment.
  • Figure 4: Case study of peak assignment. Yellow cells in PPM difference represent the ground-truth alignment, and red crosses mark wrong alignments. For the definition of ppm, see \ref{Knowledge-Span-ppm}. For additional cases, see Appendix Figure \ref{fig:molecule-compare-appendix}.
  • Figure 5.1: NMR Variability in Isomers
  • ...and 3 more figures
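
The zero-shot peak assignment shown in Figures 3 and 4 can be illustrated with a small sketch. It assumes that each atom is assigned to the peak whose embedding has the highest cosine similarity to the atom's node embedding; the decision rule the paper actually uses is not stated on this page. The function names are hypothetical, and `node_emb` and `peak_emb` stand in for the $N$ and $P$ embeddings of Figure 1.

```python
import numpy as np

def assign_peaks(node_emb, peak_emb):
    # Cosine similarity between every atom (row) and every peak (column).
    n = node_emb / np.linalg.norm(node_emb, axis=1, keepdims=True)
    p = peak_emb / np.linalg.norm(peak_emb, axis=1, keepdims=True)
    sim = n @ p.T
    # Assumed rule: each atom takes its most similar peak. Several atoms may
    # share one peak, as happens for chemically equivalent carbons.
    return sim.argmax(axis=1), sim

def assignment_accuracy(pred, truth):
    # Fraction of atoms whose predicted peak index matches the ground truth.
    return float(np.mean(np.asarray(pred) == np.asarray(truth)))

# Toy example: three carbon atoms, two peaks, 2-D embeddings.
peaks = np.array([[1.0, 0.0], [0.0, 1.0]])
nodes = np.array([[0.9, 0.1], [0.2, 0.8], [1.0, 0.05]])
pred, sim = assign_peaks(nodes, peaks)
```

A one-to-one matching (for example, via the Hungarian algorithm) would be an alternative when every atom is known to have a distinct peak; the argmax rule is used here only because it tolerates shared peaks.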

Theorems & Definitions (2)

  • Theorem 1: Knowledge Span Guided Loss
  • proof
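
The statement of Theorem 1 is truncated on this page, so the Knowledge Span Guided Loss itself cannot be reproduced here. As a rough illustration of knowledge-guided instance-wise discrimination (Figure 2), the sketch below replaces the one-hot targets of ordinary contrastive learning with soft targets derived from the knowledge span labels $K_i$ and $K_j$. It assumes the knowledge span is the chemical shift in ppm and that the weight decays exponentially with $|K_i - K_j|$; the kernel, the `bandwidth` parameter, and all function names are hypothetical.

```python
import numpy as np

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def knowledge_targets(k, bandwidth=5.0):
    # Soft targets: items with closer knowledge labels (ppm) get more weight.
    # The exponential kernel is an assumption, not the paper's definition.
    diff = np.abs(k[:, None] - k[None, :])
    w = np.exp(-diff / bandwidth)
    return w / w.sum(axis=1, keepdims=True)

def knowledge_guided_loss(node_emb, peak_emb, k, temperature=0.1, bandwidth=5.0):
    # Cross-entropy between the similarity distribution and the soft targets.
    n = node_emb / np.linalg.norm(node_emb, axis=1, keepdims=True)
    p = peak_emb / np.linalg.norm(peak_emb, axis=1, keepdims=True)
    logits = n @ p.T / temperature
    targets = knowledge_targets(k, bandwidth)
    return float(-(targets * log_softmax(logits, axis=1)).sum(axis=1).mean())

# Toy example: two nearby aliphatic shifts and one distant carbonyl-like shift.
ppm = np.array([10.0, 12.0, 150.0])
targets = knowledge_targets(ppm)
```

With a very small `bandwidth` the targets collapse to one-hot vectors and the loss reduces to an ordinary one-directional InfoNCE term, which is one way to see the knowledge guidance as a relaxation of strict instance discrimination.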