Test-Time Tuned Language Models Enable End-to-end De Novo Molecular Structure Generation from MS/MS Spectra

Laura Mismetti, Marvin Alberts, Andreas Krause, Mara Graziani

TL;DR

The paper addresses end-to-end de novo molecular structure generation from MS/MS spectra, bypassing intermediate fragment prediction and database matching. It introduces test-time tuning of a pre-trained transformer that takes an MS/MS spectrum and a chemical formula as input and predicts SMILES, guided by an auxiliary fingerprint predictor and formula-constrained generation. The approach achieves state-of-the-art performance on NPLIB1 and MassSpecGym, with relative gains over DiffMS of 100% and 20%, respectively, and the generated candidates remain chemically plausible even when the exact structure is not recovered. Combined with pre-training on simulated data and adaptive use of experimental spectra, the method offers a scalable workflow to accelerate structure elucidation in metabolomics and related fields.

Abstract

Tandem mass spectrometry enables the identification of unknown compounds in crucial fields such as metabolomics, natural product discovery and environmental analysis. However, current methods rely on database matching against previously observed molecules, or on multi-step pipelines that require intermediate fragment or fingerprint prediction. This makes finding the correct molecule highly challenging, particularly for compounds absent from reference databases. We introduce a framework that uses test-time tuning to adapt a pre-trained transformer model and address this gap, enabling end-to-end de novo molecular structure generation directly from tandem mass spectra and molecular formulae, bypassing manual annotations and intermediate steps. We surpass the de facto state-of-the-art approach DiffMS on two popular benchmarks, NPLIB1 and MassSpecGym, by 100% and 20%, respectively. Test-time tuning on experimental spectra allows the model to dynamically adapt to novel spectra, and the relative performance gain over conventional fine-tuning is 62% on MassSpecGym. When predictions deviate from the ground truth, the generated molecular candidates remain structurally accurate, providing valuable guidance for human interpretation and more reliable identification.
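
As a reading aid for the percentages above: the abstract does not spell out how the gains are computed, so the following is only an assumed definition, with $a_{\text{new}}$ and $a_{\text{base}}$ as illustrative symbols for the scores of the proposed method and of the baseline on the same metric.

```latex
\text{relative gain} \;=\; \frac{a_{\text{new}} - a_{\text{base}}}{a_{\text{base}}} \times 100\%
```

Under this reading, a 100% relative gain means the baseline score is doubled, and a 20% gain means the new score is 1.2 times the baseline.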

Paper Structure

This paper contains 15 sections, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Proposed framework: a transformer encoder–decoder predicting SMILES from MS/MS spectra and chemical formula. The model is pre-trained on simulated spectra (Alberts et al., 2024) and adapted via test-time tuning on the experimental datasets NPLIB1 (Dührkop et al., 2021) and MassSpecGym (Bushuiev et al., 2024). Test-time tuning selects informative samples from experimental spectra for dynamic adaptation. A comparison with standard fine-tuning results is provided.
  • Figure 2: Schematic of the test-time tuning workflow. The transformer encoder–decoder takes an MS/MS spectrum and a chemical formula as input and predicts SMILES. The encoder embeddings are fed to a multilayer perceptron (MLP) trained, through an additional loss term, to predict molecular fingerprints. The MLP logits act as a projection into a chemical feature space, and cosine similarity between these fingerprint logits is used to select the most relevant training samples from the candidate pool. The selected samples are then used for gradient updates, and the process is repeated until the set of selected data points stops growing.
  • Figure 3: Comparison of fine-tuning and test-time tuning strategies under different domain conditions. Left: When train and test sets share the same distribution (no domain shift), both approaches achieve similar performance, with fine-tuning typically serving as the upper bound; test-time tuning improves performance only when additional data are used to extend the candidate pool. Right: Under domain shift, where train and test sets differ substantially, fine-tuning can degrade performance, while test-time tuning dynamically selects relevant samples and improves generalization to the target distribution.
  • Figure 4: Top-10 predictions for one of the molecules in the MassSpecGym test set (Bushuiev et al., 2024). The Tanimoto similarity to the target molecule and the MCES distance from it are given below each prediction. The model generates three stereoisomers among the first predictions, indicating structural awareness, but does not rank the correct SMILES first. The correct structure appears as the second candidate (highlighted in light green), so it counts toward Top-10 accuracy.
  • Figure 5: Example of the spectra available for one specific molecule in the three datasets used in this work. Here, NPLIB1 (Dührkop et al., 2021) contains only one spectrum for the molecule, while MassSpecGym (Bushuiev et al., 2024) contains 5 spectra acquired at different collision energies. The last row shows the spectra obtained with the 5 simulation techniques used in Alberts et al. (2024) and mentioned above. The spectra differ substantially from one another.
  • ...and 1 more figures
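
The sample-selection step described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The function names (`cosine_similarity`, `select_by_cosine`, `tuning_loop`), the top-`k` selection rule, the `max_rounds` cap and the callback interface are all assumptions introduced here; the caption only states that cosine similarity on fingerprint logits selects samples and that the process repeats until the selected set stops growing.

```python
import math
from typing import Callable, List, Sequence, Set

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_by_cosine(test_logits: Sequence[Vector],
                     pool_logits: Sequence[Vector],
                     k: int) -> Set[int]:
    """Indices of pool samples in the top-k (assumed rule) for any test sample."""
    selected: Set[int] = set()
    for t in test_logits:
        scored = [(cosine_similarity(t, p), i) for i, p in enumerate(pool_logits)]
        # Highest similarity first; break ties by the lower pool index.
        scored.sort(key=lambda s: (-s[0], s[1]))
        selected.update(i for _, i in scored[:k])
    return selected


def tuning_loop(get_test_logits: Callable[[], Sequence[Vector]],
                get_pool_logits: Callable[[], Sequence[Vector]],
                train_step: Callable[[List[int]], None],
                k: int,
                max_rounds: int = 10) -> Set[int]:
    """Repeat select-then-update until the selected set stops growing.

    The three callbacks are hypothetical hooks: the first two return the
    current fingerprint logits, the third runs gradient updates on the
    selected pool indices.
    """
    selected: Set[int] = set()
    for _ in range(max_rounds):
        new = select_by_cosine(get_test_logits(), get_pool_logits(), k)
        if new <= selected:  # nothing new was selected, so stop
            break
        selected |= new
        train_step(sorted(selected))
    return selected


# Toy example with 2-dimensional "logits".
pool = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
test = [[2.0, 0.1]]
print(sorted(select_by_cosine(test, pool, k=2)))  # [0, 2]
```

In a real setting the logits would be recomputed by the model after each round of updates, which is why the loop takes callbacks rather than fixed arrays. With fixed logits, as in the toy example, the loop stops after a single update round.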