FlexMS is a flexible framework for benchmarking deep learning-based mass spectrum prediction tools in metabolomics

Yunhua Zhong, Yixuan Tang, Yifan Li, Jie Yang, Pan Liu, Jun Xia

TL;DR

The paper provides insight into factors that influence performance, including the structural diversity of datasets, hyperparameters such as the learning rate, data sparsity, pretraining effects, metadata ablation settings, and cross-domain transfer learning, and offers practical guidance for choosing suitable models.

Abstract

The identification and property prediction of chemical molecules are central to advances in drug discovery and materials science, and tandem mass spectrometry provides valuable fragmentation cues in the form of mass-to-charge ratio (m/z) peaks. However, the scarcity of experimental reference spectra hinders molecular identification, which motivates computational models that predict spectra from molecular structures. Deep learning models are promising for this task, but a comprehensive assessment remains challenging because methods are heterogeneous and well-defined benchmarks are lacking. To address this, we present FlexMS, a benchmark framework for constructing and evaluating diverse model architectures for mass spectrum prediction. FlexMS is flexible and easy to use: it supports the dynamic construction of many distinct combinations of model architectures and evaluates their performance on preprocessed public datasets using multiple metrics. In this paper, we provide insights into factors that influence performance, including the structural diversity of datasets, hyperparameters such as the learning rate, data sparsity, pretraining effects, metadata ablation settings, and cross-domain transfer learning. These analyses provide practical guidance for choosing suitable models. Moreover, retrieval benchmarks simulate practical identification scenarios by scoring candidate matches based on predicted spectra.
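The retrieval benchmark ranks candidate molecules by comparing their predicted spectra with an experimental query spectrum, and Figure 5 reports the outcome as a normalized rank. The following is a minimal sketch under stated assumptions, not FlexMS's implementation: cosine similarity as the scoring function, optimistic tie handling, and normalized rank defined as rank divided by the number of candidates are all illustrative choices, and `normalized_rank` is a hypothetical helper.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two binned spectra (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def normalized_rank(query: Sequence[float],
                    predicted: Dict[str, Sequence[float]],
                    true_id: str) -> float:
    """Hypothetical helper: rank candidates by how well their predicted
    spectra match the experimental query spectrum.

    Assumptions: the score is cosine similarity; the rank of the true
    candidate is 1 + the number of candidates scoring strictly higher
    (ties resolved optimistically); normalized rank = rank / number of
    candidates, so lower values are better.
    """
    scores = {cid: cosine_similarity(query, spec) for cid, spec in predicted.items()}
    true_score = scores[true_id]
    rank = 1 + sum(1 for s in scores.values() if s > true_score)
    return rank / len(scores)


if __name__ == "__main__":
    query = [0.0, 1.0, 0.0, 1.0]  # binned experimental spectrum of the unknown
    candidates = {  # binned predicted spectra, one per candidate structure
        "cand_a": [0.0, 1.0, 0.0, 1.0],
        "cand_b": [1.0, 0.0, 0.0, 0.0],
        "cand_c": [0.0, 1.0, 1.0, 0.0],
    }
    print(normalized_rank(query, candidates, "cand_a"))
    print(normalized_rank(query, candidates, "cand_c"))
```

Dividing by the candidate count makes ranks comparable across compounds whose candidate lists differ in size, which matters because the number of candidates per CASMI compound varies.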

Paper Structure (43 sections, 14 equations, 9 figures, 4 tables)

Figures (9)

  • Figure 1: Main components of FlexMS. We have developed a flexible framework, termed FlexMS, to systematically evaluate the performance of mass spectrum prediction models. The framework takes molecules and associated metadata as inputs, employs various featurizers and embedders to generate molecular representations, and uses different multi-layer perceptron (MLP) architectures to predict spectra at specified bin resolutions. We assess the performance of various models, investigate the impact of different hyperparameters, and evaluate outcomes across diverse task scenarios, using comprehensive metrics to quantify performance.
  • Figure 2: Benchmark performance of different embedders and predictors. (a) The performance (cosine similarity and Jensen-Shannon divergence) of different predictor-embedder combinations on four datasets; on the GNPS and MassBank datasets, random and scaffold splits are compared. (b) The performance of different embedders on the MassSpecGym dataset when training data is limited (25%, 50%, and 100%), measured by cosine similarity. (c) Paired performance scatter plot of pretrained and randomly initialized MoleBERT embedders across datasets, resolutions, and predictors; point color, shape, and size denote datasets, predictors, and resolutions, respectively. (d)(e) Critical difference diagrams of different embedders and predictors on the MassSpecGym, GNPS, and MassBank datasets, obtained by pooling results across embedder-predictor pairs, resolutions, and datasets and applying the Wilcoxon-Holm test to detect pairwise significant differences. (f) Paired performance scatter plots of two learning rates on representative benchmark settings; point color, shape, and size denote embedders, predictors, and resolutions, respectively.
  • Figure 3: Performance of different embedder-predictor combinations at different resolutions on the GNPS (random split), MassBank (random split), MassSpecGym, and MIST datasets. The labels "res-1", "res-2", "res-4", "res-5", and "res-10" denote resolutions of 1, 2, 4, 5, and 10 Da, respectively.
  • Figure 4: (a)(b) Detailed performance comparison of domain transfer: (a) ionization-mode transfer; (b) instrument-type transfer. (c) Critical difference diagrams of different transfer cases and embedders; the left panel shows ionization-mode transfer and the right panel shows instrument-type transfer.
  • Figure 5: (a) The normalized rank of different predictors and embedders on the CASMI16 and CASMI22 contests. Different lines represent different resolutions; res-1, res-2, res-4, res-5, and res-10 denote resolutions of 1, 2, 4, 5, and 10 Da, respectively. (b) KDE plot of the distribution of candidate counts per CASMI compound, shown for both CASMI16 and CASMI22. (c) Cumulative distribution and KDE plot of normalized-rank performance for GFv2-MassFormerMLP combinations. (d) Critical difference diagrams of different embedders and predictors on CASMI16 and CASMI22. Performance is compared by normalized rank, so lower values indicate better ranks.
  • ...and 4 more figures
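Figures 2 and 3 report cosine similarity and Jensen-Shannon divergence between predicted and experimental spectra at bin resolutions from 1 to 10 Da. The sketch below illustrates how such metrics can be computed on binned spectra. It assumes that bins start at 0 Da, that intensities falling into the same bin are summed, and that the Jensen-Shannon divergence uses the natural logarithm on sum-normalized spectra; the function names are hypothetical, and FlexMS's own preprocessing may differ.

```python
import math
from typing import List, Sequence, Tuple


def bin_spectrum(peaks: Sequence[Tuple[float, float]],
                 resolution: float = 1.0,
                 max_mz: float = 1000.0) -> List[float]:
    """Bin (m/z, intensity) peaks into a dense vector of `resolution`-Da bins.

    Assumption: intensities in the same bin are summed, and peaks at or
    beyond `max_mz` are dropped.
    """
    n_bins = int(math.ceil(max_mz / resolution))
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz // resolution)  # bin i covers [i * resolution, (i + 1) * resolution)
        if 0 <= idx < n_bins:
            vec[idx] += intensity
    return vec


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two binned spectra (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def js_divergence(a: Sequence[float], b: Sequence[float]) -> float:
    """Jensen-Shannon divergence (natural log) of two sum-normalized spectra.

    Lower is better; the value lies in [0, ln 2].
    """
    total_a, total_b = sum(a), sum(b)
    if total_a <= 0.0 or total_b <= 0.0:
        raise ValueError("spectra must have positive total intensity")
    p = [x / total_a for x in a]
    q = [y / total_b for y in b]
    m = [(x + y) / 2.0 for x, y in zip(p, q)]

    def kl(u: Sequence[float], v: Sequence[float]) -> float:
        # Terms with u_i == 0 contribute 0; v_i >= u_i / 2 > 0 whenever u_i > 0.
        return sum(x * math.log(x / y) for x, y in zip(u, v) if x > 0.0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    predicted = [(100.2, 50.0), (150.7, 100.0)]
    observed = [(100.4, 40.0), (151.1, 100.0)]
    for res in (1.0, 2.0):
        pred_vec = bin_spectrum(predicted, resolution=res, max_mz=200.0)
        obs_vec = bin_spectrum(observed, resolution=res, max_mz=200.0)
        print(res, cosine_similarity(pred_vec, obs_vec), js_divergence(pred_vec, obs_vec))
```

Coarser bins merge nearby peaks, so in this toy example the same pair of spectra scores higher at 2 Da than at 1 Da; this is one reason results are reported separately for each resolution.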